yatish27 / linkedin-scraper

Scrapes the public profile of the linkedin page
MIT License
551 stars 221 forks

Getting 999 when hitting public profile pages in linkedIn. #105

Open Karan-Daiict opened 6 years ago

Karan-Daiict commented 6 years ago

When I hit public profiles I get a 999 response. I guess that after 4 to 5 continuous hits, LinkedIn blocks the IP address. I changed the IP, but I still get 999. Is it possible that they can detect the local machine's IP address? How can I solve this error without logging in?

I'm stuck with this. Please help.

adolfopeccin commented 6 years ago

Same here

rochenka commented 5 years ago

@Karan-Daiict @adolfopeccin Same for me, until I found an API that avoids the 999. I make GET requests to that API, which calls LinkedIn and returns the data. The API is from ProxyCrawl.

Karan-Daiict commented 5 years ago

Hi rochenka, you won't be able to access all the data from the LinkedIn APIs. I solved it using headless browsers. Works like a charm.

Thanks, Karan Ladla

rochenka commented 5 years ago

@Karan-Daiict I do not use the LinkedIn API. I use the ProxyCrawl API, which calls LinkedIn internally. https://proxycrawl.com/scraping-api-avoid-captchas-blocks

Karan-Daiict commented 5 years ago

Ohh okay, that is good. Scraping with proxy built in. But you'll have to pay for it.

rochenka commented 5 years ago

@Karan-Daiict Yes, I do not want to manage headless browser infrastructure or whatever tricks LinkedIn comes up with along the way, especially since I am building a recruiting project and need data constantly. That is why I pay for it. The other option is to find proxies that work for LinkedIn, which is another difficult task that I would also have to pay for, so I went for an easy, good solution. Do you use proxies, or how do you get the data?

Karan-Daiict commented 5 years ago

Yeah. Basically I created some proxies and ran them in round-robin fashion. Does the ProxyCrawl API give complete data, and is its signature easy to use? Also, I think LinkedIn will block based on credentials. Do check that.
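
For readers who want to picture the round-robin idea, here is a minimal Python sketch. It is not the code used in this thread: the proxy addresses and the names `PROXIES`, `proxy_rotation`, and `fetch_via_next_proxy` are hypothetical, and it assumes plain HTTP proxies that you control. Each request takes the next proxy from the list, wrapping around at the end.

```python
from itertools import cycle
import urllib.error
import urllib.request

# Hypothetical proxy addresses, for illustration only.
PROXIES = [
    "http://10.0.0.1:8080",
    "http://10.0.0.2:8080",
    "http://10.0.0.3:8080",
]


def proxy_rotation(proxies):
    """Return an endless iterator that hands out proxies in round-robin order."""
    return cycle(proxies)


def fetch_via_next_proxy(url, rotation, timeout=10.0):
    """Fetch `url` through the next proxy; return (status_code, body_bytes)."""
    proxy = next(rotation)
    handler = urllib.request.ProxyHandler({"http": proxy, "https": proxy})
    opener = urllib.request.build_opener(handler)
    try:
        with opener.open(url, timeout=timeout) as resp:
            return resp.status, resp.read()
    except urllib.error.HTTPError as err:
        # Non-2xx answers (including a 999) arrive here as HTTPError.
        return err.code, err.read()


if __name__ == "__main__":
    rotation = proxy_rotation(PROXIES)
    # Only shows the order in which proxies would be used; no request is sent.
    print([next(rotation) for _ in range(4)])
```

Rotation alone only spreads requests across addresses. Whether that is enough to avoid a 999 depends on how the site decides to block, which this sketch does not address.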

rochenka commented 5 years ago

I do not really know what they do internally. I just receive the HTML data that is public. I do not think they support crawling private data from LinkedIn, though.

Karan-Daiict commented 5 years ago

Yeah, nobody supports private crawling. For public and static data, you can use a plain HTTP client too. But fine, ProxyCrawl is good.👍

adolfopeccin commented 5 years ago

@rochenka How did you implement the scraping of LinkedIn data using ProxyCrawl?

rochenka commented 5 years ago

@adolfopeccin It's basically simple: ProxyCrawl does the crawling through their API, and you need a token. I think you can get one token for 1000 free requests. ProxyCrawl gives me the raw HTML response, and I then use linkedin-scraper to scrape the HTML content.
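
The flow described above (send the target URL and a token to a crawling API, get raw HTML back, then parse it separately) might be sketched in Python like this. The endpoint `https://api.example.com/` and the `token` and `url` query parameter names are placeholders, not the provider's confirmed API, so check the provider's documentation for the real values. The function names are hypothetical too.

```python
from urllib.parse import urlencode
import urllib.request

# Placeholder endpoint; the real one comes from the provider's documentation.
API_ENDPOINT = "https://api.example.com/"


def build_request_url(token, target_url, endpoint=API_ENDPOINT):
    """Build the API request URL, percent-encoding the target page URL."""
    query = urlencode({"token": token, "url": target_url})
    return f"{endpoint}?{query}"


def fetch_public_html(token, target_url):
    """Ask the crawling API for the page and return its raw HTML as text."""
    request_url = build_request_url(token, target_url)
    with urllib.request.urlopen(request_url, timeout=30) as resp:
        return resp.read().decode("utf-8", errors="replace")


if __name__ == "__main__":
    # Prints the request URL only; no network call is made here.
    print(build_request_url("YOUR_TOKEN", "https://www.linkedin.com/in/someone"))
```

The returned HTML would then be handed to whatever parser you use. In this thread that is linkedin-scraper.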

abhishmitra commented 4 years ago

@rochenka

Hi, I spoke to proxycrawl but they have certain limits and stuff. Did they impose those recently? How are they working for you?

Bilal815 commented 12 months ago

I recently tried Crawlbase, which was previously ProxyCrawl. They say that JS-enabled crawling of dynamic content, which LinkedIn requires, costs a little extra. I tried my luck, and it got interesting here: https://crawlbase.com/docs/crawling-api/scrapers/#linkedin