yatish27 / linkedin-scraper

Scrapes the public profile of the linkedin page
MIT License
555 stars 220 forks

Is it already working with the new version of LinkedIn? #104

Open fritzZz opened 7 years ago

fritzZz commented 7 years ago

When I execute the command ./linkedin-scraper https://www.linkedin.com/in/blablabla/ I get this error:

/usr/lib/ruby/gems/1.9.1/gems/mechanize-2.7.4/lib/mechanize/http/agent.rb:942:in `response_read': 999 => -- https://www.linkedin.com/in/blablabla/ (Mechanize::ResponseCodeError)
    from /usr/lib/ruby/gems/1.9.1/gems/mechanize-2.7.4/lib/mechanize/http/agent.rb:270:in `block in fetch'
    from /usr/lib/ruby/1.9.1/net/http.rb:1323:in `block (2 levels) in transport_request'
    from /usr/lib/ruby/1.9.1/net/http.rb:2672:in `reading_body'
    from /usr/lib/ruby/1.9.1/net/http.rb:1322:in `block in transport_request'
    from /usr/lib/ruby/1.9.1/net/http.rb:1317:in `catch'
    from /usr/lib/ruby/1.9.1/net/http.rb:1317:in `transport_request'
    from /usr/lib/ruby/1.9.1/net/http.rb:1294:in `request'
    from /usr/lib/ruby/gems/1.9.1/gems/net-http-persistent-2.9.4/lib/net/http/persistent.rb:999:in `request'
    from /usr/lib/ruby/gems/1.9.1/gems/mechanize-2.7.4/lib/mechanize/http/agent.rb:267:in `fetch'
    from /usr/lib/ruby/gems/1.9.1/gems/mechanize-2.7.4/lib/mechanize.rb:464:in `get'
    from /home/fritzzz/Downloads/linkedin-scraper-master/lib/linkedin-scraper/profile.rb:34:in `initialize'
    from ./linkedin-scraper:11:in `new'
    from ./linkedin-scraper:11:in `<main>'

Does anyone else have the same problem?

rubenbaden commented 7 years ago

It worked for me twice out of a few hundred attempts. I'm assuming we may need new user agents?

I'm not sure, but I need help!

Will post if I find anything out.
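
If rotating user agents is worth trying, a minimal Ruby sketch using only the standard library could look like the following. The agent strings and the `build_request` helper are illustrative assumptions, not part of linkedin-scraper, and nothing in this thread confirms that a different User-Agent avoids the block.

```ruby
require "net/http"
require "uri"

# Hypothetical pool of browser-like User-Agent strings; replace with current ones.
USER_AGENTS = [
  "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.36",
  "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_4) AppleWebKit/603.1.30 (KHTML, like Gecko) Version/10.1 Safari/603.1.30"
].freeze

# Builds (but does not send) a GET request carrying a randomly chosen User-Agent.
def build_request(url, agents = USER_AGENTS)
  uri = URI.parse(url)
  request = Net::HTTP::Get.new(uri.request_uri)
  request["User-Agent"] = agents.sample
  request
end

request = build_request("https://www.linkedin.com/in/blablabla/")
puts request["User-Agent"]
```

The request object can then be passed to `Net::HTTP#request` on a connection opened with `use_ssl: true`.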

Startouf commented 7 years ago

I don't understand why you had to open another issue when there are already 2 discussing this =_=

yatish27 commented 7 years ago

LinkedIn is strict. It identifies bot requests and sends a 404 response.
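
One way to tell a bot block apart from a normal answer is to check the status code before trying to parse the page. The sketch below uses the standard library rather than Mechanize; `blocked?` and `fetch_profile` are hypothetical helpers, and treating 999 as the block code is an assumption based on the stack trace in this issue.

```ruby
require "net/http"
require "uri"

# Hypothetical helper: true when the status code is in the list of codes
# we assume indicate a bot block (999 is taken from the trace in this issue).
def blocked?(code, block_codes = ["999"])
  block_codes.include?(code.to_s)
end

# Fetches a profile page and returns its body, or nil if the request
# appears to have been blocked. Performs a real network request when called.
def fetch_profile(url)
  response = Net::HTTP.get_response(URI.parse(url))
  return nil if blocked?(response.code)
  response.body
end
```

With Mechanize, the comparable step would be rescuing `Mechanize::ResponseCodeError` and inspecting its `response_code`.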

cyberfab007 commented 7 years ago

I found that using curl to authenticate with LinkedIn worked well, and I have been able to pull down profile pages too. The issue I am running into is processing the JavaScript so that the result is readable by DOMDocument and I can use XPath to scrape the information. Right now I have a bunch of preg_match trickery sorting through the JSON output that comes down. I wrote my script in PHP as a class; does anyone care to help with it? I tried php-phantomjs, and it works well unless you hit a redirect or need to use cookies. I am sure that with some time and effort it will work.
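
If the data really does arrive as JSON, parsing it with a JSON library may be less fragile than pattern matching. Here is a rough sketch in Ruby (this project's language) rather than PHP; the payload shape, the key names, and the `extract_titles` helper are all invented for illustration, since the real output is not shown in this thread.

```ruby
require "json"

# Hypothetical sample payload; the actual JSON LinkedIn returns is not shown here.
raw = '{"profile": {"firstName": "Jane", "positions": [{"title": "Engineer"}]}}'

# Parses the JSON text and returns the position titles it contains.
# Returns an empty array when the text is not valid JSON.
def extract_titles(json_text)
  data = JSON.parse(json_text)
  positions = data.dig("profile", "positions") || []
  positions.map { |position| position["title"] }.compact
rescue JSON::ParserError
  []
end

puts extract_titles(raw).inspect
```

In PHP the same idea would be `json_decode` followed by array access.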