sparklemotion / mechanize

Mechanize is a ruby library that makes automated web interaction easy.
https://www.rubydoc.info/gems/mechanize/
MIT License
4.38k stars 475 forks

Mechanize ResponseCodeError: 999 #336

Closed by dustyhorizon 10 years ago

dustyhorizon commented 10 years ago

Hi guys,

I'm trying to use Mechanize to get the HTML of a public-facing page such as http://www.linkedin.com/in/barackobama?trk=pub-pbmap. It works in my local console, but not after my app has been deployed to Heroku.

Steps I used (Local)
rails c
a = Mechanize.new
a.get("http://www.linkedin.com/in/barackobama?trk=pub-pbmap")

RESPONSE

Steps I used (Heroku)
heroku run console
a = Mechanize.new
a.get("http://www.linkedin.com/in/barackobama?trk=pub-pbmap")

Mechanize::ResponseCodeError: 999 => for -- http://www.linkedin.com/in/barackobama?trk=pub-pbmap

With logging on

I, [2013-09-21T09:51:14.943116 #2] INFO -- : Net::HTTP::Get: /in/barackobama?trk=pub-pbmap
D, [2013-09-21T09:51:14.946446 #2] DEBUG -- : request-header: accept-encoding => gzip,deflate,identity
D, [2013-09-21T09:51:14.948645 #2] DEBUG -- : request-header: accept => */*
D, [2013-09-21T09:51:14.948835 #2] DEBUG -- : request-header: user-agent => Mechanize/2.7.2 Ruby/2.0.0p247 (http://github.com/sparklemotion/mechanize/)
D, [2013-09-21T09:51:14.948958 #2] DEBUG -- : request-header: accept-charset => ISO-8859-1,utf-8;q=0.7,*;q=0.7
D, [2013-09-21T09:51:14.949063 #2] DEBUG -- : request-header: accept-language => en-us,en;q=0.5
D, [2013-09-21T09:51:14.949288 #2] DEBUG -- : request-header: host => www.linkedin.com
I, [2013-09-21T09:51:15.009196 #2] INFO -- : status: Net::HTTPUnknownResponse 1.1 999 INKApi Error
D, [2013-09-21T09:51:15.015572 #2] DEBUG -- : response-header: date => Sat, 21 Sep 2013 09:51:15 GMT
D, [2013-09-21T09:51:15.015720 #2] DEBUG -- : response-header: nncoection => close
D, [2013-09-21T09:51:15.015764 #2] DEBUG -- : response-header: server => ATS
D, [2013-09-21T09:51:15.015802 #2] DEBUG -- : response-header: x-li-pop => PROD-ECH3
D, [2013-09-21T09:51:15.015846 #2] DEBUG -- : response-header: content-length => 511
D, [2013-09-21T09:51:15.016936 #2] DEBUG -- : response-header: content-type => text/html
D, [2013-09-21T09:51:15.017063 #2] DEBUG -- : response-header: set-cookie => X-LI-IDC=C2
D, [2013-09-21T09:51:15.017317 #2] DEBUG -- : Read 511 bytes (511 total)
Mechanize::ResponseCodeError: 999 => for -- http://www.linkedin.com/in/barackobama?trk=pub-pbmap
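
For anyone wanting to reproduce this kind of log: Mechanize accepts a standard Ruby Logger through Mechanize#log=. The following is only a minimal sketch, not the original poster's code. The helper name attach_logger is hypothetical, and it accepts any object that responds to log= so that it can be tried without the gem or network access.

```ruby
require "logger"

# Hypothetical helper (not part of Mechanize): attach a Logger to an agent.
# It works with any object that responds to `log=`, which Mechanize does.
def attach_logger(agent, io = $stderr, level = Logger::DEBUG)
  logger = Logger.new(io)
  logger.level = level # DEBUG includes the request/response header lines
  agent.log = logger
  agent
end

# Assumed usage, with the mechanize gem installed:
#   require "mechanize"
#   agent = attach_logger(Mechanize.new)
#   agent.get("http://www.linkedin.com/in/barackobama?trk=pub-pbmap")
```

Raising the level to Logger::INFO should keep only the request line and the status line, which is usually enough to spot a 999 response.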

UPDATE: Tried various user agent aliases, but to no avail.

leejarvis commented 10 years ago

I'm not sure why this is working locally for you, but I doubt LinkedIn likes the Mechanize user agent. Make sure you set the user agent yourself using Mechanize#user_agent_alias=

irb(main):001:0> Mechanize.new.get "http://www.linkedin.com/in/barackobama?trk=pub-pbmap"
Mechanize::ResponseCodeError: 999 =>  for  -- http://www.linkedin.com/in/barackobama?trk=pub-pbmap
irb(main):003:0> Mechanize.new.tap { |m| m.user_agent_alias = 'Mac Safari' }.get("http://www.linkedin.com/in/barackobama?trk=pub-pbmap")
=> #<Mechanize::Page

dustyhorizon commented 10 years ago

Hmm, it seems like it doesn't like my dyno's IP?

Running console attached to terminal... up, run.1446
Loading production environment (Rails 4.0.0)
irb(main):001:0> Mechanize.new.tap { |m| m.user_agent_alias = 'Mac Safari' }.get("http://www.linkedin.com/in/barackobama?trk=pub-pbmap")
Mechanize::ResponseCodeError: 999 =>  for  -- http://www.linkedin.com/in/barackobama?trk=pub-pbmap

dustyhorizon commented 10 years ago

Never mind, it seems LinkedIn blacklisted one of Heroku's IPs. I used a proxy for Mechanize and it works. Thanks.
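
The thread does not show the proxy setup that was actually used. As a rough sketch of one way it could be done, Mechanize exposes set_proxy(address, port, user, password). The helper names proxy_settings and apply_proxy, the PROXY_URL environment variable, and the example proxy address are all assumptions made for illustration.

```ruby
require "uri"

# Hypothetical helper: split a proxy URL into the arguments that
# Mechanize#set_proxy expects (address, port, user, password).
def proxy_settings(proxy_url)
  uri = URI.parse(proxy_url)
  [uri.host, uri.port, uri.user, uri.password]
end

# Hypothetical helper: configure any agent that responds to `set_proxy`.
def apply_proxy(agent, proxy_url)
  agent.set_proxy(*proxy_settings(proxy_url))
  agent
end

# Assumed usage, with the mechanize gem installed and PROXY_URL set to
# something like "http://user:secret@proxy.example.com:8080":
#   require "mechanize"
#   agent = apply_proxy(Mechanize.new, ENV.fetch("PROXY_URL"))
#   agent.user_agent_alias = "Mac Safari"
#   page = agent.get("http://www.linkedin.com/in/barackobama?trk=pub-pbmap")
```

Reading the proxy from an environment variable keeps credentials out of the code, which fits how Heroku apps are normally configured.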

ikbenale commented 10 years ago

@dustyhorizon, you should use the LinkedIn API; screen scraping is forbidden by LinkedIn's terms of service. Source: http://developer.linkedin.com/comment/28052#comment-28052

mmahalwy commented 10 years ago

@dustyhorizon can you explain how you used a proxy for Mechanize?

Thanks!

ranjithnalimela commented 9 years ago

Hey, I got the same error:

url = "https://www.linkedin.com/in/ranjithnalimela"
a.get(url)
Mechanize::ResponseCodeError: 999 => -- https://www.linkedin.com/in/ranjithnalimela
from /home/deploy/domains/jobhuk/shared/bundle/ruby/1.9.1/gems/mechanize-2.7.2/lib/mechanize/http/agent.rb:931:in `response_read'
from /home/deploy/domains/jobhuk/shared/bundle/ruby/1.9.1/gems/mechanize-2.7.2/lib/mechanize/http/agent.rb:260:in `block in fetch'
from /usr/local/rvm/rubies/ruby-1.9.3-p392/lib/ruby/1.9.1/net/http.rb:1322:in `block (2 levels) in transport_request'
from /usr/local/rvm/rubies/ruby-1.9.3-p392/lib/ruby/1.9.1/net/http.rb:2671:in `reading_body'
from /usr/local/rvm/rubies/ruby-1.9.3-p392/lib/ruby/1.9.1/net/http.rb:1321:in `block in transport_request'
from /usr/local/rvm/rubies/ruby-1.9.3-p392/lib/ruby/1.9.1/net/http.rb:1316:in `catch'
from /usr/local/rvm/rubies/ruby-1.9.3-p392/lib/ruby/1.9.1/net/http.rb:1316:in `transport_request'
from /usr/local/rvm/rubies/ruby-1.9.3-p392/lib/ruby/1.9.1/net/http.rb:1293:in `request'
from /home/deploy/domains/jobhuk/shared/bundle/ruby/1.9.1/gems/net-http-persistent-2.9.4/lib/net/http/persistent.rb:999:in `request'
from /home/deploy/domains/jobhuk/shared/bundle/ruby/1.9.1/gems/mechanize-2.7.2/lib/mechanize/http/agent.rb:257:in `fetch'
from /home/deploy/domains/jobhuk/shared/bundle/ruby/1.9.1/gems/mechanize-2.7.2/lib/mechanize.rb:432:in `get'
from (irb):4
from /home/deploy/domains/jobhuk/shared/bundle/ruby/1.9.1/gems/railties-3.2.8/lib/rails/commands/console.rb:47:in `start'
from /home/deploy/domains/jobhuk/shared/bundle/ruby/1.9.1/gems/railties-3.2.8/lib/rails/commands/console.rb:8:in `start'
from /home/deploy/domains/jobhuk/shared/bundle/ruby/1.9.1/gems/railties-3.2.8/lib/rails/commands.rb:41:in `<top (required)>'
from script/rails:6:in `require'
from script/rails:6:in `<main>'
1.9.1 :005 >