Closed by dustyhorizon 10 years ago
I'm not sure why this is working locally for you, but I doubt LinkedIn likes the Mechanize user agent. Make sure you set the user agent yourself using Mechanize#user_agent_alias=
irb(main):001:0> Mechanize.new.get "http://www.linkedin.com/in/barackobama?trk=pub-pbmap"
Mechanize::ResponseCodeError: 999 => for -- http://www.linkedin.com/in/barackobama?trk=pub-pbmap
irb(main):003:0> Mechanize.new.tap { |m| m.user_agent_alias = 'Mac Safari' }.get("http://www.linkedin.com/in/barackobama?trk=pub-pbmap")
=> #<Mechanize::Page
Hmm, seems like it doesn't like my dyno's IP?
Running console
attached to terminal... up, run.1446
Loading production environment (Rails 4.0.0)
irb(main):001:0> Mechanize.new.tap { |m| m.user_agent_alias = 'Mac Safari' }.get("http://www.linkedin.com/in/barackobama?trk=pub-pbmap")
Mechanize::ResponseCodeError: 999 => for -- http://www.linkedin.com/in/barackobama?trk=pub-pbmap
Never mind, it seems LinkedIn blacklisted one of Heroku's IPs. I used a proxy for Mechanize and it works. Thanks.
@dustyhorizon, you should use the LinkedIn API; screen scraping is forbidden by LinkedIn's terms of service. Source: http://developer.linkedin.com/comment/28052#comment-28052
@dustyhorizon can you explain how you used a proxy for Mechanize?
Thanks!
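The proxy setup used above was never posted, so this is only a minimal sketch of one way to route Mechanize through a proxy. `Mechanize#set_proxy` and `Mechanize#user_agent_alias=` are the library's own methods; the helper names `proxy_args` and `configure_agent`, the `PROXY_URL` environment variable, and the proxy host in the comments are made up for illustration.

```ruby
require "uri"

# Hypothetical helper: split a proxy URL into the positional arguments
# that Mechanize#set_proxy takes (address, port, user, password).
# user and password are nil when the URL carries no credentials.
def proxy_args(proxy_url)
  uri = URI.parse(proxy_url)
  [uri.host, uri.port, uri.user, uri.password]
end

# Hypothetical helper: give the agent a browser-like user agent and
# send its requests through the proxy. Returns the same agent.
def configure_agent(agent, proxy_url)
  agent.user_agent_alias = "Mac Safari"
  agent.set_proxy(*proxy_args(proxy_url))
  agent
end

# Usage (needs the mechanize gem and a reachable proxy; PROXY_URL is an
# assumed variable, e.g. "http://user:secret@proxy.example.com:8080"):
#   require "mechanize"
#   agent = configure_agent(Mechanize.new, ENV.fetch("PROXY_URL"))
#   page  = agent.get("http://www.linkedin.com/in/barackobama?trk=pub-pbmap")
```

Keeping the proxy address in an environment variable means the same code can run locally without a proxy and on the dyno with one, though whether a given proxy's IP is accepted is something you would have to check yourself.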
Hey, I got the same error:
url = "https://www.linkedin.com/in/ranjithnalimela"
a.get(url)
Mechanize::ResponseCodeError: 999 => -- https://www.linkedin.com/in/ranjithnalimela
from /home/deploy/domains/jobhuk/shared/bundle/ruby/1.9.1/gems/mechanize-2.7.2/lib/mechanize/http/agent.rb:931:in `response_read'
from /home/deploy/domains/jobhuk/shared/bundle/ruby/1.9.1/gems/mechanize-2.7.2/lib/mechanize/http/agent.rb:260:in `block in fetch'
from /usr/local/rvm/rubies/ruby-1.9.3-p392/lib/ruby/1.9.1/net/http.rb:1322:in `block (2 levels) in transport_request'
from /usr/local/rvm/rubies/ruby-1.9.3-p392/lib/ruby/1.9.1/net/http.rb:2671:in `reading_body'
from /usr/local/rvm/rubies/ruby-1.9.3-p392/lib/ruby/1.9.1/net/http.rb:1321:in `block in transport_request'
from /usr/local/rvm/rubies/ruby-1.9.3-p392/lib/ruby/1.9.1/net/http.rb:1316:in `catch'
from /usr/local/rvm/rubies/ruby-1.9.3-p392/lib/ruby/1.9.1/net/http.rb:1316:in `transport_request'
from /usr/local/rvm/rubies/ruby-1.9.3-p392/lib/ruby/1.9.1/net/http.rb:1293:in `request'
from /home/deploy/domains/jobhuk/shared/bundle/ruby/1.9.1/gems/net-http-persistent-2.9.4/lib/net/http/persistent.rb:999:in `request'
from /home/deploy/domains/jobhuk/shared/bundle/ruby/1.9.1/gems/mechanize-2.7.2/lib/mechanize/http/agent.rb:257:in `fetch'
from /home/deploy/domains/jobhuk/shared/bundle/ruby/1.9.1/gems/mechanize-2.7.2/lib/mechanize.rb:432:in `get'
from (irb):4
from /home/deploy/domains/jobhuk/shared/bundle/ruby/1.9.1/gems/railties-3.2.8/lib/rails/commands/console.rb:47:in `start'
from /home/deploy/domains/jobhuk/shared/bundle/ruby/1.9.1/gems/railties-3.2.8/lib/rails/commands/console.rb:8:in `start'
from /home/deploy/domains/jobhuk/shared/bundle/ruby/1.9.1/gems/railties-3.2.8/lib/rails/commands.rb:41:in `<top (required)>'
from script/rails:6:in `require'
from script/rails:6:in
Hi guys,
I am trying to use Mechanize to get the HTML of a public-facing page such as http://www.linkedin.com/in/barackobama?trk=pub-pbmap. It works in my local console but not after my app has been deployed to Heroku.
Steps I used (Local):
rails c
a = Mechanize.new
a.get("http://www.linkedin.com/in/barackobama?trk=pub-pbmap")
RESPONSE
Steps I used (Heroku):
heroku run console
a = Mechanize.new
a.get("http://www.linkedin.com/in/barackobama?trk=pub-pbmap")
Mechanize::ResponseCodeError: 999 => for -- http://www.linkedin.com/in/barackobama?trk=pub-pbmap
With logging on
I, [2013-09-21T09:51:14.943116 #2] INFO -- : Net::HTTP::Get: /in/barackobama?trk=pub-pbmap
D, [2013-09-21T09:51:14.946446 #2] DEBUG -- : request-header: accept-encoding => gzip,deflate,identity
D, [2013-09-21T09:51:14.948645 #2] DEBUG -- : request-header: accept => */*
D, [2013-09-21T09:51:14.948835 #2] DEBUG -- : request-header: user-agent => Mechanize/2.7.2 Ruby/2.0.0p247 (http://github.com/sparklemotion/mechanize/)
D, [2013-09-21T09:51:14.948958 #2] DEBUG -- : request-header: accept-charset => ISO-8859-1,utf-8;q=0.7,*;q=0.7
D, [2013-09-21T09:51:14.949063 #2] DEBUG -- : request-header: accept-language => en-us,en;q=0.5
D, [2013-09-21T09:51:14.949288 #2] DEBUG -- : request-header: host => www.linkedin.com
I, [2013-09-21T09:51:15.009196 #2] INFO -- : status: Net::HTTPUnknownResponse 1.1 999 INKApi Error
D, [2013-09-21T09:51:15.015572 #2] DEBUG -- : response-header: date => Sat, 21 Sep 2013 09:51:15 GMT
D, [2013-09-21T09:51:15.015720 #2] DEBUG -- : response-header: nncoection => close
D, [2013-09-21T09:51:15.015764 #2] DEBUG -- : response-header: server => ATS
D, [2013-09-21T09:51:15.015802 #2] DEBUG -- : response-header: x-li-pop => PROD-ECH3
D, [2013-09-21T09:51:15.015846 #2] DEBUG -- : response-header: content-length => 511
D, [2013-09-21T09:51:15.016936 #2] DEBUG -- : response-header: content-type => text/html
D, [2013-09-21T09:51:15.017063 #2] DEBUG -- : response-header: set-cookie => X-LI-IDC=C2
D, [2013-09-21T09:51:15.017317 #2] DEBUG -- : Read 511 bytes (511 total)
Mechanize::ResponseCodeError: 999 => for -- http://www.linkedin.com/in/barackobama?trk=pub-pbmap
UPDATE: Tried various user agent aliases, but to no avail.
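For anyone who wants to reproduce the "With logging on" output, here is a rough sketch of how logging can be switched on. It assumes the agent accepts a standard `Logger` through `Mechanize#log=`; the helper name `enable_logging` is made up for illustration and is not part of Mechanize.

```ruby
require "logger"

# Hypothetical helper: attach a Logger to the agent so that request and
# response headers are written to `io` (standard output by default).
def enable_logging(agent, io = $stdout)
  logger = Logger.new(io)
  logger.level = Logger::DEBUG # the header lines are logged at DEBUG level
  agent.log = logger
  agent
end

# Usage (needs the mechanize gem):
#   require "mechanize"
#   a = enable_logging(Mechanize.new)
#   a.get("http://www.linkedin.com/in/barackobama?trk=pub-pbmap")
```

With this in place, the `user-agent` request header in the log shows which alias was actually sent, which helps when comparing a local run against a Heroku run.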