stewartmckee / cobweb

Web crawler with very flexible crawling options. Can be used either standalone or with Resque to perform clustered crawls.
MIT License

Proxy authentication while parsing url with Net::HTTP #49

Open suvarnarajkumar opened 8 years ago

suvarnarajkumar commented 8 years ago

I have added code to lib/cobweb.rb for authenticating proxies. Along with this, I have added a feature to rotate proxies on each request: the proxy_shift method in the Cobweb class makes it easy to switch proxies.
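For context, Ruby's standard Net::HTTP.new already accepts proxy credentials as positional arguments (proxy address, port, user, password), so authenticating a proxy mostly comes down to passing the crawler's options through. The sketch below is a minimal illustration, not the code in this PR: the build_http helper is hypothetical, and the option keys (:proxy_addr, :proxy_port, :proxy_uname, :proxy_pwd) are assumed to match the ones used in the example in this issue.

```ruby
require 'net/http'
require 'uri'

# Hypothetical helper (not part of cobweb): builds a Net::HTTP object that
# routes requests through an optionally authenticated proxy.
# Option keys mirror the ones proposed in this issue.
def build_http(url, options = {})
  uri = URI.parse(url)
  Net::HTTP.new(
    uri.host, uri.port,
    options[:proxy_addr],  # proxy host; nil disables the proxy (ENV is ignored)
    options[:proxy_port],  # proxy port
    options[:proxy_uname], # proxy user name, used for Basic proxy auth
    options[:proxy_pwd]    # proxy password
  )
end

# Creating the object does not open a connection yet.
http = build_http('http://example.com/',
                  proxy_addr: 'proxy.example.com', proxy_port: 8080,
                  proxy_uname: 'user', proxy_pwd: 'secret')
puts http.proxy?        # prints true
puts http.proxy_address # prints proxy.example.com
```

Passing nil explicitly (rather than leaving the argument at its default) means no proxy is picked up from environment variables, so the crawler's own options stay the single source of truth.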

Example of proxy shifting:

    require 'cobweb'

    crawler = CobwebCrawler.new({:cache => xxx, :proxy_addr => 'xxxx', :proxy_port => 'xxxx', :proxy_uname => 'xxxx', :proxy_pwd => 'xxxxx', :internal_urls => ["xxxxxxxx"]})

    action = crawler.crawl("_url_") do |content|
      # The line below switches to a different proxy each time a new URL is requested.
      crawler.proxy_shift({:proxy_addr => 'xxxx', :proxy_port => 'xxxx', :proxy_uname => 'xxxx', :proxy_pwd => 'xxxx'})

      # some-statements....
    end
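The example passes one fixed (placeholder) proxy to proxy_shift on every call. To actually rotate, one option is to keep a pool of proxy option hashes and hand the next one to proxy_shift for each page. A minimal round-robin sketch using Array#cycle; the ProxyPool class and the addresses are hypothetical, and proxy_shift is the method proposed in this issue, not a confirmed cobweb API:

```ruby
# Hypothetical round-robin pool of proxy option hashes, shaped like the
# argument to the proposed proxy_shift method.
class ProxyPool
  def initialize(proxies)
    raise ArgumentError, 'at least one proxy is required' if proxies.empty?
    @enum = proxies.cycle # endless enumerator that wraps around the list
  end

  # Returns the next proxy's options, starting over after the last one.
  def next_proxy
    @enum.next
  end
end

pool = ProxyPool.new([
  { :proxy_addr => '10.0.0.1', :proxy_port => 8080, :proxy_uname => 'u1', :proxy_pwd => 'p1' },
  { :proxy_addr => '10.0.0.2', :proxy_port => 8080, :proxy_uname => 'u2', :proxy_pwd => 'p2' }
])

# Inside the crawl block (assuming proxy_shift is merged as proposed):
#   crawler.proxy_shift(pool.next_proxy)
3.times { puts pool.next_proxy[:proxy_addr] } # prints 10.0.0.1, 10.0.0.2, 10.0.0.1
```

Round-robin spreads requests evenly across proxies; a random pick or skipping proxies that recently failed would be alternatives if some proxies are less reliable.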

stewartmckee commented 8 years ago

Looks great. Could you submit it with a couple of RSpec specs, please? Then I'll get it merged in.

Thanks.

suvarnarajkumar commented 8 years ago

Thanks for considering my code. I will add the specs as soon as possible and push them.