scrapy-plugins / scrapy-zyte-smartproxy

Zyte Smart Proxy Manager (formerly Crawlera) middleware for Scrapy
BSD 3-Clause "New" or "Revised" License

scrapy-crawlera not passing auth credentials on to crawlera #51

Closed justinwiley closed 6 years ago

justinwiley commented 6 years ago

I'm getting 407 errors trying to connect to Yahoo via Crawlera after setting up the middleware.

In settings.py I added CRAWLERA_USER and CRAWLERA_PASS. I've verified that these credentials work when using Crawlera via curl. I've also tried setting CRAWLERA_APIKEY alone, without the other two variables defined.
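For reference, a minimal sketch of what the relevant part of a Scrapy settings module might look like when using the API key variant. The middleware path and priority follow the scrapy-crawlera README as I remember it; the key value is a placeholder, not a real credential.

```python
# settings.py (sketch; the API key below is a placeholder)

# Enable the Crawlera downloader middleware.
DOWNLOADER_MIDDLEWARES = {
    "scrapy_crawlera.CrawleraMiddleware": 610,
}

# Turn the middleware on and supply credentials in one place only.
CRAWLERA_ENABLED = True
CRAWLERA_APIKEY = "<your-api-key>"
```

Keeping the credentials in this one location makes it easier to rule out conflicting values defined elsewhere in the project.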

I'm using Scrapy 1.5. Any ideas? Thanks!

2018-02-26 17:09:16 [root] INFO: Using crawlera at http://proxy.crawlera.com:8010 (user: myusername)
2018-02-26 17:09:16 [root] INFO: CrawleraMiddleware: disabling download delays on Scrapy side to optimize delays introduced by Crawlera. To avoid this behaviour you can use the CRAWLERA_PRESERVE_DELAY setting but keep in mind that this may slow down the crawl significantly
2018-02-26 17:09:16 [scrapy.core.engine] DEBUG: Crawled (407) <GET http://yahoo.com> (referer: None)
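HTTP 407 is "Proxy Authentication Required", meaning the proxy did not accept the Proxy-Authorization header it received (or received none). As a rough sketch of what a valid HTTP Basic header looks like, the helper below builds one by hand; `proxy_authorization` is a hypothetical name used for illustration and is not part of the middleware, whose internals are not shown in this thread.

```python
import base64


def proxy_authorization(user: str, password: str = "") -> str:
    """Build an HTTP Basic value for the Proxy-Authorization header."""
    # Basic auth is base64("user:password"); an empty password is allowed.
    raw = f"{user}:{password}".encode("utf-8")
    return "Basic " + base64.b64encode(raw).decode("ascii")


if __name__ == "__main__":
    # Compare this value with what a working curl request sends.
    print(proxy_authorization("user", "pass"))
```

Comparing the header from a working curl call with the one sent by the crawl can show whether the wrong credentials are being picked up.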

The code in question:

request = scrapy.Request("http://yahoo.com", callback=self.parse_search, headers={'user-agent': 'Mozilla/5.0 (Windows NT 6.3; Trident/7.0; rv:11.0) like Gecko'})

justinwiley commented 6 years ago

False alarm. I had inadvertently also added the user and pass to the main crawler class, and those values were interfering with the ones in settings.py.
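For anyone hitting the same thing, here is a small sketch of how this kind of conflict can arise. It assumes the middleware prefers a `crawlera_*` attribute on the spider over the matching `CRAWLERA_*` setting, which is what the behaviour above suggests. `resolve_credential` and `FakeSpider` are hypothetical names for illustration, not the library's API.

```python
class FakeSpider:
    # A leftover class attribute, e.g. from an earlier experiment.
    crawlera_user = "stale-user"


def resolve_credential(spider, settings, key):
    """Return the spider attribute if present, else the project setting."""
    # Assumed precedence: spider attribute first, settings second.
    fallback = settings.get("CRAWLERA_" + key.upper())
    return getattr(spider, "crawlera_" + key, fallback)


if __name__ == "__main__":
    settings = {"CRAWLERA_USER": "myusername"}
    # The stale spider attribute shadows the value from settings.
    print(resolve_credential(FakeSpider(), settings, "user"))
    # Without the attribute, the settings value is used.
    print(resolve_credential(object(), settings, "user"))
```

Under that assumption, removing the attributes from the spider class lets the settings values take effect again.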