istresearch / scrapy-cluster

This Scrapy project uses Redis and Kafka to create a distributed on demand scraping cluster.
http://scrapy-cluster.readthedocs.io/
MIT License

Getting twisted.web._newclient.ResponseNeverReceived exception #189

Closed: spidysenses closed this issue 6 years ago

spidysenses commented 6 years ago

I am getting this error for nearly 9% of requests. I tried decreasing concurrent requests to 50, but I am still getting the error. Versions: Scrapy==1.4.0, Twisted==17.5.0, pyOpenSSL==18.0.0

madisonb commented 6 years ago

Can you successfully scrape the page in question with plain Scrapy? This project only modifies the scheduler and doesn't touch the Scrapy core. A number of factors could be at play here:
• The web page is slow, or the request is timing out while waiting for a response from the server
• Your network bandwidth is saturated (inbound and outbound)
• Your spiders themselves are having trouble talking to the internet

All of these causes are outside this project's control, so unless you can identify it as a Scrapy Cluster problem and give me steps to reproduce it in the Scrapy Cluster VirtualBox (or with the default settings for this project), I am not sure I can help any further.
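
For anyone narrowing this down, the factors above map onto a handful of standard Scrapy settings. The fragment below is only a sketch of where one might start in a plain Scrapy project's settings.py. The setting names are real Scrapy settings, but every value is an assumption chosen for illustration, not a recommendation from this project, and none of them is confirmed to fix this exception.

```python
# settings.py fragment (illustrative values only; tune for your own site and network)

# Total concurrent requests; the reporter had already lowered this to 50.
CONCURRENT_REQUESTS = 50

# Cap per-domain concurrency so one slow server is not flooded (Scrapy default: 8).
CONCURRENT_REQUESTS_PER_DOMAIN = 8

# Seconds the downloader waits before timing out (Scrapy default: 180).
# A shorter value surfaces slow pages sooner; this number is an assumption.
DOWNLOAD_TIMEOUT = 60

# Retry middleware: how many times a failed request is retried (Scrapy default: 2).
RETRY_ENABLED = True
RETRY_TIMES = 3

# AutoThrottle adjusts delays based on observed latency, which may help
# when the server or the local network is saturated.
AUTOTHROTTLE_ENABLED = True
```

If the error rate stays the same with conservative values like these in a plain Scrapy project, the cause is more likely the target server or the local network than the scheduler.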

madisonb commented 6 years ago

Closing due to inactivity.