istresearch / scrapy-cluster

This Scrapy project uses Redis and Kafka to create a distributed on demand scraping cluster.
http://scrapy-cluster.readthedocs.io/
MIT License

scrapy-cluster stops crawling of some websites after a while #196

Closed: alotfitakami closed this issue 6 years ago

alotfitakami commented 6 years ago

Hi, I am using scrapy-cluster to crawl many websites in real time. I do not want to miss any new links on these websites, so I run multiple spiders. When the crawl of a website finishes, I submit a new crawl job for that website to scrapy-cluster, and I repeat this indefinitely so that I pick up every new link as quickly as possible. The problem is that after a while, scrapy-cluster only crawls the first link and does not follow any of the other links, even though I am sure there are many new links to scrape on the website. Can anyone help?

madisonb commented 6 years ago

As per the conversation in Gitter, I suspect this is a misunderstanding of how the project works. Let's continue the discussion there; we can link the final results or any follow-up actions here and close this ticket when appropriate.

madisonb commented 6 years ago

Closing as stale.