Closed by newthis 6 years ago
Eliminating start_requests
There are no start_requests on purpose: to ensure that spinning up multiple spiders does not create the same crawl jobs over and over again, it is considered bad practice in this project to have them.
Please view the API docs here for more information about how to submit crawls to your cluster.
Closing as per community issue guidelines.
I know about this usage. I am wondering how the Scrapy spider periodically gets crawl tasks from the message queue, and where the relevant code is in the Scrapy Cluster project. According to the Scrapy framework, the crawler's crawl method (https://github.com/scrapy/scrapy/blob/108f8c4fd20a47bb94e010e9c6296f7ed9fdb2bd/scrapy/crawler.py) visits every spider's start_requests, so if you call "scrapy runspider RedisSpider", it should surely follow that logic.
Do I need to modify the Scrapy code?
Thank you for your answer!
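For illustration, the mechanism in question can be sketched without Scrapy at all: rather than the spider yielding requests from start_requests, the engine repeatedly asks a scheduler for the next request, and that scheduler pulls work from an external queue that is fed from outside the spider. This is only a minimal sketch of the pattern; the names below (SeedQueue, PollingScheduler, next_request) are hypothetical stand-ins, assuming the real queue lives in Redis and jobs are pushed into it by a separate feed process.

```python
import collections


class SeedQueue:
    """Hypothetical in-memory stand-in for the external (e.g. Redis) queue.

    In a distributed setup, crawl jobs would be pushed here by a process
    outside the spider, so an empty start_requests is not a problem.
    """

    def __init__(self):
        self._items = collections.deque()

    def push(self, url):
        self._items.append(url)

    def pop(self):
        # Return None when no work is available, mimicking a non-blocking pop.
        return self._items.popleft() if self._items else None


class PollingScheduler:
    """Sketch of a scheduler whose next_request() reads from the shared
    queue each time the engine asks for work, instead of relying on the
    spider's start_requests()."""

    def __init__(self, queue):
        self.queue = queue

    def next_request(self):
        return self.queue.pop()


# Simulate an external process submitting crawl jobs to the shared queue.
queue = SeedQueue()
queue.push("http://example.com/a")
queue.push("http://example.com/b")

# Simulate the engine loop: keep pulling until the queue reports idle.
scheduler = PollingScheduler(queue)
crawled = []
while (url := scheduler.next_request()) is not None:
    crawled.append(url)

print(crawled)  # ['http://example.com/a', 'http://example.com/b']
```

Because the loop drains the queue and then sees None, the spider process can stay alive and simply poll again later when new jobs arrive, which is why no start_requests is needed to seed the crawl.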
Is it necessary to override the spider's start_requests method? I have read the code of RedisSpider and cannot find a start_requests method. I want to know how the initial seed URLs are obtained, and to read the relevant source code.