Open Rockyzsu opened 5 years ago
Hello, I apologize for the late reply; I wasn't watching this repo, and it was just an example.
Hopefully you found a solution by now, but just in case, what we ended up doing was using a buffer in redis, i.e. using redis as shared application state where we store a queue of URLs to be crawled.
It's quite simple: you need to implement something like the following.
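A minimal sketch of what that could look like, assuming the `redis-py` client in production (here a tiny in-memory stub stands in for the Redis connection so the example runs standalone; the class and key names are illustrative, not from the original code):

```python
class FakeRedis:
    """In-memory stand-in for redis.Redis, supporting just lpush/rpop.
    In production you would use redis.Redis(host="localhost", port=6379)."""
    def __init__(self):
        self._lists = {}

    def lpush(self, key, *values):
        lst = self._lists.setdefault(key, [])
        for v in values:
            # Real Redis stores and returns bytes, so mimic that here.
            lst.insert(0, v.encode() if isinstance(v, str) else v)
        return len(lst)

    def rpop(self, key):
        lst = self._lists.get(key)
        return lst.pop() if lst else None


class URLQueue:
    """FIFO queue of URLs backed by a Redis list (LPUSH to enqueue,
    RPOP to dequeue). The key name is an arbitrary example."""
    def __init__(self, client, key="urls:to_crawl"):
        self.client = client
        self.key = key

    def push(self, url):
        self.client.lpush(self.key, url)

    def pop(self):
        raw = self.client.rpop(self.key)
        return raw.decode() if raw is not None else None


client = FakeRedis()  # swap for redis.Redis(...) in a real deployment
q = URLQueue(client)
q.push("https://example.com/a")
q.push("https://example.com/b")
print(q.pop())  # → https://example.com/a  (FIFO order)
```

Because Redis is shared state, several spider processes can push and pop from the same list concurrently, which is what makes it usable as a crawl buffer.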
Then, in the spider, you can fetch from redis and send to RabbitMQ when processing items if you wish.
Although it does what we need, it did take some developer time to implement. It would have been easier if Scrapy had a simpler way to integrate with message queues, but there are other advantages to our approach with Redis.
I saw you raised this problem here: https://github.com/scrapy/scrapy/issues/3477
Have you fixed it? I met the same issue when trying to use RabbitMQ + Scrapy.