Closed by tedtieken 11 years ago
Sorry about the late response. You are right, it would be good to be able to switch the queue behavior.
The latest version has priority queue, queue, and stack implementations for the scheduler. Using the setting
SCHEDULER_QUEUE_CLASS = "scrapy_redis.queue.SpiderStack"
will provide the depth-first behavior.
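For anyone landing here later, a minimal `settings.py` sketch of where that line goes. Only the `SpiderStack` line comes from the reply above; the `SCHEDULER` line and the two commented-out alternatives are my assumptions about the surrounding configuration, so check `queue.py` in your installed version for the exact class names.

```python
# settings.py (sketch; only the SpiderStack line is taken from the reply above)

# Assumption: the project already routes scheduling through scrapy-redis.
SCHEDULER = "scrapy_redis.scheduler.Scheduler"

# Stack (LIFO): the most recently scheduled request is crawled first,
# which gives the depth-first behavior discussed in this issue.
SCHEDULER_QUEUE_CLASS = "scrapy_redis.queue.SpiderStack"

# Assumed names for the other two implementations mentioned above:
# SCHEDULER_QUEUE_CLASS = "scrapy_redis.queue.SpiderQueue"          # FIFO, breadth-first
# SCHEDULER_QUEUE_CLASS = "scrapy_redis.queue.SpiderPriorityQueue"  # ordered by request priority
```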
I have a very large crawl project, and breadth-first meant I had to wait a very long time to get my first item (they are 2 or 3 layers down from the start URL).
A quick change of Queue.py line 33 from:
to:
Gives the queue depth-first-like behavior.
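The actual one-line change is not reproduced here. As a general illustration of why popping from the same end that requests are pushed onto gives depth-first-like order, here is a self-contained toy simulation. It does not use Redis or scrapy-redis; the link graph and the `crawl_order` helper are made up for the example.

```python
from collections import deque

# Hypothetical link graph: each page maps to the pages it links to.
LINKS = {
    "start": ["a", "b"],
    "a": ["a1", "a2"],
    "b": ["b1"],
    "a1": [],
    "a2": [],
    "b1": [],
}


def crawl_order(depth_first):
    """Return the visit order using a LIFO (stack) or FIFO (queue) of pending pages."""
    pending = deque(["start"])
    seen = {"start"}
    order = []
    while pending:
        # Stack: take the newest pending page. Queue: take the oldest.
        page = pending.pop() if depth_first else pending.popleft()
        order.append(page)
        for link in LINKS[page]:
            if link not in seen:
                seen.add(link)
                pending.append(link)
    return order


print(crawl_order(depth_first=False))  # ['start', 'a', 'b', 'a1', 'a2', 'b1']
print(crawl_order(depth_first=True))   # ['start', 'b', 'b1', 'a', 'a2', 'a1']
```

With the stack, the crawl reaches a leaf page on the third fetch. With the FIFO queue, every page on the second layer is fetched before any leaf, so on a wide site the first item can take a long time to appear.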
Perhaps adding a setting such as REDIS_QUEUE_PRIORITIZE_DEPTH (defaulting to False) that switches between the two behaviors would be helpful for others.
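A rough sketch of what such a toggle could look like, just to make the proposal concrete. `REDIS_QUEUE_PRIORITIZE_DEPTH` is the hypothetical setting suggested above, `queue_class_for` is an illustrative helper rather than anything in scrapy-redis, and `SpiderQueue` is my assumed name for the FIFO class.

```python
def queue_class_for(settings):
    """Pick a queue class path from a plain dict of settings (hypothetical helper)."""
    # Hypothetical flag from this issue; breadth-first stays the default.
    if settings.get("REDIS_QUEUE_PRIORITIZE_DEPTH", False):
        return "scrapy_redis.queue.SpiderStack"  # LIFO, depth-first-like
    return "scrapy_redis.queue.SpiderQueue"      # assumed FIFO class name


print(queue_class_for({}))
print(queue_class_for({"REDIS_QUEUE_PRIORITIZE_DEPTH": True}))
```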