rmax / scrapy-redis

Redis-based components for Scrapy.
http://scrapy-redis.readthedocs.io
MIT License

Queue Implementation forces Breadth-First search #1

Closed by tedtieken 11 years ago.

tedtieken commented 12 years ago

I have a very large crawl project, and breadth-first ordering meant I had to wait a very long time to get my first item (items sit 2 or 3 levels below the start URL).

A quick change to line 33 of Queue.py, from:

pipe.zrange(self.key, 0, 0).zremrangebyrank(self.key, 0, 0)

to:

pipe.zrange(self.key, -1, -1).zremrangebyrank(self.key, -1, -1)

This gives the queue depth-first-like behavior.
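Why swapping the indexes changes the order: a Redis sorted set keeps its members in ascending score order (ties broken lexicographically by member), so `ZRANGE key 0 0` reads rank 0, the lowest score, while `ZRANGE key -1 -1` reads the last rank, the highest score. The sketch below is only an in-memory model of that behaviour. A plain Python list stands in for the sorted set (no Redis server or client involved), and the member names and scores are made up for illustration.

```python
import bisect


class SortedSetModel:
    """In-memory stand-in for one Redis sorted set (not a Redis client)."""

    def __init__(self):
        self._items = []  # (score, member) pairs, kept in ascending order

    def zadd(self, member, score):
        # Ordered by score first, then by member, like a Redis sorted set.
        bisect.insort(self._items, (score, member))

    def _bounds(self, start, stop):
        # Inclusive ranks with Redis-style negative indexes (-1 = last).
        n = len(self._items)
        if start < 0:
            start += n
        if stop < 0:
            stop += n
        return max(start, 0), max(stop + 1, 0)

    def zrange(self, start, stop):
        lo, hi = self._bounds(start, stop)
        return [member for _, member in self._items[lo:hi]]

    def zremrangebyrank(self, start, stop):
        lo, hi = self._bounds(start, stop)
        removed = len(self._items[lo:hi])
        del self._items[lo:hi]
        return removed


def pop(zset, rank):
    # Mirrors pipe.zrange(key, rank, rank).zremrangebyrank(key, rank, rank),
    # minus the atomicity a transactional Redis pipeline would provide.
    results = zset.zrange(rank, rank)
    zset.zremrangebyrank(rank, rank)
    return results[0] if results else None


zset = SortedSetModel()
for member, score in [("b", 2), ("a", 1), ("c", 3)]:
    zset.zadd(member, score)
print(pop(zset, 0))   # -> a (lowest score)
print(pop(zset, -1))  # -> c (highest score)
```

Which end counts as "deeper" depends on how the scores are assigned when requests are pushed; the model only shows that the two index pairs read opposite ends of the set.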

Perhaps adding a setting such as REDIS_QUEUE_PRIORITIZE_DEPTH (defaulting to False) to switch between the two behaviors would help others.
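A minimal sketch of how the proposed switch could be wired, under stated assumptions: `REDIS_QUEUE_PRIORITIZE_DEPTH` is only the name suggested in this issue, not an existing scrapy-redis setting, and a plain dict stands in for Scrapy's settings object. The flag does nothing more than choose which rank the pop reads and removes; the helper names `pop_rank` and `pop_command_args` are hypothetical.

```python
TRUE_STRINGS = {"1", "true", "yes", "on"}


def pop_rank(settings):
    """Rank to pop for the hypothetical REDIS_QUEUE_PRIORITIZE_DEPTH flag.

    False (the proposed default) keeps rank 0, the current behavior;
    True switches to rank -1, the depth-first-like variant.
    """
    value = settings.get("REDIS_QUEUE_PRIORITIZE_DEPTH", False)
    if isinstance(value, str):
        # Settings often arrive as strings (e.g. from the command line).
        value = value.strip().lower() in TRUE_STRINGS
    return -1 if value else 0


def pop_command_args(key, settings):
    # Arguments for the two pipelined commands quoted in the issue:
    # ZRANGE key rank rank, then ZREMRANGEBYRANK key rank rank.
    rank = pop_rank(settings)
    return ("ZRANGE", key, rank, rank), ("ZREMRANGEBYRANK", key, rank, rank)


print(pop_command_args("myspider:requests", {}))
print(pop_command_args("myspider:requests", {"REDIS_QUEUE_PRIORITIZE_DEPTH": True}))
```

Keeping the default at False means existing crawls keep their current ordering unless they opt in.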

rmax commented 11 years ago

Sorry for the late response. You are right; it would be good to be able to switch the queue behavior.

rmax commented 11 years ago

The latest version has priority queue, queue, and stack implementations for the scheduler. Using the setting

SCHEDULER_QUEUE_CLASS = "scrapy_redis.queue.SpiderStack"

will provide the depth-first behavior.
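Why a stack gives depth-first behavior can be sketched without Scrapy or Redis. In the sketch below a `collections.deque` stands in for the Redis list, and a made-up site tree places items three levels below the start URL, as in the original report. Popping the most recently pushed request (stack) reaches an item after a few requests; popping the oldest (FIFO queue) visits every shallower page first. The tree, page names and helper functions are hypothetical, and the model ignores priorities, duplicate filtering and concurrency.

```python
from collections import deque


def build_tree(categories=3, lists_per_category=2):
    """Hypothetical site: start -> categories -> listing pages -> items."""
    links = {"start": []}
    for c in range(1, categories + 1):
        cat = f"cat-{c}"
        links["start"].append(cat)
        links[cat] = []
        for j in range(1, lists_per_category + 1):
            listing = f"list-{c}.{j}"
            links[cat].append(listing)
            links[listing] = [f"item-{c}.{j}"]
    return links


def crawl_order(links, lifo):
    """Pages in the order a scheduler backed by this container pops them."""
    pending = deque(["start"])
    order = []
    while pending:
        # Stack (LIFO) pops the newest request; queue (FIFO) the oldest.
        page = pending.pop() if lifo else pending.popleft()
        order.append(page)
        pending.extend(links.get(page, []))
    return order


def requests_until_first_item(order):
    return next(i for i, page in enumerate(order, 1) if page.startswith("item-"))


links = build_tree()
print(requests_until_first_item(crawl_order(links, lifo=False)))  # -> 11
print(requests_until_first_item(crawl_order(links, lifo=True)))   # -> 4
```

The gap widens as the tree grows: with a FIFO queue every category and listing page is fetched before the first item, while the stack descends straight to one.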