Closed: nieweiming closed this 3 years ago
Thanks for taking the time to send the PR.
How is this different from using the SCHEDULER_IDLE_BEFORE_CLOSE setting? See https://github.com/rmax/scrapy-redis#usage
That feature uses a blocking Redis operation to wait for the next request: https://github.com/rmax/scrapy-redis/blob/fff0d8279e021600537cc8645e63263ad99887c0/src/scrapy_redis/scheduler.py#L163-L164
# Excerpt from src/scrapy_redis/spiders.py (abridged)
from scrapy import signals
from scrapy.exceptions import DontCloseSpider

from . import connection


class RedisMixin(object):

    def setup_redis(self, crawler=None):
        ...
        self.server = connection.from_settings(crawler.settings)
        # The idle signal is called when the spider has no requests left,
        # that's when we will schedule new requests from redis queue
        crawler.signals.connect(self.spider_idle, signal=signals.spider_idle)

    def schedule_next_requests(self):
        """Schedules a request if available"""
        # TODO: While there is capacity, schedule a batch of redis requests.
        for req in self.next_requests():
            self.crawler.engine.crawl(req, spider=self)

    def spider_idle(self):
        """Schedules a request if available, otherwise waits."""
        # XXX: Handle a sentinel to close the spider.
        self.schedule_next_requests()
        raise DontCloseSpider
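The blocking wait referenced above lives in the scheduler, and boils down to roughly this (a paraphrased sketch, not a verbatim copy of the linked lines):

# Sketch of the blocking wait behind SCHEDULER_IDLE_BEFORE_CLOSE
# (paraphrased from the scheduler's next_request; not an exact excerpt).
def next_request(self):
    # idle_before_close comes from SCHEDULER_IDLE_BEFORE_CLOSE and is used
    # as the timeout of a blocking pop on the Redis queue.
    block_pop_timeout = self.idle_before_close
    request = self.queue.pop(block_pop_timeout)
    if request and self.stats:
        self.stats.inc_value('scheduler/dequeued/redis', spider=self.spider)
    return request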
SCHEDULER_IDLE_BEFORE_CLOSE does not stop the crawler, because spider_idle always raises DontCloseSpider. So I would like the spider to shut itself down after the queue has been idle for a configurable period of time. Otherwise the task is finished but the process stays in the running state, which keeps holding resources.
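In a minimal sketch, the idea could look like this inside RedisMixin.spider_idle (assuming setup_redis also initializes self.max_idle_time from the MAX_IDLE_TIME_BEFORE_CLOSE setting and self.spider_idle_start_time; the helper names here are illustrative, not necessarily this PR's exact code):

# Minimal sketch of an idle timeout in spider_idle (illustrative only).
import time

from scrapy.exceptions import CloseSpider, DontCloseSpider


def spider_idle(self):
    """Schedules a request if available, otherwise waits; closes the spider
    once it has been idle longer than MAX_IDLE_TIME_BEFORE_CLOSE seconds
    (0 or unset means wait forever, the current behaviour)."""
    # Reset the idle clock while the start-urls key still holds requests
    # (llen assumes the default list-type key).
    if self.server is not None and self.server.llen(self.redis_key) > 0:
        self.spider_idle_start_time = time.time()

    self.schedule_next_requests()

    idle_time = time.time() - self.spider_idle_start_time
    if self.max_idle_time and idle_time >= self.max_idle_time:
        raise CloseSpider(reason='max_idle_time_exceeded')
    raise DontCloseSpider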
Oh, please update the readme too with this new setting 🚀
Added a maximum idle wait time, MAX_IDLE_TIME_BEFORE_CLOSE. Set MAX_IDLE_TIME_BEFORE_CLOSE in the settings to the maximum number of seconds to wait while idle; if it is unset or 0, the spider will wait forever. MAX_IDLE_TIME_BEFORE_CLOSE does not affect the use of SCHEDULER_IDLE_BEFORE_CLOSE.
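For the readme, usage could be shown with a settings snippet like this (values are examples only):

# settings.py
MAX_IDLE_TIME_BEFORE_CLOSE = 30    # close the spider after ~30 seconds of idling; 0 or unset waits forever
SCHEDULER_IDLE_BEFORE_CLOSE = 10   # independent setting: blocking pop timeout in the scheduler, unchanged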