scrapinghub / frontera

A scalable frontier for web crawlers
BSD 3-Clause "New" or "Revised" License

idle_spider signal not fired when LOCAL_MODE = False #348

Closed MottiniMauro closed 4 years ago

MottiniMauro commented 5 years ago

I'm creating a cluster with multiple spiders/sworkers/dbworkers, and I want to kill my spiders after 15 minutes of being idle (to reduce costs), so I tried to detect when the spider is idle the same way the examples do:

    # requires: from scrapy import signals
    @classmethod
    def from_crawler(cls, crawler, *args, **kwargs):
        spider = super(GeneralSpider, cls).from_crawler(crawler, *args, **kwargs)
        spider._set_crawler(crawler)
        # call spider_idle whenever Scrapy reports the spider as idle
        spider.crawler.signals.connect(spider.spider_idle, signal=signals.spider_idle)
        return spider

    def spider_idle(self):
        # do something
        pass

But the spider_idle signal is never fired. Digging a bit into the code, I found this in FrontierManagerWrapper:

    if manager is None:
        manager = LocalFrontierManager if settings.get("LOCAL_MODE") is True else SpiderFrontierManager

And because SpiderFrontierManager.finished always returns False, the signal is never fired.
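
To illustrate the reasoning above, here is a toy model in plain Python. It does not use Scrapy or Frontera. The `Stub*` classes and `would_fire_idle` are hypothetical names, and the sketch assumes that idle handling is only reached once the frontier manager reports that it is finished, which is how I read the code.

```python
class StubLocalFrontierManager:
    """Hypothetical stand-in for a manager that can actually finish."""

    def __init__(self, pending_requests):
        self.pending_requests = pending_requests

    def finished(self):
        # Finished once nothing is left to crawl.
        return self.pending_requests == 0


class StubSpiderFrontierManager:
    """Hypothetical stand-in that mirrors only the behaviour reported here."""

    def finished(self):
        # Reported behaviour: never finished, regardless of pending work.
        return False


def would_fire_idle(manager):
    # Assumption: the idle signal is only reachable when finished() is True.
    return manager.finished()


local_result = would_fire_idle(StubLocalFrontierManager(pending_requests=0))
distributed_result = would_fire_idle(StubSpiderFrontierManager())
```

Under that assumption, the local stub can reach the idle state once its queue is empty, while the distributed stub never can.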

What would be the correct way to detect when the spider is idle when LOCAL_MODE = False? Is there anything I'm missing?
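
For reference, the 15-minute idle timeout could be tracked independently of the signal, roughly as in the sketch below. `IdleTracker` and its methods are hypothetical names, not part of Scrapy or Frontera, and how the check gets scheduled and how the spider is then closed are left open.

```python
import time


class IdleTracker:
    """Hypothetical helper: tracks time since the last observed activity."""

    def __init__(self, timeout_seconds=15 * 60, clock=time.monotonic):
        self._timeout = timeout_seconds
        self._clock = clock  # injectable so the logic can be tested
        self._last_activity = clock()

    def mark_activity(self):
        # Call this whenever the spider handles a request or response.
        self._last_activity = self._clock()

    def idle_for(self):
        # Seconds elapsed since the last recorded activity.
        return self._clock() - self._last_activity

    def should_close(self):
        # True once the idle period has reached the timeout.
        return self.idle_for() >= self._timeout
```

The idea would be to call `mark_activity()` from the spider's callbacks and poll `should_close()` periodically. The monotonic clock keeps the measurement unaffected by system clock adjustments.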

Thanks!