rmax / scrapy-redis

Redis-based components for Scrapy.
http://scrapy-redis.readthedocs.io
MIT License

[Question] Fetching request URLs from Redis fails #285

Open KokoTa opened 10 months ago

KokoTa commented 10 months ago

Description

If I insert the start URL into Redis before running Scrapy, it works.

But if I run Scrapy first and then insert the URL, fetching it fails with this error:

2023-08-13 17:11:59 [scrapy.utils.signal] ERROR: Error caught on signal handler: <bound method RedisMixin.spider_idle of <TestHtmlSpider 'test_html' at 0x2b05c4162d0>>
Traceback (most recent call last):
  File "C:\Users\KokoTa\AppData\Local\Programs\Python\Python311\Lib\site-packages\scrapy\utils\signal.py", line 43, in send_catch_log
    response = robustApply(
               ^^^^^^^^^^^^
  File "C:\Users\KokoTa\AppData\Local\Programs\Python\Python311\Lib\site-packages\pydispatch\robustapply.py", line 55, in robustApply
    return receiver(*arguments, **named)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\KokoTa\AppData\Local\Programs\Python\Python311\Lib\site-packages\scrapy_redis\spiders.py", line 208, in spider_idle
    self.schedule_next_requests()
  File "C:\Users\KokoTa\AppData\Local\Programs\Python\Python311\Lib\site-packages\scrapy_redis\spiders.py", line 197, in schedule_next_requests
    self.crawler.engine.crawl(req, spider=self)
TypeError: ExecutionEngine.crawl() got an unexpected keyword argument 'spider'

I can't feed URLs dynamically, and Scrapy crashes.
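
For reference, my setup is roughly this (a minimal sketch; the redis key and URL are placeholders, only the spider name matches the traceback above):

import redis

from scrapy_redis.spiders import RedisSpider


class TestHtmlSpider(RedisSpider):
    name = "test_html"
    # list the spider polls for start URLs
    redis_key = "test_html:start_urls"

    def parse(self, response):
        yield {"url": response.url, "title": response.css("title::text").get()}


# In a separate process, while the spider is already running and idle:
r = redis.Redis(host="localhost", port=6379)
r.lpush("test_html:start_urls", "https://example.com")  # spider_idle fires, then the crash above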

Shleif91 commented 10 months ago

Same error... Found a solution?

gc1423 commented 9 months ago

Passing a spider argument to the crawl() method of scrapy.core.engine.ExecutionEngine is no longer supported as of Scrapy 2.10.0 (see the 2.10.0 release notes).

Try scrapy 2.9.0.
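
For example, pin it until a fixed scrapy-redis release is out:

pip install "scrapy==2.9.0"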

GeorgeA92 commented 7 months ago

It looks like pull request https://github.com/rmax/scrapy-redis/pull/286, which fixes this, has been open since August. Until it lands in a release, you can apply the fix to an app on the current scrapy-redis version by overriding the schedule_next_requests method:


from scrapy import version_info as scrapy_version
from scrapy_redis.spiders import RedisSpider


class SomeSpider(RedisSpider):
    # vvv add this override to your spider code vvv
    def schedule_next_requests(self):
        """Schedules a request if available"""
        # TODO: While there is capacity, schedule a batch of redis requests.
        for req in self.next_requests():
            # the spider argument was removed from ExecutionEngine.crawl()
            # in Scrapy 2.10, see https://github.com/scrapy/scrapy/issues/5994
            if scrapy_version >= (2, 6):
                self.crawler.engine.crawl(req)
            else:
                self.crawler.engine.crawl(req, spider=self)

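With this override in place, URLs pushed to Redis after the spider has started get scheduled normally instead of crashing the idle handler. The version gate covers both sides of the change: Scrapy 2.6+ accepts crawl(req) without the spider argument (see scrapy/scrapy#5994), 2.10 removed the argument entirely, and older versions still need the legacy call.
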
xuexingdong commented 5 months ago

Hoping the fixed version is released soon.

jordinl commented 1 month ago

@rmax would it be possible to release a fix for this? I'm also encountering this issue.

migrant commented 2 weeks ago

The same problem...

georgeJzzz commented 1 week ago

@rmax would it be possible to release a fix for this? I'm also encountering this issue. Thanks!

rmax commented 1 day ago

Thank you for your patience. V0.8.0 has been released 🎉