rmax / scrapy-redis

Redis-based components for Scrapy.
http://scrapy-redis.readthedocs.io
MIT License
5.54k stars 1.59k forks

[Question] Fetch request url from redis fail #285

Open KokoTa opened 1 year ago

KokoTa commented 1 year ago

Description

If I insert the start URL into Redis before running Scrapy, it works.

But if I run Scrapy first and then insert the URL, the idle listener fails with:

2023-08-13 17:11:59 [scrapy.utils.signal] ERROR: Error caught on signal handler: <bound method RedisMixin.spider_idle of <TestHtmlSpider 'test_html' at 0x2b05c4162d0>>
Traceback (most recent call last):
  File "C:\Users\KokoTa\AppData\Local\Programs\Python\Python311\Lib\site-packages\scrapy\utils\signal.py", line 43, in send_catch_log
    response = robustApply(
               ^^^^^^^^^^^^
  File "C:\Users\KokoTa\AppData\Local\Programs\Python\Python311\Lib\site-packages\pydispatch\robustapply.py", line 55, in robustApply
    return receiver(*arguments, **named)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\KokoTa\AppData\Local\Programs\Python\Python311\Lib\site-packages\scrapy_redis\spiders.py", line 208, in spider_idle
    self.schedule_next_requests()
  File "C:\Users\KokoTa\AppData\Local\Programs\Python\Python311\Lib\site-packages\scrapy_redis\spiders.py", line 197, in schedule_next_requests
    self.crawler.engine.crawl(req, spider=self)
TypeError: ExecutionEngine.crawl() got an unexpected keyword argument 'spider'

I can't fetch URLs dynamically, and Scrapy crashes.

Shleif91 commented 1 year ago

Same error... Has anyone found a solution?

gc1423 commented 1 year ago

Passing a spider argument to the crawl() method of scrapy.core.engine.ExecutionEngine is no longer supported as of Scrapy 2.10.0 (see the release notes).

As a workaround, try Scrapy 2.9.0.
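For context, the incompatibility comes down to the ExecutionEngine.crawl() signature change: the spider keyword was deprecated in Scrapy 2.6 and removed in 2.10. A minimal, dependency-free sketch of the version gate (crawl_args is a hypothetical helper for illustration, not part of scrapy-redis):

```python
def crawl_args(version_info, request, spider):
    """Build the (args, kwargs) for ExecutionEngine.crawl() given a
    Scrapy version tuple: the 'spider' keyword was deprecated in 2.6
    and removed in 2.10, so it is omitted on 2.6 and later."""
    if version_info >= (2, 6):
        return (request,), {}
    return (request,), {"spider": spider}

# e.g. on Scrapy 2.11 the spider kwarg must be dropped:
args, kwargs = crawl_args((2, 11, 0), "req", "spider_obj")
```

In a spider this would be applied as `self.crawler.engine.crawl(*args, **kwargs)` with `scrapy.version_info` as the first argument.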

GeorgeA92 commented 1 year ago

It looks like pull request https://github.com/rmax/scrapy-redis/pull/286, which fixes this, has existed since August. It can easily be applied to an app running the current scrapy-redis version by overriding the schedule_next_requests method:


from scrapy import version_info as scrapy_version
from scrapy_redis.spiders import RedisSpider


class SomeSpider(RedisSpider):
    # vvv add this override to the spider code
    def schedule_next_requests(self):
        """Schedules a request if available"""
        # TODO: While there is capacity, schedule a batch of redis requests.
        for req in self.next_requests():
            # see https://github.com/scrapy/scrapy/issues/5994
            if scrapy_version >= (2, 6):
                self.crawler.engine.crawl(req)
            else:
                self.crawler.engine.crawl(req, spider=self)

xuexingdong commented 10 months ago

Hoping the fixed version gets released soon.

jordinl commented 6 months ago

@rmax would it be possible to release a fix for this? I'm also encountering this issue

migrant commented 5 months ago

The same problem...

georgeJzzz commented 5 months ago

@rmax would it be possible to release a fix for this? I'm also encountering this issue. Thanks

rmax commented 4 months ago

Thank you for your patience. V0.8.0 has been released 🎉