rmax / scrapy-redis

Redis-based components for Scrapy.
http://scrapy-redis.readthedocs.io
MIT License

`scrapy_redis.scheduler.Scheduler` not compatible with `scrapy.dupefilters.BaseDupeFilter` #293

Closed: HairlessVillager closed this 1 day ago

HairlessVillager commented 2 weeks ago

In `scrapy_redis.scheduler.Scheduler`, this line calls `from_spider` on the configured dupefilter class:

https://github.com/rmax/scrapy-redis/blob/48a7a8921ae064fe7b4202b130f1054ede9103d6/src/scrapy_redis/scheduler.py#L136

However, `from_spider` is implemented only in `scrapy_redis.dupefilter.RFPDupeFilter`; `scrapy.dupefilters.BaseDupeFilter` does not declare it. Using a dupefilter class that lacks it raises

  File "D:\Anaconda\anaconda3\envs\scrapy\Lib\site-packages\scrapy\crawler.py", line 160, in crawl
    yield self.engine.open_spider(self.spider, start_requests)
AttributeError: type object 'RFPDupeFilter' has no attribute 'from_spider'

and

  File "D:\Anaconda\anaconda3\envs\scrapy\Lib\site-packages\scrapy_redis\scheduler.py", line 149, in flush
    self.df.clear()
AttributeError: 'Scheduler' object has no attribute 'df'

Another user ran into the same problem: https://github.com/rmax/scrapy-redis/issues/242#issuecomment-2154526694
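
To make the mismatch concrete, here is a minimal sketch of the pattern and one possible workaround. It uses stand-in classes so it runs without Scrapy installed: `BaseDupeFilter` below only mimics the real base class, and `SpiderAwareDupeFilter`, `FakeSpider`, and `open_dupefilter` are hypothetical names invented for illustration. It assumes the scheduler needs only `from_spider` and `clear` from the dupefilter, which is what the two tracebacks suggest; the real scheduler may rely on more.

```python
class BaseDupeFilter:
    """Stand-in for scrapy.dupefilters.BaseDupeFilter: no from_spider, no clear."""

    @classmethod
    def from_settings(cls, settings):
        return cls()

    def request_seen(self, request):
        # The base filter never treats a request as a duplicate.
        return False


class SpiderAwareDupeFilter(BaseDupeFilter):
    """Hypothetical shim adding the two hooks seen in the tracebacks."""

    @classmethod
    def from_spider(cls, spider):
        # Reuse the settings-based constructor the base class already has.
        return cls.from_settings(getattr(spider, "settings", None))

    def clear(self):
        # This filter keeps no state, so there is nothing to clear.
        pass


class FakeSpider:
    """Hypothetical spider object carrying only a settings attribute."""
    settings = {}


def open_dupefilter(dupefilter_cls, spider):
    """Mimics the scheduler step that calls from_spider on the class."""
    return dupefilter_cls.from_spider(spider)


if __name__ == "__main__":
    try:
        open_dupefilter(BaseDupeFilter, FakeSpider())
    except AttributeError as exc:
        print("base class fails:", exc)

    df = open_dupefilter(SpiderAwareDupeFilter, FakeSpider())
    df.clear()  # the call that flush() makes on the dupefilter
    print("shim works:", type(df).__name__)
```

With the real libraries, the equivalent would be to subclass the Scrapy dupefilter you want and point the dupefilter class setting at that subclass, but I have not verified this against every scrapy-redis version.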