istresearch / scrapy-cluster

This Scrapy project uses Redis and Kafka to create a distributed on demand scraping cluster.
http://scrapy-cluster.readthedocs.io/
MIT License

DUPEFILTER_CLASS Doesn't Work #177

Closed: jamesliu668 closed this issue 6 years ago

jamesliu668 commented 6 years ago

Hello,

In a regular Scrapy project, I can change the DUPEFILTER_CLASS setting to override the duplicate filter class. However, this setting no longer works in scrapy-cluster.

Here is the setting in the Scrapy documentation: DUPEFILTER_CLASS
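For reference, in a stock Scrapy project the setting is a dotted import path placed in settings.py. The project name `myproject` and the class `MyDupeFilter` below are hypothetical placeholders; only the setting name and Scrapy's default value come from Scrapy itself.

```python
# settings.py of a hypothetical Scrapy project named "myproject".
# Scrapy's built-in default is "scrapy.dupefilters.RFPDupeFilter".
# The path below points at a hypothetical custom filter class.
DUPEFILTER_CLASS = "myproject.dupefilters.MyDupeFilter"
```

In stock Scrapy it is the default scheduler that loads the class named by this setting, so a project that replaces the scheduler may not honor it.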

madisonb commented 6 years ago

Scrapy Cluster uses a customized scheduler, and therefore needs a customized deduplication filter due to its distributed nature. You are free to modify the RFPDupeFilter implementation to meet your needs, but keep in mind it may not give you the behavior you expect.
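To illustrate why a distributed crawl needs a shared deduplication filter, here is a minimal sketch, assuming a set-like store shared by all workers (an in-memory class stands in for a Redis set). `SharedSetStore`, `SharedDupeFilter`, `FakeRequest`, and the key name are all hypothetical; this is not Scrapy Cluster's RFPDupeFilter. It only borrows the `request_seen()` method name from Scrapy's dupe filter interface.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class FakeRequest:
    # Hypothetical stand-in for scrapy.Request: only the fields we fingerprint.
    method: str
    url: str
    body: bytes = b""


class SharedSetStore:
    """Hypothetical in-memory stand-in for a Redis set shared by all workers.

    sadd() mimics Redis SADD: returns 1 if the member was new, 0 if present.
    """

    def __init__(self):
        self._sets = {}

    def sadd(self, key, member):
        members = self._sets.setdefault(key, set())
        if member in members:
            return 0
        members.add(member)
        return 1


def fingerprint(request):
    # Hash method, URL and body. Scrapy's real fingerprinting also
    # canonicalizes the URL, which this sketch skips.
    data = b"\n".join([request.method.encode(), request.url.encode(), request.body])
    return hashlib.sha1(data).hexdigest()


class SharedDupeFilter:
    """Hypothetical distributed dupe filter backed by a shared store."""

    def __init__(self, store, key):
        self.store = store
        self.key = key

    def request_seen(self, request):
        # True means "already seen, drop it". Because every worker writes to
        # the same store, a request seen by one worker is filtered on all.
        return self.store.sadd(self.key, fingerprint(request)) == 0


if __name__ == "__main__":
    store = SharedSetStore()
    worker_a = SharedDupeFilter(store, "example:dupefilter")
    worker_b = SharedDupeFilter(store, "example:dupefilter")
    req = FakeRequest("GET", "http://example.com/page")
    print(worker_a.request_seen(req))  # False: first time anywhere
    print(worker_b.request_seen(req))  # True: already seen by worker_a
```

A per-process filter like Scrapy's default keeps its seen-set local, so two workers would each crawl the same URL once; sharing the store is what makes deduplication cluster-wide.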