scrapy-plugins / scrapy-deltafetch

Scrapy spider middleware to ignore requests to pages containing items seen in previous crawls

Request fingerprint does not work with other item types except scrapy.item.Item #49

Open InzamamAnwar opened 1 year ago

InzamamAnwar commented 1 year ago

Testing with the latest version of Scrapy, I found that yielding dataclass objects does not work with scrapy-deltafetch. If the object type is changed to scrapy.item.Item, the plugin works as intended.

Has anyone else tried it?

What if using a dataclass is necessary?

palvarezcordoba commented 1 year ago

Hi @InzamamAnwar, check this: https://github.com/scrapy-plugins/scrapy-deltafetch/blob/master/scrapy_deltafetch/middleware.py#L74. You could add a check for dataclasses there. It can be done like this:

import dataclasses
# True when r is a dataclass (either the class itself or an instance of it)
dataclasses.is_dataclass(r)

This repo seems to be unmaintained, so you could fork it and use your own version. I think a pull request is unlikely to be merged, but you could try.
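To make the suggested check concrete, here is a minimal, standalone sketch. It assumes the middleware decides whether a yielded object counts as an item with a type check at the linked line. The `Product` class and the `is_item_like` helper are hypothetical names used only for illustration and are not part of scrapy-deltafetch.

```python
import dataclasses


@dataclasses.dataclass
class Product:
    # Hypothetical item type, for illustration only.
    name: str
    price: float


def is_item_like(obj):
    """Hypothetical helper: True for dicts and dataclass instances."""
    if isinstance(obj, dict):
        return True
    # dataclasses.is_dataclass() is also True for the dataclass type itself,
    # so exclude classes in order to match instances only.
    return dataclasses.is_dataclass(obj) and not isinstance(obj, type)


print(is_item_like(Product("book", 9.99)))   # dataclass instance
print(is_item_like(Product))                 # the class, not an instance
print(is_item_like("https://example.com"))   # unrelated object
```

In a fork, a check like this could sit alongside the existing item type check, so that dataclass items are recorded the same way as scrapy.item.Item objects. Whether it is sufficient depends on the rest of the middleware, so treat it as a starting point rather than a tested patch.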