scrapy / itemloaders

Library to populate items using XPath and CSS with a convenient API
BSD 3-Clause "New" or "Revised" License

Import of old scrapylib processor functions? #41

Open nyov opened 3 years ago

nyov commented 3 years ago

Old scrapylib had a few ItemLoader processors that were dropped with the codebase.

I preserved them here when the scrapylib repo disappeared. They are mostly date/time parsing helpers and some cleaners. I wonder if they'd be useful to add here, or not. (Do they duplicate features that exist elsewhere and that I may have overlooked?)

I could see them living in a namespace such as itemloaders.processors.extra, which wouldn't be auto-imported with itemloaders.processors and could have external dependencies that don't automatically become required, such as the dateutil parser lib used here. Or perhaps it could be named itemloaders.processorlib (as in stdlib), as a place for a few generically useful processor functions. But perhaps it's also okay if they just disappear. I'm not sure, really.
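To make the idea concrete, here is a rough sketch of what such an opt-in module might look like. The function names clean_whitespace and parse_datetime are hypothetical, made up for illustration; they are not the actual scrapylib processors and not part of itemloaders. The only external call assumed is dateutil.parser.parse from python-dateutil.

```python
# Hypothetical module, e.g. itemloaders/processors/extra.py (the name proposed
# above). Nothing here is part of the real itemloaders package.

try:
    # Optional dependency: only parse_datetime needs it.
    from dateutil import parser as _dateutil_parser
except ImportError:
    _dateutil_parser = None


def clean_whitespace(value):
    """Collapse runs of whitespace and strip both ends. No extra dependencies."""
    return " ".join(value.split())


def parse_datetime(value):
    """Parse a date/time string into a datetime object.

    Requires the optional python-dateutil package.
    """
    if _dateutil_parser is None:
        # Fail only when the function is called, not when the module is imported.
        raise ImportError(
            "parse_datetime requires python-dateutil; "
            "install it with 'pip install python-dateutil'"
        )
    return _dateutil_parser.parse(value)
```

Since both are plain one-argument callables, they could be chained in a loader, for example as MapCompose(clean_whitespace, parse_datetime). Importing the module would never fail on a missing dateutil; only calling parse_datetime would.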

Gallaecio commented 3 years ago

I would not mind having them as built-in processors.

And since we added jmespath as a dependency, I’m not sure if adding dateutil would be an issue. Maybe it could be added optionally.

Not a strong opinion either way, though.

ejulio commented 3 years ago

No strong opinion here either. I think it would be a nice addition, though we need to be careful about where we set the threshold for accepting utility dependencies.

Optional installs are fine, but they must be clearly stated in the docs, especially for the functions that rely on the optional install.