scrapy-plugins / scrapy-jsonschema

Scrapy schema validation pipeline and Item builder using JSON Schema
BSD 3-Clause "New" or "Revised" License

Drop non-conforming fields instead of whole items #21

Closed: JakeCowton closed this issue 3 years ago

JakeCowton commented 4 years ago

It seems like it would make a lot more sense to drop only the fields that don't match the schema instead of the entire item, or at least to make that behaviour configurable.

If dropping a field leaves the item without one of its required fields, then the entire item should be dropped.
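
To make the idea concrete, here is a rough sketch of the behaviour described above. It is not code from scrapy-jsonschema: `drop_nonconforming_fields`, `field_conforms` and the local `DropItem` class are hypothetical names, and the field check only looks at the `type` keyword as a simplified stand-in for a real JSON Schema validator.

```python
# Hypothetical sketch: none of these names exist in scrapy-jsonschema.

# Simplified mapping from JSON Schema type names to Python types.
JSON_TYPES = {
    "string": str,
    "integer": int,
    "number": (int, float),
    "boolean": bool,
    "array": list,
    "object": dict,
}


class DropItem(Exception):
    """Stand-in for scrapy.exceptions.DropItem so the sketch runs without Scrapy."""


def field_conforms(value, field_schema):
    """Check only the "type" keyword; a real validator would check much more."""
    python_type = JSON_TYPES.get(field_schema.get("type"))
    if python_type is None:
        # No type constraint, or one this sketch does not model: accept the value.
        return True
    # bool is a subclass of int in Python, so reject it for numeric types.
    if isinstance(value, bool) and field_schema["type"] in ("integer", "number"):
        return False
    return isinstance(value, python_type)


def drop_nonconforming_fields(item, schema):
    """Return a copy of item without non-conforming fields.

    Raises DropItem if a required field is missing after the cleanup.
    """
    properties = schema.get("properties", {})
    cleaned = {
        key: value
        for key, value in item.items()
        # Fields not described by the schema are kept as-is (an assumption).
        if key not in properties or field_conforms(value, properties[key])
    }
    missing = [name for name in schema.get("required", []) if name not in cleaned]
    if missing:
        raise DropItem(f"missing required fields after cleanup: {missing}")
    return cleaned


schema = {
    "type": "object",
    "properties": {"name": {"type": "string"}, "price": {"type": "number"}},
    "required": ["name"],
}

# "price" is not a number, so only that field is dropped.
print(drop_nonconforming_fields({"name": "Widget", "price": "n/a"}, schema))
```

With this schema, an item such as `{"name": 42, "price": 9.99}` would lose its required `name` field during cleanup, so the whole item would be dropped via `DropItem`.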

Any thoughts on this approach?

JakeCowton commented 4 years ago

Just for reference, I'm happy to do a PR myself, just wanted to see if this would be desired functionality.

BurnzZ commented 4 years ago

Hi @JakeCowton, thanks for the suggestion!

Yeah, I can see myself using this new feature in certain instances.

Feel free to submit a high-level overview of your proposal and we'd be happy to review. :)

JakeCowton commented 3 years ago

I ended up writing my own implementation of this using a pipeline and pydantic, as it became a requirement for a dependency I have. I doubt there's appetite to switch the underlying schema structure to pydantic instead of raw JSON, so I'm closing this ticket.