Open rennerocha opened 9 months ago
This change has the potential to break applications that rely on Spidermon understanding `date` and `datetime` values and validating them with jsonschema.
To make it work, the user needs to manually serialize the `date` and `datetime` values in the items. But I am trying to figure out if there is some solution that could be implemented on the Spidermon side to avoid this manipulation.
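For reference, the manual serialization mentioned above could be as small as the sketch below. `serialize_dates` is a hypothetical helper, not part of Spidermon, and it assumes items are plain dicts (possibly nested). It uses `isoformat()`, which gives `YYYY-MM-DD` for dates. As far as I know, RFC 3339 `date-time` strings carry a UTC offset, so timezone-aware datetimes are the safer input.

```python
from datetime import date, datetime, timezone


def serialize_dates(value):
    """Return a copy of value with date/datetime objects as ISO 8601 strings.

    Hypothetical helper for illustration; not a Spidermon API.
    """
    if isinstance(value, (date, datetime)):
        return value.isoformat()
    if isinstance(value, dict):
        return {key: serialize_dates(val) for key, val in value.items()}
    if isinstance(value, (list, tuple)):
        return [serialize_dates(val) for val in value]
    return value


item = {
    "title": "Example",
    "published": date(2023, 1, 2),
    "scraped_at": datetime(2023, 1, 2, 3, 4, 5, tzinfo=timezone.utc),
}
serialized = serialize_dates(item)  # the original item is left unchanged
```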
cc @VMRuiz @Gallaecio
Hey, sorry for getting back to you late on this. I'm not entirely sure if we should change anything here. If you want your field to be a string with a date format, you could scrape it that way or set up an item pipeline to automatically convert datetime objects into strings if that's easier for you.
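A rough sketch of the item pipeline option, assuming dict-like items and only top-level fields. `DatetimeToStringPipeline` is a made-up name for illustration, not something shipped with Scrapy or Spidermon:

```python
from datetime import date


class DatetimeToStringPipeline:
    """Hypothetical pipeline: turn top-level date/datetime fields into ISO 8601 strings."""

    def process_item(self, item, spider):
        # Copy the pairs first so the item can be modified while looping.
        for field, value in list(item.items()):
            # datetime is a subclass of date, so this check covers both types.
            if isinstance(value, date):
                item[field] = value.isoformat()
        return item


# Quick check outside of Scrapy, using a plain dict as the item.
pipeline = DatetimeToStringPipeline()
result = pipeline.process_item(
    {"title": "Example", "published": date(2023, 1, 2)}, spider=None
)
```

In `ITEM_PIPELINES` it would need a lower order value than Spidermon's `ItemValidationPipeline` so that it runs before validation.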
I don't think Spidermon should make that decision for you by default. But I'm open to the idea of adding it as an opt-in feature where you can configure auto-casting methods for your fields. It could come in handy, especially when you want to validate with Jsonschema but still keep the original data types, like for binary RPC calls.
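To make the opt-in idea a bit more concrete, here is a minimal sketch of per-field casting that only affects the copy handed to the validator. Everything here (`VALIDATION_CASTERS`, `cast_for_validation`) is hypothetical and not an existing Spidermon setting or function:

```python
from datetime import date, datetime, timezone

# Hypothetical opt-in mapping: field name -> function applied before validation.
VALIDATION_CASTERS = {
    "published": date.isoformat,
    "scraped_at": datetime.isoformat,
}


def cast_for_validation(item, casters):
    """Return a shallow copy of item with the configured fields cast.

    The original item keeps its native types.
    """
    casted = dict(item)
    for field, caster in casters.items():
        if casted.get(field) is not None:
            casted[field] = caster(casted[field])
    return casted


item = {
    "published": date(2023, 1, 2),
    "scraped_at": datetime(2023, 1, 2, 3, 4, 5, tzinfo=timezone.utc),
}
for_validation = cast_for_validation(item, VALIDATION_CASTERS)
```

The validator would receive `for_validation`, while `item` continues down the pipeline with its `date` and `datetime` objects intact.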
What do you think @Gallaecio @curita ?
After https://github.com/scrapinghub/spidermon/pull/358, the validation of date fields using `jsonschema` is not working as before. Spidermon used to serialize date fields into strings (https://github.com/scrapinghub/spidermon/pull/358/files#diff-7937ac85a30630fe837b9c133f4459ee590680bb5dfce72775db6005f2b45f51L142). Without that step, when items are passed to the jsonschema validators, the `date` and `date-time` checkers (https://python-jsonschema.readthedocs.io/en/stable/validate/#validating-formats) don't work as expected if the item contains a `datetime.date` or a `datetime.datetime` instance.

Given the code:
Validating with spidermon 1.20.0
With spidermon 1.17.0
Validating with spidermon 1.20.0