Open edgarrmondragon opened 1 year ago
@edgarrmondragon - I hand't really thought through this before you logged here. Regarding failing early on config
inputs, I'm 100% in agreement that this makes sense. A few caveats below on implementation challenges and backwards compatibility.
date-time
format fails validation of date-only values here: https://www.jsonschemavalidator.net/s/NpWURgmlanyOf
when constructing the target table.DateOrDateTypeType
type for use in applications like the canonical start_date
, possibly defined using a new AnyOfType
helper, as in Property("start_date", type=AnyOfType(DateType, DataTypeType, required=false)
.
format
constraints of config, just not for record data.If a tap is upgraded to the latest version of the SDK, the added check could cause certain config inputs to fail - but this would happen presumably at a time when the user is upgrading the tap or target on their side. Whether expecting to have to retest or not, a user could update 2022-01-01
to 2022-01-01T00:00:00
or similar.
Because our config declarations do not need to be instantiated by the target as columns, I don't worry about anyOf
being handled by targets.
start_date
config specificallyIf we're focusing just about the start_date
config input, I think the best path is to allow either format; I think many Singer getting started guides even give samples with date-only sample values - and it would not be good for these examples to fail.
2. That said, many/most targets can (and do) ingest date-only values into date-time typed properties and nodes. Some simple targets may also have difficult parsing
anyOf
when constructing the target table.
@aaronsteers Agreed. Dumping DateTimeType
as an anyOf
would be problematic for most targets.
I'm also in favor of always validating format constraints of config, just not for record data.
Me too, but my only concern with that would be packages that neither pin a specific SDK version, nor have an upper bound set to a existing release. We also don't enforce that best practice in our templates:
(might be good to revisit https://gitlab.com/meltano/sdk/-/merge_requests/208)
In those cases, a simple meltano install
could break taps at the config validation stage in the next sync.
If we're focusing just about the
start_date
config input, I think the best path is to allow either format; I think many Singer getting started guides even give samples with date-only sample values - and it would not be good for these examples to fail.
Yeah, that's what motivated my anyOf
proposal for DateTimeType
above but I had not considered that would cause issues when used in stream schemas. Do you think allowing both formats without forcing users to update to Property("start_date", type=AnyOfType(DateType, DataTypeType, required=false)
is a requirement?
See https://github.com/meltano/sdk/issues/1457 for a related issue in record data validation. It seems like we're already validating format:
This has been marked as stale because it is unassigned, and has not had recent activity. It will be closed after 21 days if no further activity occurs. If this should never go stale, please add the evergreen
label, or request that it be added.
Still relevant
This has been marked as stale because it is unassigned, and has not had recent activity. It will be closed after 21 days if no further activity occurs. If this should never go stale, please add the evergreen
label, or request that it be added.
Singer SDK Version
0.21.0
Python Version
3.10
Bug scope
Configuration (settings parsing, validation, etc.)
Operating System
Any
Description
A config schema like
doesn't result in a validation error when passed a string in an incorrect format
That means that the bad string is caught too late, when the
start_date
needs to be parsed for incremental replication.Backwards compatibility
We could just enable
format
checks by updating the validator instance inhttps://github.com/meltano/sdk/blob/774a17664a139b966de5c80ed69aab2c161e2752/singer_sdk/plugin_base.py#L237-L238
The problem with doing that, even if we don't add the format extras, is that we'd break for example any users passing
2021-01-01
as an instance ofdate-time
.One option would be to make
singer_sdk.typing.DateTimeType
be dumped as a union:Code
No response