Closed by justinlittman 2 years ago
All EDTF dates should be remediated by the end of the week. I need to request updated reports on W3CDTF and ISO8601, but those will probably take longer. H2 data should be coming in clean. MARC is more complicated because the error is in the MARC record, which the person working in Argo may not be able to edit. Currently it looks like all the MARC-derived invalid dates are from records provided by the same vendor, so this may not be a common occurrence.
The re-upload would be treated the same as any other, and require valid dates to pass. (Hopefully remediation will minimize this issue.)
Since dates are going to be remediated, I'm moving this to cocina models for validation.
Note: the pattern YYYYMM-- has been removed from iso8601.
Additional ISO 8601 date patterns:
YYYYMMDDThhmmss.s+
YYYYMMDDhhmmss.s+
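For illustration only, the two basic-format patterns above could be checked with a single regular expression. This is a sketch, not the cocina-models implementation: the names `BASIC_DATETIME_FRACTION` and `matches_basic_datetime_fraction` are hypothetical, and it assumes `.s+` means a decimal point followed by one or more fractional-second digits.

```python
import re

# Hypothetical sketch; names are not taken from cocina-models.
# Covers both patterns listed above:
#   YYYYMMDDThhmmss.s+  (basic format with the "T" separator)
#   YYYYMMDDhhmmss.s+   (same, without the separator)
# Assumption: ".s+" means "." followed by one or more digits.
BASIC_DATETIME_FRACTION = re.compile(
    r"[0-9]{4}[0-9]{2}[0-9]{2}"   # YYYYMMDD
    r"T?"                         # optional separator
    r"[0-9]{2}[0-9]{2}[0-9]{2}"   # hhmmss
    r"\.[0-9]+"                   # required fractional seconds
)


def matches_basic_datetime_fraction(value: str) -> bool:
    """Return True if the whole string has one of the two shapes above."""
    return BASIC_DATETIME_FRACTION.fullmatch(value) is not None
```

This checks shape only; it would accept a month of 13 or an hour of 99, so range checks would need to happen separately.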
@arcadiafalcone do we want to validate the semantics (e.g. 2022-02-30) of the dates too? Should we permit BC dates? Are there any parts of ISO8601 that we want to disallow?
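To make the "semantics" question concrete, here is a minimal sketch of a check that rejects well-formed but nonexistent days such as 2022-02-30. The function name `is_real_calendar_date` is hypothetical, and the sketch only handles the extended YYYY-MM-DD shape. Python's `datetime.date` cannot represent years below 1, so BC dates would need separate handling under this approach.

```python
import re
from datetime import date

# Hypothetical helper, not part of cocina-models.
# Shape check first, then let datetime.date decide whether the day exists.
_YMD = re.compile(r"([0-9]{4})-([0-9]{2})-([0-9]{2})")


def is_real_calendar_date(value: str) -> bool:
    """Return True if a YYYY-MM-DD string names a day that actually exists."""
    match = _YMD.fullmatch(value)
    if match is None:
        return False
    year, month, day = (int(part) for part in match.groups())
    try:
        date(year, month, day)  # raises ValueError for e.g. February 30
    except ValueError:
        return False
    return True
```

With this check, `is_real_calendar_date("2022-02-30")` is `False`, while a leap day such as `"2024-02-29"` passes.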
@jcoyne Yes, that would be great.
@arcadiafalcone I've run these values through a couple different EDTF validators and they show as invalid for EDTF:
1997-07-16T19:20
1997-07-16T19:20:30.45
1997-07-16T19:20+01:00
1997-07-16T19:20:30.45+01:00
What are your thoughts on how we should proceed?
According to the LC EDTF specification (https://www.loc.gov/standards/datetime/) those all appear acceptable. I'm curious why they're not passing, but they should be considered valid.
@arcadiafalcone The following W3CDTF values also appear to be invalid:
1997-07-16T19:20
1997-07-16T19:20:30
1997-07-16T19:20:30.45
Looking over https://www.w3.org/TR/NOTE-datetime, these look like they should be marked invalid.
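As a rough illustration of why these fail: every time-bearing format listed in the W3C note ends in a time zone designator (`Z`, `+hh:mm`, or `-hh:mm`), so a value with a time but no designator does not match any of them. The sketch below encodes the six formats from the note as one regular expression; the names `W3CDTF` and `is_w3cdtf_shape` are hypothetical, and the check covers shape only, not value ranges.

```python
import re

# Hypothetical sketch of the six W3CDTF formats from the W3C note:
#   YYYY, YYYY-MM, YYYY-MM-DD,
#   YYYY-MM-DDThh:mmTZD, YYYY-MM-DDThh:mm:ssTZD, YYYY-MM-DDThh:mm:ss.sTZD
# TZD is "Z" or "+hh:mm" / "-hh:mm" and is required whenever a time is given.
W3CDTF = re.compile(
    r"[0-9]{4}"                            # YYYY
    r"(?:-[0-9]{2}"                        # -MM
    r"(?:-[0-9]{2}"                        # -DD
    r"(?:T[0-9]{2}:[0-9]{2}"               # Thh:mm
    r"(?::[0-9]{2}(?:\.[0-9]+)?)?"         # optional :ss and .s
    r"(?:Z|[+-][0-9]{2}:[0-9]{2})"         # required TZD
    r")?)?)?"
)


def is_w3cdtf_shape(value: str) -> bool:
    """Return True if the string has one of the six W3CDTF shapes."""
    return W3CDTF.fullmatch(value) is not None
```

Under this sketch the three values above are rejected, while the same values with `+01:00` or `Z` appended are accepted.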
@arcadiafalcone wrote:
> According to the LC EDTF specification (loc.gov/standards/datetime) those all appear acceptable. I'm curious why they're not passing, but they should be considered valid.
I'll have a look.
At a glance, I'm not sure I see these in the LC spec:
1997-07-16T19:20 -- I didn't see any examples that include only the hour and minute segments for time
1997-07-16T19:20:30.45 -- I didn't see any fractional second examples
1997-07-16T19:20+01:00 -- I didn't see any examples that use the timezone designator without the seconds segment
1997-07-16T19:20:30.45+01:00 -- same as above re: fractional seconds
@mjgiarlo Now I'm thinking it's better to start with the built-in validation and see if we have any data that matches these questionable patterns. Remediation may be the preferable route if we have inconsistencies.
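One way to find out whether existing data uses the questionable patterns is to flag each value by the feature that was questioned, rather than simply passing or failing it. The sketch below is hypothetical (the names `DATETIME` and `questionable_features` are not from any report tooling mentioned here) and only looks at extended-format date-times like the four examples.

```python
import re
from typing import List

# Hypothetical sketch for surveying data; not part of cocina-models.
# Matches extended-format date-times and records which optional parts appear.
DATETIME = re.compile(
    r"[0-9]{4}-[0-9]{2}-[0-9]{2}T[0-9]{2}:[0-9]{2}"
    r"(?P<seconds>:[0-9]{2})?"
    r"(?P<fraction>\.[0-9]+)?"
    r"(?P<offset>Z|[+-][0-9]{2}:[0-9]{2})?"
)


def questionable_features(value: str) -> List[str]:
    """List the features of a date-time that were questioned in this thread."""
    match = DATETIME.fullmatch(value)
    if match is None:
        return ["not an extended-format date-time"]
    flags = []
    if match["seconds"] is None:
        flags.append("no seconds segment")
    if match["fraction"] is not None:
        flags.append("fractional seconds")
    if match["offset"] is not None and match["seconds"] is None:
        flags.append("time zone designator without seconds")
    return flags
```

Running this over a report of date values and counting the flags would show how many records, if any, depend on each questionable pattern.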
@arcadiafalcone wrote:
> @mjgiarlo Now I'm thinking it's better to start with the built-in validation and see if we have any data that matches these questionable patterns. Remediation may be the preferable route if we have inconsistencies.
ok! I'll make sure the DSA reports I run use the same validation so you have solid numbers on this. Thank you. :)
@arcadiafalcone Now that this is in the cocina-models gem, when should we hook it up/turn it on? e.g., did you want to take a look at the three bad date reports first? (two of which are still being run now...)
@mjgiarlo I'd like to review the reports first. I'll give you the all-clear when it's ok to turn on.
From sul-dlss/argo#3375, validates dates (per @arcadiafalcone)