smrgeoinfo opened this issue 2 months ago
Generalize dates to allow YYYY, YYYY-MM, YYYY-MM-DD, YYYY-MM-DDThh:mm, etc. Update the schema. Generate an issue.
This is done and will be fixed the next time we push to Central. The change we made to the schema (which our unit tests now validate against) was this:
```json
"result_time": {
  "description": "Date on which the sample was collected.",
  "anyOf": [
    { "format": "date" },
    { "format": "date-time" }
  ],
  "type": "string"
},
```
We also have to update the LinkML/YAML schema to keep it in sync with the JSON schema. I was going to allow all the various formats (YYYY, YYYY-MM, YYYY-MM-DD, YYYY-MM-DDThh:mm), but went out of town before I could figure out how to do it in LinkML.
I'd suggest adding some regex patterns to allow YYYY and YYYY-MM:

```json
"anyOf": [
  { "format": "date", "description": "YYYY-MM-DD" },
  { "format": "date-time", "description": "YYYY-MM-DDThh:mm:ss.sTZD" },
  { "pattern": "^(?:[1]?[0-9]{3}|20[0-2][0-9])$", "description": "matches YYYY" },
  { "pattern": "^(?:[1]?[0-9]{3}|20[0-2][0-9])-(?:0[1-9]|1[0-2])$", "description": "matches YYYY-MM" }
],
```
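If it helps with the unit tests, here is a rough Python sketch (standard library only) of how that four-way `anyOf` behaves. The two patterns are copied from the suggestion; `FULL_DATE` and `DATE_TIME` are my own simplified stand-ins for JSON Schema's `date` and `date-time` formats (RFC 3339), not what any particular validator does internally, and `is_valid_result_time` is a hypothetical helper name.

```python
import re
from datetime import datetime

# The two patterns exactly as proposed. Note that [1]?[0-9]{3} also admits
# three-digit years such as "999", and 20[0-2][0-9] stops at 2029.
YEAR = re.compile(r"^(?:[1]?[0-9]{3}|20[0-2][0-9])$")
YEAR_MONTH = re.compile(r"^(?:[1]?[0-9]{3}|20[0-2][0-9])-(?:0[1-9]|1[0-2])$")

# Simplified stand-ins for format: date and format: date-time (RFC 3339):
# full-date, or full-date + "T" + hh:mm:ss[.frac] + mandatory "Z" or +hh:mm/-hh:mm offset.
FULL_DATE = re.compile(r"[0-9]{4}-[0-9]{2}-[0-9]{2}")
DATE_TIME = re.compile(
    r"([0-9]{4}-[0-9]{2}-[0-9]{2})[Tt]([0-9]{2}):([0-9]{2}):([0-9]{2})"
    r"(?:\.[0-9]+)?(?:[Zz]|[+-][0-9]{2}:[0-9]{2})"
)


def _is_calendar_date(text: str) -> bool:
    """True if a YYYY-MM-DD string names a real day (rejects e.g. 2007-02-30)."""
    try:
        datetime.strptime(text, "%Y-%m-%d")
    except ValueError:
        return False
    return True


def is_valid_result_time(value: str) -> bool:
    """Rough equivalent of the proposed anyOf: passes if any one branch matches."""
    if YEAR.fullmatch(value) or YEAR_MONTH.fullmatch(value):
        return True
    if FULL_DATE.fullmatch(value):
        return _is_calendar_date(value)
    match = DATE_TIME.fullmatch(value)
    if match:
        hour, minute, second = (int(g) for g in match.group(2, 3, 4))
        # Allow second == 60 for leap seconds, as RFC 3339 does.
        return _is_calendar_date(match.group(1)) and hour < 24 and minute < 60 and second <= 60
    return False
```

One thing this makes visible: a bare YYYY-MM-DDThh:mm (the form requested at the top of the issue) fails every branch, because RFC 3339 date-time requires seconds and a time zone offset. It would need its own pattern.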
See https://github.com/isamplesorg/isamples_inabox/issues/389
Using the iSamples JSON schema (GitHub: isamplesorg/metadata/src/schemas/iSamplesSchemaCore1.0.json):
Get metadata records in the core schema via the API, e.g. https://central.isample.xyz/isamples_central/thing/ark%3A%2F65665%2F309fc4ed5-cbc9-4821-918a-1ff0aa92d0dd?format=core
Load the JSON in Oxygen and run the validator against the JSON schema (I tried records from all four authorities). Errors reported:
Description: #/curation/access_constraints: expected type: JSONArray, found: String -- fix: `"access_constraints": ""` should be `"access_constraints": []`, or `"access_constraints": [""]`, or `"access_constraints": ["constraint text 1", "constraint text 2"]`.
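For records converted in bulk, a minimal sketch of that fix, assuming the curation object has already been parsed into a Python dict. `normalize_access_constraints` is a hypothetical helper; it picks `[]` for the empty string, which is one of the options above.

```python
def normalize_access_constraints(curation: dict) -> dict:
    """Return a copy of a curation object whose access_constraints is a JSON array."""
    fixed = dict(curation)
    value = fixed.get("access_constraints")
    if isinstance(value, str):
        # "" -> [] ; any other string -> a one-element list holding that text.
        fixed["access_constraints"] = [value] if value else []
    return fixed
```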
Description: #/keywords/0: expected type: JSONObject, found: String -- fix: `"keywords": ["Individual Sample"]` should be `"keywords": [{"keyword": "Individual Sample"}]`.
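The same kind of conversion for keywords, as a sketch: each bare string is wrapped in a `{"keyword": ...}` object and entries that are already objects pass through. `normalize_keywords` is a hypothetical name, and the record is assumed to be a parsed dict.

```python
def normalize_keywords(record: dict) -> dict:
    """Return a copy of record with every keyword entry as a {"keyword": ...} object."""
    fixed = dict(record)
    if "keywords" in record:
        fixed["keywords"] = [
            {"keyword": k} if isinstance(k, str) else k  # leave existing objects alone
            for k in record["keywords"]
        ]
    return fixed
```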
Description: #/produced_by/@id: extraneous key [@id] is not permitted -- fix: the current schema has an identifier property on the sampling event; that is where the id should go.
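A sketch of moving the id, assuming the sampling-event property is literally named `identifier` (check that against iSamplesSchemaCore1.0.json before relying on it). `move_event_id` is hypothetical; if an identifier is already present it is kept and the duplicate `@id` is dropped.

```python
def move_event_id(produced_by: dict) -> dict:
    """Return a copy of a sampling event with "@id" moved to "identifier" (assumed key name)."""
    fixed = dict(produced_by)
    event_id = fixed.pop("@id", None)
    # Keep an existing identifier rather than overwriting it.
    if event_id is not None and "identifier" not in fixed:
        fixed["identifier"] = event_id
    return fixed
```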
Description: #/produced_by/responsibility/0: expected type: JSONObject, found: String -- fix: the value should be an agent object, so `"responsibility": ["Carl Francis,,Sample Owner"]` should be `"responsibility": [{"name": "Carl Francis", "role": "Sample Owner"}]`.
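A sketch of converting the legacy strings, assuming they always follow the "name,,role" shape seen above: first comma-separated field is the name, last is the role, and whatever sits in between (empty here) is ignored. That layout is a guess from one example, and a name that itself contains a comma would be split wrongly. `parse_responsibility` is hypothetical.

```python
from typing import Union


def parse_responsibility(entry: Union[str, dict]) -> dict:
    """Turn a legacy 'name,,role' string into an agent object; pass objects through."""
    if isinstance(entry, dict):
        return entry
    parts = [part.strip() for part in entry.split(",")]
    agent = {"name": parts[0]}
    # Assumption: the last comma-separated field is the role; middle fields are ignored.
    if len(parts) > 1 and parts[-1]:
        agent["role"] = parts[-1]
    return agent
```

Applied per entry, e.g. `[parse_responsibility(e) for e in produced_by["responsibility"]]`.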
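Description: #/produced_by/result_time: [2007-02-03 12:00:00] is not a valid date. Expected [yyyy-MM-dd] -- fix: the data type is the JSON Schema date format, which requires yyyy-MM-dd, so the value should be "2007-02-03". If we want to allow timestamps, we'd need to revise the JSON schema, e.g. to allow either the date or the date-time format for the string. Even then, date-time follows RFC 3339, which requires a "T" separator and a time zone offset, so the timestamp would have to be written like "2007-02-03T12:00:00Z" (with the correct offset).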
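A minimal sketch of the "value should be 2007-02-03" fix, assuming the source timestamps look exactly like "YYYY-MM-DD hh:mm:ss" (anything else raises). `to_schema_date` is hypothetical, and it deliberately drops the time rather than guessing a time zone.

```python
from datetime import datetime


def to_schema_date(value: str) -> str:
    """Reduce a 'YYYY-MM-DD hh:mm:ss' timestamp (or a plain date) to YYYY-MM-DD."""
    for fmt in ("%Y-%m-%d %H:%M:%S", "%Y-%m-%d"):
        try:
            return datetime.strptime(value, fmt).date().isoformat()
        except ValueError:
            continue  # try the next accepted layout
    raise ValueError(f"unrecognized result_time: {value!r}")
```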
Description: #/produced_by/sampling_site/sample_location/latitude: expected type: Number, found: Null
Description: #/produced_by/sampling_site/sample_location/longitude: expected type: Number, found: Null -- fix: the validator does not accept null as a number. I'd suggest leaving the "latitude" and "longitude" keys out when there are no values. In fact, if there are no coordinates, an empty object might be the best solution, since it acts as an explicit null:
"sample_location": {},
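A sketch of that cleanup, assuming the record was loaded with `json.loads` (which maps JSON `null` to `None`). `drop_null_coordinates` is hypothetical; when both coordinates are null it returns the empty object suggested above.

```python
def drop_null_coordinates(sample_location: dict) -> dict:
    """Return a copy of sample_location without latitude/longitude keys whose value is None."""
    return {
        key: value
        for key, value in sample_location.items()
        if not (key in ("latitude", "longitude") and value is None)
    }
```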