Status: Open. aclum opened this issue 3 months ago.
This is a bit of a brain dump of what's going on here.
1. The `material_processing_set` slot has a range of `MaterialProcessing`, and `MaterialProcessing` is an abstract class with a number of subclasses (9 of them by my count), including ones called `Pooling` and `Extraction`.
2. In the generated JSON Schema, that slot becomes an `anyOf` over those subclasses:

```json
"material_processing_set": {
  "description": "This property links a database object to the set of material processing within it.",
  "items": {
    "anyOf": [
      {
        "$ref": "#/$defs/Pooling"
      },
      {
        "$ref": "#/$defs/Extraction"
      },
      {
        "$ref": "#/$defs/LibraryPreparation"
      },
      {
        "$ref": "#/$defs/SubSamplingProcess"
      },
      {
        "$ref": "#/$defs/MixingProcess"
      },
      {
        "$ref": "#/$defs/FiltrationProcess"
      },
      {
        "$ref": "#/$defs/ChromatographicSeparationProcess"
      },
      {
        "$ref": "#/$defs/DissolvingProcess"
      },
      {
        "$ref": "#/$defs/ChemicalConversionProcess"
      }
    ]
  },
  "type": "array"
},
```
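A minimal sketch of how an abstract-class range can turn into that `anyOf`: collect the concrete descendants of the range class and emit one `$ref` per class. It assumes a hypothetical flat `HIERARCHY` dict in which all nine classes are direct children of `MaterialProcessing`; it is not LinkML's actual JSON Schema generator, only an illustration of the shape of its output.

```python
# Hypothetical class hierarchy (child -> parent). The class names come from
# the schema excerpt above; the flat structure is an assumption.
HIERARCHY = {
    "Pooling": "MaterialProcessing",
    "Extraction": "MaterialProcessing",
    "LibraryPreparation": "MaterialProcessing",
    "SubSamplingProcess": "MaterialProcessing",
    "MixingProcess": "MaterialProcessing",
    "FiltrationProcess": "MaterialProcessing",
    "ChromatographicSeparationProcess": "MaterialProcessing",
    "DissolvingProcess": "MaterialProcessing",
    "ChemicalConversionProcess": "MaterialProcessing",
}
ABSTRACT = {"MaterialProcessing"}


def descendants(cls: str) -> list[str]:
    """Return all transitive subclasses of cls, in declaration order."""
    result = []
    for child, parent in HIERARCHY.items():
        if parent == cls:
            result.append(child)
            result.extend(descendants(child))
    return result


def range_to_items_schema(range_cls: str) -> dict:
    """Expand a class range into an anyOf over its non-abstract classes."""
    candidates = [range_cls] + descendants(range_cls)
    concrete = [c for c in candidates if c not in ABSTRACT]
    return {"anyOf": [{"$ref": f"#/$defs/{c}"} for c in concrete]}


items = range_to_items_schema("MaterialProcessing")
print(len(items["anyOf"]))  # 9
```

Because the abstract class itself is skipped, every record in the array must match one of the nine concrete subschemas, which is what makes the error reporting below fan out across all of them.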
3. Because of the `anyOf`, a JSON Schema validator can only reject the instance after it has checked every subschema and verified that the instance is not valid under each of them. That is indeed what happens. We end up with a whole pile of errors indicating why the instance isn't valid under each `anyOf` subschema. For example, it rules out the third subschema (`"$ref": "#/$defs/LibraryPreparation"`) because of the `id` slot (`String 'nmdc:extrp-99-abcdef' does not match regex pattern '^(nmdc):libprp-([0-9][a-z]{0,6}[0-9])-([A-Za-z0-9]{1,})$'.`). Here is the whole pile of reasons why the instance isn't valid:
```
Message: String 'nmdc:extrp-99-abcdef' does not match regex pattern '^(nmdc):chcpr-([0-9][a-z]{0,6}[0-9])-([A-Za-z0-9]{1,})$'. Schema path: https://w3id.org/nmdc/nmdc#/$defs/ChemicalConversionProcess/properties/id/pattern
Message: String 'nmdc:extrp-99-abcdef' does not match regex pattern '^(nmdc):dispro-([0-9][a-z]{0,6}[0-9])-([A-Za-z0-9]{1,})$'. Schema path: https://w3id.org/nmdc/nmdc#/$defs/DissolvingProcess/properties/id/pattern
Message: String 'nmdc:extrp-99-abcdef' does not match regex pattern '^(nmdc):cspro-([0-9][a-z]{0,6}[0-9])-([A-Za-z0-9]{1,})$'. Schema path: https://w3id.org/nmdc/nmdc#/$defs/ChromatographicSeparationProcess/properties/id/pattern
Message: String 'nmdc:extrp-99-abcdef' does not match regex pattern '^(nmdc):filtpr-([0-9][a-z]{0,6}[0-9])-([A-Za-z0-9]{1,})$'. Schema path: https://w3id.org/nmdc/nmdc#/$defs/FiltrationProcess/properties/id/pattern
Message: String 'nmdc:extrp-99-abcdef' does not match regex pattern '^(nmdc):mixpro-([0-9][a-z]{0,6}[0-9])-([A-Za-z0-9]{1,})$'. Schema path: https://w3id.org/nmdc/nmdc#/$defs/MixingProcess/properties/id/pattern
Message: String 'nmdc:extrp-99-abcdef' does not match regex pattern '^(nmdc):subspr-([0-9][a-z]{0,6}[0-9])-([A-Za-z0-9]{1,})$'. Schema path: https://w3id.org/nmdc/nmdc#/$defs/SubSamplingProcess/properties/id/pattern
Message: String 'nmdc:extrp-99-abcdef' does not match regex pattern '^(nmdc):libprp-([0-9][a-z]{0,6}[0-9])-([A-Za-z0-9]{1,})$'. Schema path: https://w3id.org/nmdc/nmdc#/$defs/LibraryPreparation/properties/id/pattern
Message: String 'nmdc:extrp-99-abcdef' does not match regex pattern '^(nmdc):poolp-([0-9][a-z]{0,6}[0-9])-([A-Za-z0-9]{1,})$'. Schema path: https://w3id.org/nmdc/nmdc#/$defs/Pooling/properties/id/pattern
Message: Value "nmdc:Extraction" is not defined in enum. Schema path: https://w3id.org/nmdc/nmdc#/$defs/ChemicalConversionProcess/properties/type/enum
Message: Value "nmdc:Extraction" is not defined in enum. Schema path: https://w3id.org/nmdc/nmdc#/$defs/DissolvingProcess/properties/type/enum
Message: Value "nmdc:Extraction" is not defined in enum. Schema path: https://w3id.org/nmdc/nmdc#/$defs/ChromatographicSeparationProcess/properties/type/enum
Message: Value "nmdc:Extraction" is not defined in enum. Schema path: https://w3id.org/nmdc/nmdc#/$defs/FiltrationProcess/properties/type/enum
Message: Value "nmdc:Extraction" is not defined in enum. Schema path: https://w3id.org/nmdc/nmdc#/$defs/MixingProcess/properties/type/enum
Message: Value "nmdc:Extraction" is not defined in enum. Schema path: https://w3id.org/nmdc/nmdc#/$defs/SubSamplingProcess/properties/type/enum
Message: Value "nmdc:Extraction" is not defined in enum. Schema path: https://w3id.org/nmdc/nmdc#/$defs/LibraryPreparation/properties/type/enum
Message: Value "nmdc:Extraction" is not defined in enum. Schema path: https://w3id.org/nmdc/nmdc#/$defs/Pooling/properties/type/enum
Message: Array item count 1 is less than minimum count of 2. Schema path: https://w3id.org/nmdc/nmdc#/$defs/Pooling/properties/has_input/minItems
Message: Property 'extraction_method' has not been defined and the schema does not allow additional properties. Schema path: https://w3id.org/nmdc/nmdc#/$defs/ChemicalConversionProcess/additionalProperties
Message: Property 'extraction_method' has not been defined and the schema does not allow additional properties. Schema path: https://w3id.org/nmdc/nmdc#/$defs/DissolvingProcess/additionalProperties
Message: Property 'extraction_method' has not been defined and the schema does not allow additional properties. Schema path: https://w3id.org/nmdc/nmdc#/$defs/ChromatographicSeparationProcess/additionalProperties
Message: Property 'extraction_method' has not been defined and the schema does not allow additional properties. Schema path: https://w3id.org/nmdc/nmdc#/$defs/FiltrationProcess/additionalProperties
Message: Property 'extraction_method' has not been defined and the schema does not allow additional properties. Schema path: https://w3id.org/nmdc/nmdc#/$defs/MixingProcess/additionalProperties
Message: Property 'extraction_method' has not been defined and the schema does not allow additional properties. Schema path: https://w3id.org/nmdc/nmdc#/$defs/SubSamplingProcess/additionalProperties
Message: Property 'extraction_method' has not been defined and the schema does not allow additional properties. Schema path: https://w3id.org/nmdc/nmdc#/$defs/LibraryPreparation/additionalProperties
Message: Property 'extraction_method' has not been defined and the schema does not allow additional properties. Schema path: https://w3id.org/nmdc/nmdc#/$defs/Extraction/additionalProperties
Message: Property 'extraction_method' has not been defined and the schema does not allow additional properties. Schema path: https://w3id.org/nmdc/nmdc#/$defs/Pooling/additionalProperties
Message: Property 'extraction_target' has not been defined and the schema does not allow additional properties. Schema path: https://w3id.org/nmdc/nmdc#/$defs/ChemicalConversionProcess/additionalProperties
Message: Property 'extraction_target' has not been defined and the schema does not allow additional properties. Schema path: https://w3id.org/nmdc/nmdc#/$defs/DissolvingProcess/additionalProperties
Message: Property 'extraction_target' has not been defined and the schema does not allow additional properties. Schema path: https://w3id.org/nmdc/nmdc#/$defs/ChromatographicSeparationProcess/additionalProperties
Message: Property 'extraction_target' has not been defined and the schema does not allow additional properties. Schema path: https://w3id.org/nmdc/nmdc#/$defs/FiltrationProcess/additionalProperties
Message: Property 'extraction_target' has not been defined and the schema does not allow additional properties. Schema path: https://w3id.org/nmdc/nmdc#/$defs/MixingProcess/additionalProperties
Message: Property 'extraction_target' has not been defined and the schema does not allow additional properties. Schema path: https://w3id.org/nmdc/nmdc#/$defs/SubSamplingProcess/additionalProperties
Message: Property 'extraction_target' has not been defined and the schema does not allow additional properties. Schema path: https://w3id.org/nmdc/nmdc#/$defs/LibraryPreparation/additionalProperties
Message: Property 'extraction_target' has not been defined and the schema does not allow additional properties. Schema path: https://w3id.org/nmdc/nmdc#/$defs/Extraction/additionalProperties
Message: Property 'extraction_target' has not been defined and the schema does not allow additional properties. Schema path: https://w3id.org/nmdc/nmdc#/$defs/Pooling/additionalProperties
```
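The pile follows directly from JSON Schema `anyOf` semantics: an instance is valid if at least one subschema accepts it, so rejecting it means collecting a failure reason from every branch. The toy, stdlib-only sketch below models only the checks visible in the pile (`id` pattern, `type` enum, `additionalProperties`, and the `Pooling` `has_input` `minItems`). The `ALLOWED` property set shared by all branches and the `record` contents (including its single `has_input` value and the placeholder values for the two extraction slots) are hypothetical, inferred from the messages above; this is not how python-jsonschema is implemented.

```python
import re

# id prefixes per branch, taken from the regex patterns in the pile;
# "extrp" for Extraction is inferred because its id check did not fail.
PREFIXES = {
    "Pooling": "poolp",
    "Extraction": "extrp",
    "LibraryPreparation": "libprp",
    "SubSamplingProcess": "subspr",
    "MixingProcess": "mixpro",
    "FiltrationProcess": "filtpr",
    "ChromatographicSeparationProcess": "cspro",
    "DissolvingProcess": "dispro",
    "ChemicalConversionProcess": "chcpr",
}
# Hypothetical: the real schema defines a different property set per class.
ALLOWED = {"id", "type", "has_input", "has_output"}


def branch_errors(name: str, instance: dict) -> list[str]:
    """Check one anyOf branch and return every reason the instance fails it."""
    errors = []
    pattern = (r"^(nmdc):" + PREFIXES[name]
               + r"-([0-9][a-z]{0,6}[0-9])-([A-Za-z0-9]{1,})$")
    if not re.search(pattern, instance.get("id", "")):
        errors.append(f"{name}/properties/id/pattern")
    if instance.get("type") != f"nmdc:{name}":
        errors.append(f"{name}/properties/type/enum")
    if name == "Pooling" and len(instance.get("has_input", [])) < 2:
        errors.append(f"{name}/properties/has_input/minItems")
    for prop in sorted(set(instance) - ALLOWED):
        errors.append(f"{name}/additionalProperties ({prop})")
    return errors


def validate_any_of(instance: dict) -> tuple[bool, dict]:
    """anyOf passes if ANY branch reports no errors; otherwise keep them all."""
    per_branch = {name: branch_errors(name, instance) for name in PREFIXES}
    valid = any(not errs for errs in per_branch.values())
    return valid, per_branch


# Hypothetical record reconstructed from the error messages.
record = {
    "id": "nmdc:extrp-99-abcdef",
    "type": "nmdc:Extraction",
    "has_input": ["nmdc:bsm-99-abcdef"],  # hypothetical single input
    "extraction_method": "example",  # retired slot, placeholder value
    "extraction_target": "example",  # placeholder value
}
valid, pile = validate_any_of(record)
```

Under these assumptions the toy reproduces the shape of the real pile: 35 reasons spread across the nine branches, with the `Extraction` branch failing only on the two unexpected properties.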
4. So what's a poor JSON Schema validator to do? Show all those messages to the user and let them sort it out? That's a bit cruel to the user. So we [call](https://github.com/linkml/linkml/blob/901cbf845b725388a53bbef0f465e7ae0bbd0f52/linkml/validator/plugins/jsonschema_validation_plugin.py#L50) the [utility function](https://python-jsonschema.readthedocs.io/en/stable/api/jsonschema/exceptions/#jsonschema.exceptions.best_match) provided by the JSON Schema implementation to isolate _what it considers_ to be the most relevant error, based on its heuristics. In this case, it deems this one to be the most specific:
```
Message: Array item count 1 is less than minimum count of 2. Schema path: https://w3id.org/nmdc/nmdc#/$defs/Pooling/properties/has_input/minItems
```
5. You can see the error that @aclum was looking for in the pile, but unfortunately it wasn't deemed to be the most specific one:
```
Message: Property 'extraction_method' has not been defined and the schema does not allow additional properties. Schema path: https://w3id.org/nmdc/nmdc#/$defs/Extraction/additionalProperties
```
So on one hand we have our current approach of "attempt to sift out the best error message and present that to the user". On the other hand, you could imagine an option along the lines of "show me the full pile of errors and I'll sort it out", which could be useful for debugging. I don't know if there's any clever middle ground between those two. I'll have to think about it more, but it's hard to imagine how we could have surfaced the _one_ error message that @aclum wanted to see in this case.
Is there a way to tell `linkml-validate` to use the value of the `type` slot within an individual record to pick the most relevant error when the target class is `Database`?
One of the things I was thinking about when I wrote "clever middle ground" in my last message was whether we could use values from a slot with `designates_type: true` to narrow down the list of relevant error messages. I think that's something like what you're suggesting, but I haven't dug into the code enough to see how feasible it is.
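A minimal sketch of the narrowing idea as plain post-processing of (message, schema path) pairs. It assumes the record's `type` value is a CURIE whose local name equals the `$defs` key of its subschema (true for the `nmdc:Extraction` / `Extraction` pair in the pile above) and falls back to the full pile, grouped by branch, when no branch matches. `group_by_branch` and `narrow_by_type` are hypothetical helpers, not an existing `linkml-validate` option, and `PILE` is only a five-entry excerpt of the real messages.

```python
import re
from collections import defaultdict

# Captures the subschema name from ".../nmdc#/$defs/<Class>/..."
DEFS_RE = re.compile(r"#/\$defs/([^/]+)/")

# Excerpt of (message, schema path) pairs from the error pile.
PILE = [
    ("Array item count 1 is less than minimum count of 2.",
     "https://w3id.org/nmdc/nmdc#/$defs/Pooling/properties/has_input/minItems"),
    ('Value "nmdc:Extraction" is not defined in enum.',
     "https://w3id.org/nmdc/nmdc#/$defs/Pooling/properties/type/enum"),
    ("Property 'extraction_method' has not been defined and the schema "
     "does not allow additional properties.",
     "https://w3id.org/nmdc/nmdc#/$defs/Extraction/additionalProperties"),
    ("Property 'extraction_target' has not been defined and the schema "
     "does not allow additional properties.",
     "https://w3id.org/nmdc/nmdc#/$defs/Extraction/additionalProperties"),
    ('Value "nmdc:Extraction" is not defined in enum.',
     "https://w3id.org/nmdc/nmdc#/$defs/MixingProcess/properties/type/enum"),
]


def group_by_branch(pile):
    """Group messages by the $defs class (anyOf branch) in their schema path."""
    grouped = defaultdict(list)
    for message, path in pile:
        match = DEFS_RE.search(path)
        grouped[match.group(1) if match else "<unknown>"].append(message)
    return dict(grouped)


def narrow_by_type(pile, type_value):
    """Keep only the errors from the branch the record's type designates.

    Falls back to the whole pile, grouped by branch, if the designated
    class has no errors in the pile (or the type value is unrecognized).
    """
    grouped = group_by_branch(pile)
    local_name = type_value.split(":", 1)[-1]
    if local_name in grouped:
        return {local_name: grouped[local_name]}
    return grouped


narrowed = narrow_by_type(PILE, "nmdc:Extraction")
```

Applied to the full pile, the same filter would keep only the two `Extraction/additionalProperties` messages, which is the error @aclum was looking for; the grouped fallback doubles as a more readable version of the "show me the full pile" option.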
Describe the bug
The error message reports a problem with the wrong slot.
To reproduce
Steps to reproduce the behavior:
```shell
poetry run linkml-validate -s ../../../project/nmdc_materialized_patterns.yaml Database-Extraction-extraction_method-slot-retired.yaml
```
where the materialized-patterns version of the schema comes from https://github.com/microbiomedata/berkeley-schema-fy24/tree/2046-Database-slot-updates and `Database-Extraction-extraction_method-slot-retired.yaml` is
The test fails, but for the wrong reason: it complains about the length of `has_input`.
Expected behavior
The error should say that `extraction_method` doesn't exist, or be a generic error, instead of saying that a valid slot is invalid.
Additional context
cc @turbomam