microbiomedata / issues

public repo for issues related to NMDC work

nmdc:wfmp-11-emfy6143.1 passes validation but fails conversion to RDF due to input pattern #892

Open turbomam opened 2 days ago

turbomam commented 2 days ago

I ran make squeaky-clean all test make-rdf in berkeley-schema-fy24. I haven't done that in a while, and I added some collections that I may never have run through make-rdf before.

poetry run linkml-validate \
    --schema nmdc_schema/nmdc_materialized_patterns.yaml local/mongo_as_nmdc_database_rdf_safe.yaml

passes, but

poetry run linkml-convert \
    --output local/mongo_as_nmdc_database.ttl \
    --schema nmdc_schema/nmdc_materialized_patterns.yaml local/mongo_as_nmdc_database_rdf_safe.yaml

emits

Failed validating 'pattern' in schema[6]['properties']['has_input']['items']:
    {'pattern': '^(nmdc):(bsm|procsm)-([0-9][a-z]{0,6}[0-9])-([A-Za-z0-9]{1,})$',
     'type': 'string'}

On instance['has_input'][0]:
    'nmdc:dobj-11-agsd2f41'

corresponding to this fragment:

- id: nmdc:wfmp-11-emfy6143.1
  name: Metaproteomics Analysis Activity for nmdc:wfmp-11-emfy6143.1
  started_at_time: '2024-08-14T00:07:16+00:00'
  ended_at_time: '2024-08-14T04:37:30+00:00'
  was_informed_by: nmdc:omprc-11-5svnja50
  execution_resource: EMSL
  git_url: https://github.com/microbiomedata/metaPro/releases/tag/v1.2.1
  has_input:
  - nmdc:dobj-11-agsd2f41
  - nmdc:dobj-11-2f3gzn94
  - nmdc:dobj-11-8yvaz057
  - nmdc:dobj-11-h9637w90
  - nmdc:dobj-11-hfx93f93
  - nmdc:dobj-11-sprrem27
  has_output:
  - nmdc:dobj-11-sx7cyr58
  - nmdc:dobj-11-p2c98g23
  - nmdc:dobj-11-gmv0d626
  - nmdc:dobj-11-hfjbht29
  type: nmdc:MetaproteomicsAnalysis
  version: v1.2.1

Traceback

  File "/home/mark/.cache/pypoetry/virtualenvs/nmdc-schema-gXr5ogK9-py3.10/bin/linkml-convert", line 8, in <module>
    sys.exit(cli())
  File "/home/mark/.cache/pypoetry/virtualenvs/nmdc-schema-gXr5ogK9-py3.10/lib/python3.10/site-packages/click/core.py", line 1157, in __call__
    return self.main(*args, **kwargs)
  File "/home/mark/.cache/pypoetry/virtualenvs/nmdc-schema-gXr5ogK9-py3.10/lib/python3.10/site-packages/click/core.py", line 1078, in main
    rv = self.invoke(ctx)
  File "/home/mark/.cache/pypoetry/virtualenvs/nmdc-schema-gXr5ogK9-py3.10/lib/python3.10/site-packages/click/core.py", line 1434, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/home/mark/.cache/pypoetry/virtualenvs/nmdc-schema-gXr5ogK9-py3.10/lib/python3.10/site-packages/click/core.py", line 783, in invoke
    return __callback(*args, **kwargs)
  File "/home/mark/.cache/pypoetry/virtualenvs/nmdc-schema-gXr5ogK9-py3.10/lib/python3.10/site-packages/linkml/utils/converter.py", line 153, in cli
    validation.validate_object(obj, schema)
  File "/home/mark/.cache/pypoetry/virtualenvs/nmdc-schema-gXr5ogK9-py3.10/lib/python3.10/site-packages/linkml/utils/validation.py", line 46, in validate_object
    return jsonschema.validate(
  File "/home/mark/.cache/pypoetry/virtualenvs/nmdc-schema-gXr5ogK9-py3.10/lib/python3.10/site-packages/jsonschema/validators.py", line 1332, in validate
    raise error
jsonschema.exceptions.ValidationError: 'nmdc:dobj-11-agsd2f41' does not match '^(nmdc):(bsm|procsm)-([0-9][a-z]{0,6}[0-9])-([A-Za-z0-9]{1,})$'

turbomam commented 2 days ago

The structured_pattern in https://microbiomedata.github.io/berkeley-schema-fy24/MetaproteomicsAnalysis/#induced

seems to imply that '{id_nmdc_prefix}:(dobj)-{id_shoulder}-{id_blade}$' is expected
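
To make the mismatch concrete, here is a minimal Python sketch that checks the failing has_input value against both patterns. The first pattern is copied from the error message. The second is an assumption: it is what the dobj structured_pattern would presumably look like once materialized, reusing the shoulder and blade sub-patterns from the error message. The variable names are made up for illustration.

```python
import re

# Pattern copied verbatim from the validation error.
FAILING_PATTERN = r"^(nmdc):(bsm|procsm)-([0-9][a-z]{0,6}[0-9])-([A-Za-z0-9]{1,})$"

# Assumption: materialized form of the dobj structured_pattern, built by
# reusing the shoulder and blade sub-patterns seen in the error message.
ASSUMED_DOBJ_PATTERN = r"^(nmdc):(dobj)-([0-9][a-z]{0,6}[0-9])-([A-Za-z0-9]{1,})$"

# The first has_input value reported in the error.
input_id = "nmdc:dobj-11-agsd2f41"

matches_failing = re.match(FAILING_PATTERN, input_id) is not None
matches_dobj = re.match(ASSUMED_DOBJ_PATTERN, input_id) is not None

print(f"(bsm|procsm) pattern matches: {matches_failing}")
print(f"(dobj) pattern matches: {matches_dobj}")
```

If the assumed dobj pattern is right, the data itself looks fine and the question becomes why the (bsm|procsm) pattern is being applied to this record at all.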

turbomam commented 2 days ago

now that we are aggregating all workflows into the workflow_execution_set, I can't exclude MetaproteomicsAnalysis instances!

aclum commented 2 days ago

does rdf not use the structured_pattern?

turbomam commented 2 days ago

> does rdf not use the structured_pattern?

Good question. For the record, nothing uses structured_pattern directly at this point in time. To benefit from a structured_pattern, one has to re-generate the schema with something like gen-linkml --materialize-patterns, which is the kind of process that generates nmdc_schema/nmdc_materialized_patterns.yaml. It's the resulting pattern values that the tools actually use.

I still think that this problem may be due to LinkML tooling rather than the nmdc-schema, though.
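
As a rough illustration of what materializing a pattern means, the sketch below substitutes {name} placeholders in a structured_pattern with values from a settings mapping. This is not LinkML's implementation. The materialize function is hypothetical, and the settings values are assumptions inferred from the pattern in the error message, so the schema's own settings block remains the authority.

```python
import re

# Assumed settings, inferred from the pattern in the error message;
# the authoritative values live in the schema's own settings block.
ASSUMED_SETTINGS = {
    "id_nmdc_prefix": "^(nmdc)",
    "id_shoulder": "([0-9][a-z]{0,6}[0-9])",
    "id_blade": "([A-Za-z0-9]{1,})",
}


def materialize(structured_pattern: str, settings: dict[str, str]) -> str:
    """Replace each {name} placeholder with its value from settings."""

    def substitute(match: re.Match) -> str:
        name = match.group(1)
        # Unknown placeholders are left untouched rather than dropped.
        return settings.get(name, match.group(0))

    # Substituted text is not rescanned, so quantifiers such as {0,6}
    # inside the settings values are preserved as-is.
    return re.sub(r"\{(\w+)\}", substitute, structured_pattern)


materialized = materialize(
    "{id_nmdc_prefix}:(dobj)-{id_shoulder}-{id_blade}$", ASSUMED_SETTINGS
)
print(materialized)
```

Under these assumptions the result is an ordinary regular expression with (dobj) as the typecode, which is the form a validator can apply directly.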

aclum commented 2 days ago

If it helps with debugging, the only classes that have a pattern matching (bsm|procsm) are DataGeneration and its subclasses. I'm not sure how or where it is confusing a WorkflowExecution subclass with a DataGeneration subclass.
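
One way to double-check which classes carry that pattern is to scan the materialized schema once it has been parsed into a plain dict (for example with PyYAML's yaml.safe_load). The sketch below is only illustrative: classes_with_pattern is a hypothetical helper, it assumes the patterns sit under each class's slot_usage, and toy_schema is a made-up stand-in rather than real nmdc-schema content.

```python
def classes_with_pattern(schema: dict, slot: str, needle: str) -> list[str]:
    """Return names of classes whose slot_usage pattern for `slot` contains `needle`."""
    hits = []
    for class_name, class_def in (schema.get("classes") or {}).items():
        # Tolerate classes with no body or no slot_usage entry for this slot.
        usage = ((class_def or {}).get("slot_usage") or {}).get(slot) or {}
        if needle in (usage.get("pattern") or ""):
            hits.append(class_name)
    return hits


# Toy stand-in for a parsed schema; NOT real nmdc-schema content.
toy_schema = {
    "classes": {
        "DataGeneration": {
            "slot_usage": {
                "has_input": {
                    "pattern": "^(nmdc):(bsm|procsm)-([0-9][a-z]{0,6}[0-9])-([A-Za-z0-9]{1,})$"
                }
            }
        },
        "MetaproteomicsAnalysis": {
            "slot_usage": {
                "has_input": {
                    "pattern": "^(nmdc):(dobj)-([0-9][a-z]{0,6}[0-9])-([A-Za-z0-9]{1,})$"
                }
            }
        },
        "DataObject": None,
    }
}

matching = classes_with_pattern(toy_schema, "has_input", "(bsm|procsm)")
print(matching)
```

Running the same helper over the real nmdc_materialized_patterns.yaml would show whether any WorkflowExecution subclass unexpectedly carries the (bsm|procsm) pattern, or whether the confusion happens later in the generated JSON Schema.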