Open turbomam opened 2 months ago
The structured_pattern in https://microbiomedata.github.io/berkeley-schema-fy24/MetaproteomicsAnalysis/#induced seems to imply that '{id_nmdc_prefix}:(dobj)-{id_shoulder}-{id_blade}$' is expected.
Now that we are aggregating all workflows into the workflow_execution_set, I can't exclude MetaproteomicsAnalysis instances!
does rdf not use the structured_pattern?
> does rdf not use the structured_pattern?
Good question. For the record, nothing uses structured_pattern directly at this point in time. To benefit from a structured_pattern, one has to re-generate the schema with something like gen-linkml --materialize-patterns, which is the kind of process that generates nmdc_schema/nmdc_materialized_patterns.yaml. It's the resulting pattern values that are utilized.
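To make the distinction concrete, here is a rough sketch of what materializing a structured_pattern amounts to: each {name} placeholder is replaced by a setting value, and the result is an ordinary regex. This is a stand-in written with the standard library only, not LinkML's own implementation. The setting values and the nmdc:dobj-11-abc123 id are hypothetical and chosen just for illustration.

```python
import re


def materialize(syntax: str, settings: dict[str, str]) -> str:
    """Replace each {name} placeholder in a structured_pattern syntax string
    with the matching setting value, yielding a plain regex pattern."""
    return re.sub(r"\{(\w+)\}", lambda m: settings[m.group(1)], syntax)


# Hypothetical setting values (assumptions, not copied from the schema).
settings = {
    "id_nmdc_prefix": "^(nmdc)",
    "id_shoulder": "([0-9][a-z]{0,6}[0-9])",
    "id_blade": "([A-Za-z0-9]{1,})",
}

# The syntax string quoted at the top of this issue.
syntax = "{id_nmdc_prefix}:(dobj)-{id_shoulder}-{id_blade}$"
pattern = materialize(syntax, settings)

print(pattern)
print(bool(re.match(pattern, "nmdc:dobj-11-abc123")))      # True (hypothetical id)
print(bool(re.match(pattern, "nmdc:wfmp-11-emfy6143.1")))  # False
```

Under these assumptions, a wfmp id can never satisfy a pattern that hard-codes (dobj), which is why a dobj pattern showing up in an error about a MetaproteomicsAnalysis record looks suspicious.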
I still think that this problem may be due to LinkML tooling rather than the nmdc-schema, though.
If it helps debug, the only classes that have a pattern matching (bsm|procsm) are DataGeneration and its subclasses. Not sure how or where it is confusing a WorkflowExecution subclass for a DataGeneration subclass.
Thanks @aclum! @pkalita-lbl and I just worked through this issue. It turns out that the reported root cause error comes from a JsonSchema heuristic that tries to guess the most relevant error. In this case it is just wrong.
It appears that the real error is that the has_peptide_quantifications portion of nmdc:wfmp-11-emfy6143.1 is being converted from a list to a dict before the converter's validator is run.
@cmungall has encouraged me to just run the converter in validation-free mode since I'm doing the conversion in a workflow, where the immediately preceding step is stand-alone validation.
@pkalita-lbl said that I could possibly create a minimal example that illustrates this case outside of the nmdc-schema. Then it might be easier for him to come up with a solution. I doubt that I will do that before the berkeley-schema-fy24 roll-out.
There are some similarities to
Does this only happen when the list size is large? We have other instances where the structure is complex, like credit roles, and we don't run into this issue.
> Does this only happen when the list size is large?
I don't think so. @pkalita-lbl noticed that MetaproteomicsAnalysis.has_peptide_quantifications is not inlined_as_list even though it is multivalued and its range is a class (PeptideQuantification) that doesn't have an identifier slot.
That's an illegal combination. I have a test for other cases like that but haven't created a PR yet. The only other case right now is Biosample.heavy_metals_meth.
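For anyone who wants to see the shape of such a check, here is a minimal sketch. It is not the test mentioned above and it does not use the LinkML API. It assumes the schema has already been loaded into a plain dict with "classes" and "slots" keys, and it ignores inheritance (is_a), mixins, and slot_usage, all of which a real check would need to handle. The toy schema and the peptide_sequence slot name are hypothetical.

```python
from typing import Any


def find_illegal_inlining(schema: dict[str, Any]) -> list[str]:
    """Return 'Class.slot' names where a slot is multivalued, its range is a
    class with no identifier slot, and it is not marked inlined_as_list."""
    classes = schema.get("classes", {})
    slots = schema.get("slots", {})

    def has_identifier(class_name: str) -> bool:
        # Simplification: only looks at slots listed directly on the class.
        return any(
            slots.get(slot_name, {}).get("identifier", False)
            for slot_name in classes[class_name].get("slots", [])
        )

    problems = []
    for class_name, class_def in classes.items():
        for slot_name in class_def.get("slots", []):
            slot = slots.get(slot_name, {})
            range_name = slot.get("range")
            if (
                slot.get("multivalued", False)
                and range_name in classes
                and not has_identifier(range_name)
                and not slot.get("inlined_as_list", False)
            ):
                problems.append(f"{class_name}.{slot_name}")
    return problems


# Hypothetical toy schema mirroring the combination described above.
toy_schema = {
    "classes": {
        "MetaproteomicsAnalysis": {"slots": ["id", "has_peptide_quantifications"]},
        "PeptideQuantification": {"slots": ["peptide_sequence"]},
    },
    "slots": {
        "id": {"identifier": True},
        "peptide_sequence": {},
        "has_peptide_quantifications": {
            "range": "PeptideQuantification",
            "multivalued": True,
        },
    },
}

problems = find_illegal_inlining(toy_schema)
print(problems)
```

Setting inlined_as_list to true on the offending slot makes the toy schema pass this check.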
Is there a test we can run w/ linkml as part of the build process to catch this illegal combo? I assume the test you describe above is an ad hoc check.
@cmungall , @pkalita-lbl and I discussed this very briefly today. It could become a check in the linter, or we could just work towards more useful error messages. That might be tough in this case, because there are a lot of error messages and a heuristic is being used to guess the best one.
I just added a Python test and will discuss it in the metadata/schema meeting tomorrow.
An interim linter test is my preference, since more useful error messages are a longer-term effort, or at least seem that way from the misleading error message tickets that I've filed.
I did
make squeaky-clean all test make-rdf
in berkeley-schema-fy24. I haven't done that in a while, and I added some collections that I may have never run through make-rdf before.
passes, but
emits
corresponding to this fragment:
Traceback
> File "/home/mark/.cache/pypoetry/virtualenvs/nmdc-schema-gXr5ogK9-py3.10/bin/linkml-convert", line 8, in