Closed turbomam closed 1 month ago
start with any inbound or outbound relationship including
I don't think LinkML does anything like this by default.
@aclum has a report of inter-class relationships that we worked on together, but I just whipped this up too:
PREFIX nmdc: <https://w3id.org/nmdc/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

# count Biosamples per (predicate, object type) pair
select ?p ?ot (count(?b) as ?bcount)
where {
  graph <mongodb://mongo-loadbalancer.nmdc.production.svc.spin.nersc.gov:27017> {
    ?b a nmdc:Biosample ;
       ?p ?o .
    ?o a ?ot .
  }
  # exclude objects whose type is nmdc:AttributeValue or one of its subclasses
  minus {
    ?o a ?vt .
    ?vt rdfs:subClassOf* nmdc:AttributeValue
  }
}
group by ?p ?ot
There are some patterns like this in the data, from an RDF perspective:
Biosample
  collected_from
    some FieldResearchSite
  gold_biosample_identifiers
    some Biosample
      will be fixed by coercion to Napa style ids
Study
  qualifiedAssociation
    some Association
      for credit associations
  gold_study_identifiers
    some Study
      will be fixed by coercion to Napa style ids
OmicsProcessing
  part_of
    some Study
  gold_sequencing_project_identifiers
    some OmicsProcessing
      will be fixed by coercion to Napa style ids
  has_input
    some Biosample
  has_input
    some ProcessedSample
  has_output
    some DataObject
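For readers less familiar with SPARQL, the grouping performed by the query above can be sketched in plain Python over an in-memory list of triples. This is only an illustration: the sample triples, the `ex:` identifiers and the `summarize` helper are made up for the example, and the `rdfs:subClassOf*` walk is reduced to a fixed, assumed set of excluded classes.

```python
from collections import Counter

RDF_TYPE = "rdf:type"

# Hypothetical sample data; not taken from the NMDC triple store.
TRIPLES = [
    ("ex:b1", RDF_TYPE, "nmdc:Biosample"),
    ("ex:b2", RDF_TYPE, "nmdc:Biosample"),
    ("ex:site1", RDF_TYPE, "nmdc:FieldResearchSite"),
    ("ex:qv1", RDF_TYPE, "nmdc:QuantityValue"),
    ("ex:b1", "nmdc:collected_from", "ex:site1"),
    ("ex:b2", "nmdc:collected_from", "ex:site1"),
    ("ex:b1", "nmdc:depth", "ex:qv1"),
]

# Stand-in for "?vt rdfs:subClassOf* nmdc:AttributeValue" (assumed set).
ATTRIBUTE_VALUE_CLASSES = {"nmdc:AttributeValue", "nmdc:QuantityValue"}


def summarize(triples, subject_class="nmdc:Biosample"):
    """Count matching rows per (predicate, object type) pair."""
    # Index every declared type by subject.
    types = {}
    for s, p, o in triples:
        if p == RDF_TYPE:
            types.setdefault(s, set()).add(o)

    counts = Counter()
    for s, p, o in triples:
        # Type assertions themselves are skipped here for simplicity.
        if p == RDF_TYPE or subject_class not in types.get(s, set()):
            continue
        object_types = types.get(o, set())
        # MINUS: drop objects typed as any excluded class.
        if object_types & ATTRIBUTE_VALUE_CLASSES:
            continue
        for ot in object_types:
            counts[(p, ot)] += 1
    return counts


counts = summarize(TRIPLES)
for (predicate, object_type), n in sorted(counts.items()):
    print(predicate, object_type, n)
```

In this toy data only `nmdc:collected_from` pointing at a `nmdc:FieldResearchSite` survives the filter, which mirrors the kind of inter-class relationship the list above reports.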
Having added most of the pattern constraints on slots that mention things with ids, but not having updated any of the example data files:
poetry run linkml-run-examples \
--schema project/nmdc_schema_generated.yaml \
--input-directory src/data/valid \
--counter-example-input-directory src/data/invalid \
--output-directory examples/output > examples/output/README.md
INFO:root:Using SchemaView with im=None
Traceback (most recent call last):
  File "/home/mark/.cache/pypoetry/virtualenvs/nmdc-schema-w12NqEaO-py3.9/lib/python3.9/site-packages/linkml/workspaces/example_runner.py", line 186, in process_examples_from_list
    validator.validate_dict(input_dict, tc, closed=True)
  File "/home/mark/.cache/pypoetry/virtualenvs/nmdc-schema-w12NqEaO-py3.9/lib/python3.9/site-packages/linkml/validators/jsonschemavalidator.py", line 97, in validate_dict
    raise JsonSchemaDataValidatorError(results)
linkml.validators.jsonschemavalidator.JsonSchemaDataValidatorError: 'gold:Gs0110115' does not match '^nmdc:stdy-[A-Z]{4}-[0-9]{4}-[0-9]{4}$' in $.biosample_set[0].part_of[0]
'gold:Gs0110115' does not match '^nmdc:stdy-[A-Z]{4}-[0-9]{4}-[0-9]{4}$' in $.biosample_set[1].part_of[0]
'gold:Gs0110115' does not match '^nmdc:stdy-[A-Z]{4}-[0-9]{4}-[0-9]{4}$' in $.biosample_set[2].part_of[0]
'gold:Gs0110115' does not match '^nmdc:stdy-[A-Z]{4}-[0-9]{4}-[0-9]{4}$' in $.biosample_set[3].part_of[0]

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/mark/.cache/pypoetry/virtualenvs/nmdc-schema-w12NqEaO-py3.9/bin/linkml-run-examples", line 8, in <module>
    sys.exit(cli())
  File "/home/mark/.cache/pypoetry/virtualenvs/nmdc-schema-w12NqEaO-py3.9/lib/python3.9/site-packages/click/core.py", line 1157, in __call__
    return self.main(*args, **kwargs)
  File "/home/mark/.cache/pypoetry/virtualenvs/nmdc-schema-w12NqEaO-py3.9/lib/python3.9/site-packages/click/core.py", line 1078, in main
    rv = self.invoke(ctx)
  File "/home/mark/.cache/pypoetry/virtualenvs/nmdc-schema-w12NqEaO-py3.9/lib/python3.9/site-packages/click/core.py", line 1434, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/home/mark/.cache/pypoetry/virtualenvs/nmdc-schema-w12NqEaO-py3.9/lib/python3.9/site-packages/click/core.py", line 783, in invoke
    return __callback(*args, **kwargs)
  File "/home/mark/.cache/pypoetry/virtualenvs/nmdc-schema-w12NqEaO-py3.9/lib/python3.9/site-packages/linkml/workspaces/example_runner.py", line 319, in cli
    runner.process_examples()
  File "/home/mark/.cache/pypoetry/virtualenvs/nmdc-schema-w12NqEaO-py3.9/lib/python3.9/site-packages/linkml/workspaces/example_runner.py", line 139, in process_examples
    self.process_examples_from_list(input_examples, fmt, False)
  File "/home/mark/.cache/pypoetry/virtualenvs/nmdc-schema-w12NqEaO-py3.9/lib/python3.9/site-packages/linkml/workspaces/example_runner.py", line 192, in process_examples_from_list
    raise ValueError(f"Example {input_example} failed validation:\n{e}")
ValueError: Example src/data/valid/Database-biosamples-1.yaml failed validation:
'gold:Gs0110115' does not match '^nmdc:stdy-[A-Z]{4}-[0-9]{4}-[0-9]{4}$' in $.biosample_set[0].part_of[0]
'gold:Gs0110115' does not match '^nmdc:stdy-[A-Z]{4}-[0-9]{4}-[0-9]{4}$' in $.biosample_set[1].part_of[0]
'gold:Gs0110115' does not match '^nmdc:stdy-[A-Z]{4}-[0-9]{4}-[0-9]{4}$' in $.biosample_set[2].part_of[0]
'gold:Gs0110115' does not match '^nmdc:stdy-[A-Z]{4}-[0-9]{4}-[0-9]{4}$' in $.biosample_set[3].part_of[0]
make: *** [project.Makefile:276: examples/output] Error 1
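The failure reported in the traceback can be reproduced in isolation with Python's `re` module. The sketch below takes the pattern and the `gold:Gs0110115` value verbatim from the error message; the passing identifier is invented purely to fit the pattern, and `matches_study_id` is a hypothetical helper name.

```python
import re

# Pattern copied from the validation error above.
STUDY_ID_PATTERN = re.compile(r"^nmdc:stdy-[A-Z]{4}-[0-9]{4}-[0-9]{4}$")


def matches_study_id(value: str) -> bool:
    """Return True if value satisfies the Study id pattern."""
    return STUDY_ID_PATTERN.match(value) is not None


# GOLD-style value from the example data: rejected by the pattern.
print(matches_study_id("gold:Gs0110115"))
# Made-up identifier shaped to fit the pattern: accepted.
print(matches_study_id("nmdc:stdy-ABCD-1234-5678"))
```

Until the files under `src/data/valid` use identifiers of this shape in `part_of`, the example runner will presumably keep rejecting them.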
@turbomam I don't understand your 'probably unintentional' comment about the gold slots. Do you mean that the values in id and gold_study_identifiers can be the same? This will be resolved with re-iding.
> Do you mean that the values in id and gold_study_identifiers can be the same? This will be resolved with re-iding.
Yes, that's what I meant. I'll update those annotations.
Study's slot_usage on structured_pattern.syntax for id:
where
so correct mentioned id pattern validation to OmicsProcessing's has_input mentions to allow Biosamples and ProcessedSamples
@turbomam do you want to work off of this ticket or #1212? They are redundant as far as I can tell.
We need pattern constraints on was_informed_by, has_calibration, was_generated_by, has_input, has_output
OK, I will come up with a uniform way of doing this.
@turbomam - this is planned for discussion today at the metadata call. The idea was a structured_pattern in slot_usage for ids (plus support in the LinkML JSON Schema generator for this construct) and range constraints. The other "fix" needed to make this the uniform way of doing this is to "fix" the OWL generator to use the range constraint and not the structured_pattern.
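As a rough illustration of the structured_pattern idea, the sketch below mimics in plain Python how a brace-delimited syntax string could be expanded into a regular expression from named settings. It does not use LinkML itself; the setting names, their values and the `interpolate` helper are assumptions made for the example, chosen so that the result equals the Study pattern seen in the validation error earlier in this thread.

```python
import re

# Hypothetical settings; the real schema may name and define these differently.
SETTINGS = {
    "prefix": "nmdc",
    "typecode": "stdy",
    "shoulder": "[A-Z]{4}",
    "blade": "[0-9]{4}-[0-9]{4}",
}

# Hypothetical syntax string in the style of structured_pattern.syntax.
SYNTAX = "^{prefix}:{typecode}-{shoulder}-{blade}$"


def interpolate(syntax: str, settings: dict) -> str:
    """Replace each {name} placeholder with its regex fragment from settings."""
    return re.sub(r"\{(\w+)\}", lambda m: settings[m.group(1)], syntax)


pattern = interpolate(SYNTAX, SETTINGS)
print(pattern)
```

Keeping the fragments in one place means each class's slot_usage only has to vary the type code, which is one way to get the uniform treatment discussed above.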
This was resolved by https://github.com/microbiomedata/nmdc-schema/pull/1994
cc @aclum