microbiomedata / nmdc-schema

National Microbiome Data Collaborative (NMDC) unified data model
https://microbiomedata.github.io/nmdc-schema/
Creative Commons Zero v1.0 Universal

berkeley-schema-fy24 must migrate string "type" to object types consisting of the instances' class' `class_uri` #1640

Open turbomam opened 8 months ago

turbomam commented 8 months ago

berkeley-schema-fy24 is much stricter about the type slot than the current nmdc-schema.

Currently, the type slot of a WorkflowExecutionActivity is described as:

An optional string that specifies the type object. This is used to allow for searches for different kinds of objects.

berkeley-schema-fy24 requires that every data instance of an nmdc-schema class reiterate its class's class_uri as a CURIE.
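To make the requirement concrete: a CURIE is a compact `prefix:local_name` form of a URI, so the new type value is a plain string with a fixed shape. The sketch below shows a hypothetical before/after document and a small helper that expands a CURIE. The prefix map, the example documents, and the `nmdc:WorkflowExecutionActivity` value are assumptions for illustration; the authoritative prefixes and class_uri values live in the schema itself.

```python
from typing import Dict

# Assumed prefix map for illustration only; the schema defines the real one.
PREFIXES: Dict[str, str] = {
    "nmdc": "https://w3id.org/nmdc/",
    "prov": "http://www.w3.org/ns/prov#",
}


def expand_curie(curie: str, prefixes: Dict[str, str] = PREFIXES) -> str:
    """Expand 'prefix:local' into a full URI using a known prefix map."""
    prefix, sep, local = curie.partition(":")
    if not sep or not local or prefix not in prefixes:
        raise ValueError(f"not a CURIE with a known prefix: {curie!r}")
    return prefixes[prefix] + local


# Hypothetical documents: free-text type before, class_uri CURIE after.
old_doc = {"id": "nmdc:example-1", "type": "some free-text label"}
new_doc = {"id": "nmdc:example-1", "type": "nmdc:WorkflowExecutionActivity"}

print(expand_curie(new_doc["type"]))
```

Nothing in this sketch reaches outside the process: it only combines a prefix and a local name that are already known.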

Theoretically, there should already be a good example of this pattern (for some WorkflowExecution) in the example data files directory, but I haven't found one yet.

cc @Michal-Babins @mslarae13

eecavanna commented 8 months ago

Thanks for bringing this to my attention, @turbomam.

Was there a specific PR for this change (the change to the type slot)? I think knowing its URL will help us (a) name the Migrator module and (b) get more info if we need it.

Is creating a CURIE something that involves obtaining something from the outside world (e.g. "minting" or "registering" something), or is it just a matter of combining various pieces of information that are already available in the same Mongo document? If it's the former (i.e. it involves accessing the outside world), I may add something to the "adapter" class I'm currently working on to support it.


Notes:

turbomam commented 8 months ago

Good questions.

This PR added the type slot to all classes:

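The class_uri of any class can be looked up with SchemaView:

from linkml_runtime import SchemaView

# Path to the schema, relative to the current working directory.
schema_yaml_file = "../src/schema/nmdc.yaml"
schema_view = SchemaView(schema_yaml_file)
# Look up the class definition and print its class_uri.
class_obj = schema_view.get_class("CreditAssociation")
print(class_obj.class_uri)

Output:

prov:Association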

_That one is unusual in that its class_uri isn't the default prefix (nmdc) followed by a colon and the class name, so it's important to check each class rather than assume the default._
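A minimal sketch of how a migrator might apply this, assuming the class of each document is already known (for example, from the collection it came from). The lookup table and the function name are hypothetical; in practice the table would presumably be built from SchemaView instead of hard-coded. Only the CreditAssociation entry comes from this thread; the other two follow the default `nmdc:ClassName` pattern and are illustrative.

```python
from typing import Dict

# Hypothetical class name -> class_uri table (see the assumptions above).
CLASS_URI_BY_NAME: Dict[str, str] = {
    "Study": "nmdc:Study",
    "Biosample": "nmdc:Biosample",
    "CreditAssociation": "prov:Association",  # not nmdc:CreditAssociation
}


def set_type_from_class_uri(
    doc: dict,
    class_name: str,
    class_uris: Dict[str, str] = CLASS_URI_BY_NAME,
) -> dict:
    """Return a copy of doc whose type is the class_uri CURIE of class_name."""
    if class_name not in class_uris:
        # Fail loudly instead of guessing the default nmdc: prefix.
        raise KeyError(f"no class_uri known for class {class_name!r}")
    migrated = dict(doc)  # shallow copy; the input document is left untouched
    migrated["type"] = class_uris[class_name]
    return migrated


before = {"applied_roles": ["Investigation"], "type": "some free-text label"}
after = set_type_from_class_uri(before, "CreditAssociation")
print(after["type"])  # prov:Association
```

Looking the value up, instead of formatting `"nmdc:" + class_name`, is what keeps cases like CreditAssociation correct.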

I can help with this as much or as little as you want.

eecavanna commented 3 months ago

Notes from the 5/21/2024 migration squad meeting

See: https://github.com/microbiomedata/berkeley-schema-fy24/tree/main/tests

eecavanna commented 3 months ago

Check whether https://github.com/microbiomedata/nmdc-schema/issues/1607 encapsulates this issue. See migrator nmdc_schema/migrators/migrator_from_X_to_PR10.py.

aclum commented 2 months ago

@turbomam, can this be closed? I believe that in berkeley all classes now have a required type slot whose range is a CURIE.
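One way to sanity-check this against data before closing is to scan documents for a missing or non-CURIE-shaped type. The sketch below works on plain dicts; the function name, the regular expression, and the sample documents are assumptions for illustration, and the check only tests the shape of the value, not whether it matches the class_uri of the document's class.

```python
import re
from typing import Iterable, List

# Shape-only check: a prefix, a colon, then a non-empty local part.
# Note that full URIs such as "http://..." also pass this loose pattern.
CURIE_SHAPE = re.compile(r"[A-Za-z_][\w.-]*:\S+")


def ids_with_bad_type(docs: Iterable[dict]) -> List[str]:
    """Return ids of documents whose type is missing or not CURIE-shaped."""
    bad = []
    for doc in docs:
        value = doc.get("type")
        if not isinstance(value, str) or not CURIE_SHAPE.fullmatch(value):
            bad.append(doc.get("id", "<no id>"))
    return bad


# Hypothetical sample documents.
sample = [
    {"id": "nmdc:ex-1", "type": "nmdc:Study"},
    {"id": "nmdc:ex-2", "type": "Study"},
    {"id": "nmdc:ex-3"},
]
print(ids_with_bad_type(sample))  # ['nmdc:ex-2', 'nmdc:ex-3']
```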