sfb1451 / crc-schema-draft

https://sfb1451.github.io/crc-schema-draft/

Replace types used in place of identifiers #3

Open mslw opened 1 year ago

mslw commented 1 year ago

Current state

As of now, the Dataset class uses the following IRIs to define its slots (slot name formatting came from linkml and seems related to how the slots were defined - through attributes or slot usage):

Code snippet:

```python
from linkml.utils.schemaloader import SchemaLoader

schema = SchemaLoader("src/sfb1451_schema.yaml").resolve()
for s_name in schema.classes['Dataset'].slots:
    slot = schema.slots[s_name]
    print(f"| {slot.name} | {slot.slot_uri} |")
```

| slot | uri |
| --- | --- |
| Dataset_name | schema:name |
| Dataset_title | schema:title |
| Dataset_description | schema:description |
| Dataset_doi | bibo:doi |
| dataset__crc_project | schema:ResearchProject |
| dataset__version | schema:version |
| dataset__sample[organism] | openminds:Species |
| dataset__sample[organism_part] | openminds:UBERONParcellation |
| dataset__keywords | schema:keywords |
| dataset__license | schema:license |
| dataset__homepage | schema:mainEntityOfPage |
| dataset__last_updated | schema:dateModified |
| dataset__data_controller | dpv:hasDataController |
| dataset__author | schema:author |
| dataset__funding | schema:funding |
| dataset__publication | schema:citation |
| dataset__hasPart | dcterms:hasPart |
| dataset__used_for | prov:hadUsage |

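Three of them are types (or names of controlled terms), rather than properties, in their respective vocabularies. I think that's bad: a property should be a property, and the type should be its range. These are:

- schema:ResearchProject (used for dataset__crc_project)
- openminds:Species (used for dataset__sample[organism])
- openminds:UBERONParcellation (used for dataset__sample[organism_part])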
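
As a rough cross-check, the type-like URIs can be picked out of the slot table mechanically. The sketch below is only a heuristic: it assumes the common naming convention that vocabularies write classes in UpperCamelCase and properties in lowerCamelCase, so it does not replace looking each term up in its vocabulary. The `SLOT_URIS` mapping is copied from the table in this issue; the helper name `looks_like_type` is made up for illustration.

```python
# Slot name -> slot URI, copied from the table in this issue.
SLOT_URIS = {
    "Dataset_name": "schema:name",
    "Dataset_title": "schema:title",
    "Dataset_description": "schema:description",
    "Dataset_doi": "bibo:doi",
    "dataset__crc_project": "schema:ResearchProject",
    "dataset__version": "schema:version",
    "dataset__sample[organism]": "openminds:Species",
    "dataset__sample[organism_part]": "openminds:UBERONParcellation",
    "dataset__keywords": "schema:keywords",
    "dataset__license": "schema:license",
    "dataset__homepage": "schema:mainEntityOfPage",
    "dataset__last_updated": "schema:dateModified",
    "dataset__data_controller": "dpv:hasDataController",
    "dataset__author": "schema:author",
    "dataset__funding": "schema:funding",
    "dataset__publication": "schema:citation",
    "dataset__hasPart": "dcterms:hasPart",
    "dataset__used_for": "prov:hadUsage",
}


def looks_like_type(curie: str) -> bool:
    """Heuristic (assumption): a capitalised local name suggests a class, not a property."""
    local_name = curie.split(":", 1)[1]
    return local_name[:1].isupper()


# Keep only the slots whose URI looks like a type under the heuristic.
suspect_slots = {slot: uri for slot, uri in SLOT_URIS.items() if looks_like_type(uri)}

for slot, uri in suspect_slots.items():
    print(f"{slot}: {uri}")
```

With the URIs listed in this issue, the heuristic flags the same three slots that are discussed here.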

How to annotate these properly?

I think one example we have is a data controller. We use the dpv:hasDataController property, and define a DataController class (that uses the dpv:DataController URI, but has a set of mandatory/optional fields defined by us).

Option 1: with LinkML we can define any Dataset property we want (e.g. contributingProject, studiedSpecies, or studiedOrganismPart) and define corresponding classes as their expected range. The properties would then use our own identifiers, and the classes would (likely) use the types listed above (schema:ResearchProject, openminds:Species, openminds:UBERONParcellation), or at least refer to them as closely related.
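
To make Option 1 more concrete, here is a minimal sketch of how such a schema fragment could look, written as a plain Python dict that mirrors the LinkML YAML keys (`classes`, `slots`, `class_uri`, `slot_uri`, `range`). The `sfb1451:` prefix, the class names, and the `check_fragment` helper are placeholders for illustration, not agreed identifiers or part of LinkML. The check only verifies the point of this issue: no slot reuses a class URI, and every range points at a defined class.

```python
# Hypothetical fragment; the "sfb1451:" prefix and class names are assumptions.
schema_fragment = {
    "classes": {
        "Project": {"class_uri": "schema:ResearchProject"},
        "Species": {"class_uri": "openminds:Species"},
        "OrganismPart": {"class_uri": "openminds:UBERONParcellation"},
    },
    "slots": {
        "contributingProject": {
            "slot_uri": "sfb1451:contributingProject",
            "range": "Project",
        },
        "studiedSpecies": {
            "slot_uri": "sfb1451:studiedSpecies",
            "range": "Species",
        },
        "studiedOrganismPart": {
            "slot_uri": "sfb1451:studiedOrganismPart",
            "range": "OrganismPart",
        },
    },
}


def check_fragment(fragment: dict) -> list[str]:
    """Return a list of problems: slots that reuse a class URI or have an undefined range."""
    class_uris = {cls["class_uri"] for cls in fragment["classes"].values()}
    problems = []
    for name, slot in fragment["slots"].items():
        if slot["slot_uri"] in class_uris:
            problems.append(f"{name}: slot_uri {slot['slot_uri']} is a type")
        if slot["range"] not in fragment["classes"]:
            problems.append(f"{name}: range {slot['range']} is not a defined class")
    return problems


problems = check_fragment(schema_fragment)
print(problems or "no problems found")
```

The same check applied to the current state (for example a slot with `slot_uri` set to `schema:ResearchProject`) would report that slot as using a type in place of a property.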

Option 2: use a very generic property à la "related to" from some standard vocabulary, and only use types to distinguish. So a dataset would be related to a given project, and a given species, etc. I think this is semantically valid, but ugly for processing.
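
A small sketch of why Option 2 gets awkward to process: with one generic property, every lookup needs a second pass over the types of the related objects. The triples below are toy data with made-up `ex:` identifiers, and `dcterms:relation` is used only as a stand-in for the generic "related to" property; which property would actually be chosen is an assumption.

```python
# Toy (subject, predicate, object) triples; all "ex:" identifiers are invented.
# dcterms:relation stands in for the generic "related to" property (assumption).
triples = [
    ("ex:dataset1", "dcterms:relation", "ex:projectA"),
    ("ex:dataset1", "dcterms:relation", "ex:mouse"),
    ("ex:dataset1", "dcterms:relation", "ex:cortex"),
    ("ex:projectA", "rdf:type", "schema:ResearchProject"),
    ("ex:mouse", "rdf:type", "openminds:Species"),
    ("ex:cortex", "rdf:type", "openminds:UBERONParcellation"),
]


def related_of_type(subject: str, wanted_type: str) -> list[str]:
    """Find objects related to `subject`, then filter them by their rdf:type."""
    related = [o for s, p, o in triples if s == subject and p == "dcterms:relation"]
    typed = {(s, o) for s, p, o in triples if p == "rdf:type"}
    return [obj for obj in related if (obj, wanted_type) in typed]


species = related_of_type("ex:dataset1", "openminds:Species")
print(species)
```

With a dedicated property per relation (as in Option 1), the same question would be a single match on the predicate, with no type lookup.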

Option 3: find the right properties in a known vocabulary. I think this can be hard; at least, I failed to find suitable ones.

mslw commented 11 months ago

A snapshot from a whiteboard discussion, to refresh our memories:

[Image: PXL_20231213_131819670 (photo of the whiteboard)]

Not included on the board: