psychoinformatics-de / datalad-concepts

`rdf:type` vs `dlthing:meta_type` in the context of LinkML type designation #176

Open jsheunis opened 4 months ago

jsheunis commented 4 months ago

The issue https://github.com/psychoinformatics-de/shacl-vue/issues/32 in shacl-vue brought to light that data converted from YAML to TTL using the current state of the sdd schema (which inherits from distribution, thing, and more in dlco) does not contain the expected type designations.

Demonstrative example

With linkml 1.8.1:

The thing schema defines meta_type and type as follows:

https://github.com/psychoinformatics-de/datalad-concepts/blob/b2fcae84bd2b6062701fac9b95946f60c8bd4365/src/thing/unreleased.yaml#L192-L209

The input data shows the following for one of the authors. Note that no type is specified:

https://github.com/psychoinformatics-de/datalad-concepts/blob/b2fcae84bd2b6062701fac9b95946f60c8bd4365/src/sdd/unreleased/examples/Distribution-penguins.yaml#L156-L168

We can then convert the YAML to TTL using:

>> linkml-convert -s src/sdd/unreleased.yaml -t ttl --target-class Distribution src/sdd/unreleased/examples/Distribution-penguins.yaml > distribution-penguins.ttl

The output for the same author after running linkml-convert is:

<https://example.org/ns/dataset/#ahorst> dldist:affiliation <https://example.org/ns/dataset/#UCSB> ;
    dldist:email "ahorst@example.com"^^dldist:EmailAddress ;
    dlthing:identifier [ a dlthing:Identifier ;
            dlthing:notation "0000-0002-6047-5564" ;
            dlthing:schema_agency <https://orcid.org> ] ;
    dlthing:meta_type "dldist:Person"^^xsd:anyURI ;
    dlthing:name "Allison Horst" ;
    dlthing:same_as "https://orcid.org/0000-0002-6047-5564"^^xsd:anyURI .

Note that the output does not contain the expected

<https://example.org/ns/dataset/#ahorst> a "dldist:Person"^^xsd:anyURI ;

which is the problem.

Discussion

The dlthing:meta_type slot was implemented so that a data object can be validated against a specialized schema class (named by its meta_type) even when the range of the property accepting the object is a superclass of that specialized class. (I couldn't find a more intuitive way of stating this...)

For example, let's say a Distribution has a was_attributed_to field (aka property) with range/type dlco:Agent, while dlco:Agent has multiple subclasses such as dlco:Person or dlco:Organization. This means a dlco:Person or dlco:Organization object can be supplied for that field and should pass LinkML validation, as long as its class is specified in the meta_type field of the data object and is actually a subclass of the accepting slot's range class.
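The rule described above can be sketched in a few lines of Python. This is only an illustration of the idea, not how LinkML implements type designators: the `PARENT_OF` mapping, the `ex:Dataset` class, and the helper names `is_subclass_of` and `accepts` are all hypothetical, and the hierarchy is hard-coded rather than read from a schema.

```python
# Hypothetical class hierarchy, hard-coded for illustration only.
# It mirrors the dlco:Agent example; ex:Dataset is an invented unrelated class.
PARENT_OF = {
    "dlco:Person": "dlco:Agent",
    "dlco:Organization": "dlco:Agent",
    "dlco:Agent": None,
    "ex:Dataset": None,
}


def is_subclass_of(cls, ancestor):
    """Return True if `cls` equals `ancestor` or inherits from it."""
    current = cls
    while current is not None:
        if current == ancestor:
            return True
        current = PARENT_OF.get(current)
    return False


def accepts(slot_range, data_object):
    """Check whether an object's designated type fits the slot's range.

    An object without a meta_type is treated as an instance of the
    range class itself.
    """
    designated = data_object.get("meta_type", slot_range)
    return designated in PARENT_OF and is_subclass_of(designated, slot_range)


person = {"meta_type": "dlco:Person", "name": "Allison Horst"}
dataset = {"meta_type": "ex:Dataset", "name": "not an agent"}

print(accepts("dlco:Agent", person))   # True
print(accepts("dlco:Agent", dataset))  # False
```

The point of the sketch is that the designated class only matters at validation time, when the validator has to decide which specialized class to check the object against.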

However, the dlthing:meta_type specification does not really have meaning outside of the process of LinkML-based data validation. E.g. when data is exported to TTL and then used by shacl-vue, such an application is interested in the nodes and their types, such as the currently missing:

<https://example.org/ns/dataset/#ahorst> a "dldist:Person"^^xsd:anyURI ;

The meta_type only becomes important again when data generated or updated by an application such as shacl-vue needs to be validated in LinkML against the dlco-based schemas.
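To make the consumer's perspective concrete, here is a minimal sketch of a type lookup over plain triples. It assumes a simplified model where triples are `(subject, predicate, object)` tuples with prefixed names as strings. The `types_of` helper is hypothetical and is not taken from shacl-vue or any RDF library.

```python
RDF_TYPE = "rdf:type"  # written as `a` in Turtle

# Triples abbreviated from the linkml-convert output shown in this issue.
AHORST = "https://example.org/ns/dataset/#ahorst"
current_export = [
    (AHORST, "dlthing:meta_type", "dldist:Person"),
    (AHORST, "dlthing:name", "Allison Horst"),
]
expected_export = [
    (AHORST, RDF_TYPE, "dldist:Person"),
    (AHORST, "dlthing:name", "Allison Horst"),
]


def types_of(triples, node):
    """Return the objects of all rdf:type statements about `node`."""
    return [o for s, p, o in triples if s == node and p == RDF_TYPE]


print(types_of(current_export, AHORST))   # []
print(types_of(expected_export, AHORST))  # ['dldist:Person']
```

A consumer that only understands rdf:type finds nothing in the current export, even though the same information is present under dlthing:meta_type.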

After discussions with @mih, several points were raised:

Investigating rdf:type as slot_uri of dlthing:meta_type

I tried this by updating the line: https://github.com/psychoinformatics-de/datalad-concepts/blob/b2fcae84bd2b6062701fac9b95946f60c8bd4365/src/thing/unreleased.yaml#L193

diff --git a/src/thing/unreleased.yaml b/src/thing/unreleased.yaml
index 486fbb2..d010834 100644
--- a/src/thing/unreleased.yaml
+++ b/src/thing/unreleased.yaml
@@ -190,7 +190,7 @@ slots:
     range: string

   meta_type:
-    slot_uri: dlthing:meta_type
+    slot_uri: rdf:type
     designates_type: true
     description: >-
       Type designator of a metadata object for validation and schema structure

I changed nothing in the data (i.e. the data object still specifies the meta_type field, and not the type field) and then ran the conversion again.

This was the output:

<https://example.org/ns/dataset/#ahorst> a "dldist:Person"^^xsd:anyURI ;
    dldist:affiliation <https://example.org/ns/dataset/#UCSB> ;
    dldist:email "ahorst@example.com"^^dldist:EmailAddress ;
    dlthing:identifier [ a dlthing:Identifier ;
            dlthing:notation "0000-0002-6047-5564" ;
            dlthing:schema_agency <https://orcid.org> ] ;
    dlthing:name "Allison Horst" ;
    dlthing:same_as "https://orcid.org/0000-0002-6047-5564"^^xsd:anyURI .

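The difference compared to the initial output: the node now carries the `a "dldist:Person"^^xsd:anyURI` statement, and the `dlthing:meta_type "dldist:Person"^^xsd:anyURI` statement is no longer present. Everything else is unchanged.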
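The comparison can also be made mechanically. The following sketch lists the predicates attached to the #ahorst node in each output, copied by hand from the two Turtle snippets in this issue, and takes the set difference. The variable names are illustrative only.

```python
# Predicates on the #ahorst node, copied from the two outputs in this issue.
before = {
    "dldist:affiliation", "dldist:email", "dlthing:identifier",
    "dlthing:meta_type", "dlthing:name", "dlthing:same_as",
}
after = {
    "rdf:type",  # written as `a` in Turtle
    "dldist:affiliation", "dldist:email", "dlthing:identifier",
    "dlthing:name", "dlthing:same_as",
}

print(sorted(before - after))  # ['dlthing:meta_type']
print(sorted(after - before))  # ['rdf:type']
```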

I also ran checks and validations locally after the change, with no unexpected errors.

Is this what we want?