Closed matentzn closed 1 year ago
As per this, I'm guessing permissible_values should be the value of meaning (which I know will not fit well), but I'll let Sierra answer since she has broader knowledge of LinkML in general than I do.
My guess is something like:
entity_type_enum:
  permissible_values:
    "owl:Class":
    "owl:ObjectProperty":
    "owl:DataProperty":
    "owl:AnnotationProperty":
    "owl:NamedIndividual":
    "skos:Concept":
    "rdfs:Resource":
    "rdfs:Class":
    "rdfs:Literal":
    "rdfs:Datatype":
    "rdf:Property":
But then it would be treated as a string rather than a uriorcurie.
Right @hrshdhgd - after looking at the code, I don't think rdfgen will automatically expand those strings to URIs. Is that ok? There is a bit of code in owlgen that we probably want to add to rdfgen.
> I don't think rdfgen will automatically expand those strings to URIs. Is that ok?
We would want it to be URIs.
may be useful, Nico?
Thank you both for your support. Before we look into solutions for the problem, I would like to understand what the design intention is here on the LinkML side.
The question is simply: can an instance of an enum (a value) be an entity reference (uriorcurie), or not?
So if I have
| subject_id | ... | subject_type |
|---|---|---|
| HP:0000118 | ... | owl:Class |
or
"mappings": [ {
  "subject_id": "HP:0000118",
  "subject_type": "owl:Class"
} ]
Can the enum that restricts subject_type be defined at all to be a uriorcurie instance, so that if I translate into RDF, I get:
[] a sssom:Mapping ;
  sssom:subject_type <http://www.w3.org/2002/07/owl#Class> ;
  sssom:subject_id <http://purl.obolibrary.org/obo/HP_0000118> .
I think so, yeah; it's just that not all generators are feature-complete. The OWL generator code that Harshad and I linked is the implementation that I think we need to add to rdfgen (you are using rdfgen in SSSOM, right?)
Thank you. It is good to hear that it is not a conceptual issue. So, forgetting about RDF for now: can I somehow specify the base type of the enum, stating that it is an instance of, say, uri or curie? Or is this currently not possible in the spec?
No, we can't specify the range of a permissible value at this point, but I like the idea. "meaning" range is restricted to a uriorcurie however, so we can assume that the meaning can be translated in a constrained kind of way based on the serialization.
@gouttegd can you articulate your position on this? Do you prefer "owl class" as the rendering of the slot value in TSV and JSON?
@matentzn It’s not a matter of what I prefer.
First, I think such a breaking change at this stage, so close to 1.0, would be harmful, for little to no benefit, especially since what you seem to want should be achievable without the need for a breaking change (more on that later).
Consider that it has already been more than one year since match_type and the associated enum were replaced by mapping_justification and the SEMAPV vocabulary. There have been 9 releases of the SSSOM schema since then. And yet as of today we still find mapping sets that use the old slot and/or the old enum.
I don’t know why you prefer
subject_id ... subject_type
HP:0000118 ... owl:Class
over
subject_id ... subject_type
HP:0000118 ... owl class
but it is almost certain that the second form will still be present in the wild for years to come. So I’d recommend not changing the enum now.
What seems to really bother you is that you would like the enum value to be rendered as an IRI when serialising to RDF, am I correct?
Well then, just amend the spec to state that, when serialising to RDF, a value of type entity_type_enum should be serialised as the IRI indicated in the meaning field associated with the value (so, owl class should be serialised as http://www.w3.org/2002/07/owl#Class, etc.). That’s it. No need to break anything.
Arguably this could even be specified once and for all at the LinkML level, not only for SSSOM and not only for entity_type_enum: when serialising to RDF, if the permissible values of an enum have a meaning field, then values of that enum should not be serialised as the string representation of the value but as whatever entity is referenced in the meaning field for each value.
This would seem to me like a perfectly reasonable behaviour, and actually the behaviour that I would expect, because otherwise I don’t quite see the point of the meaning field. The doc says it “allows enums to be backed by external ontologies”, but what does that mean exactly?
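To make the proposed rule concrete, here is a minimal Python sketch using only the standard library. It is not LinkML's or SSSOM's actual code: the names `PREFIXES`, `ENTITY_TYPE_MEANINGS`, `expand_curie` and `rdf_term` are hypothetical, the table of permissible values is a hand-written assumption, and "unmapped value" is a made-up entry that only illustrates the fallback when no meaning is declared.

```python
# Hypothetical prefix map (assumption for this sketch).
PREFIXES = {
    "owl": "http://www.w3.org/2002/07/owl#",
    "skos": "http://www.w3.org/2004/02/skos/core#",
}

# Permissible value text -> "meaning" CURIE, or None when no meaning is declared.
# Hand-written for illustration; not read from the real schema.
ENTITY_TYPE_MEANINGS = {
    "owl class": "owl:Class",
    "skos concept": "skos:Concept",
    "unmapped value": None,  # made-up entry to show the fallback
}


def expand_curie(curie):
    """Expand a CURIE such as 'owl:Class' into a full IRI using PREFIXES."""
    prefix, _, local = curie.partition(":")
    return PREFIXES[prefix] + local


def rdf_term(value):
    """Return the RDF term for an enum value.

    If the value has a meaning, return it as an IRI in angle brackets;
    otherwise fall back to a plain string literal.
    """
    if value not in ENTITY_TYPE_MEANINGS:
        raise ValueError(f"not a permissible value: {value!r}")
    meaning = ENTITY_TYPE_MEANINGS[value]
    if meaning is None:
        return '"' + value + '"'
    return "<" + expand_curie(meaning) + ">"
```

With this, `rdf_term("owl class")` yields the IRI form while TSV and JSON can keep writing `owl class` unchanged, so existing mapping sets are not affected.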
But my real concern is with this:
> The question is simply: can an instance of an enum (a value) be an entity reference (uriorcurie), or not?
Please don’t. An enum value is of the type of that particular enum, nothing more, nothing less. Each language has its own way of representing enumerations – in particular, in many languages, they are integers. Specifying straight in the data model that enum values should be of a certain type is only going to make supporting SSSOM in non-Python languages even more complicated.
Basically the point I am trying to make is: let enum values be opaque values and if you want some of them to be serialised in a certain way, enforce that at the level of the parser/serialiser, not at the level of the data model.
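As a rough illustration of keeping enum values opaque and deciding their rendering in the parser/serialiser, here is a small standard-library Python sketch. Everything in it is an assumption made for the example: the `EntityType` class, the `TEXT_FORM` and `IRI_FORM` tables, and the helper functions are hypothetical and do not come from any SSSOM or LinkML library.

```python
import json
from enum import Enum, auto


class EntityType(Enum):
    """Opaque enum members; the integer values carry no meaning of their own."""
    OWL_CLASS = auto()
    OWL_OBJECT_PROPERTY = auto()


# Serialiser-level tables (assumed for this sketch, not generated from the schema).
TEXT_FORM = {
    EntityType.OWL_CLASS: "owl class",
    EntityType.OWL_OBJECT_PROPERTY: "owl object property",
}
IRI_FORM = {
    EntityType.OWL_CLASS: "http://www.w3.org/2002/07/owl#Class",
    EntityType.OWL_OBJECT_PROPERTY: "http://www.w3.org/2002/07/owl#ObjectProperty",
}


def parse_entity_type(text):
    """Parser side: map the text found in TSV/JSON back to the opaque member."""
    for member, label in TEXT_FORM.items():
        if label == text:
            return member
    raise ValueError(f"unknown entity type: {text!r}")


def to_json(subject_id, subject_type):
    """JSON serialiser: writes the text form of the enum value."""
    return json.dumps({"subject_id": subject_id, "subject_type": TEXT_FORM[subject_type]})


def to_turtle(subject_iri, subject_type):
    """RDF serialiser: writes the IRI form (prefix declarations omitted)."""
    return (
        "[] a sssom:Mapping ;\n"
        f"  sssom:subject_type <{IRI_FORM[subject_type]}> ;\n"
        f"  sssom:subject_id <{subject_iri}> ."
    )
```

The data model only ever sees `EntityType.OWL_CLASS`; whether that becomes `owl class` or an IRI is a decision local to each serialiser, which is the separation argued for above.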
@gouttegd is correct. And in fact the current behavior for RDF serialization is to use the meaning URI if present; JSON and Python use the text. This needs to be clarified in the docs: https://linkml.io/linkml/schemas/enums.html#mapping-permissible-values-to-ontologies
We discussed this on the linkml. We weren't totally clear if this pertained to schema or data. Note that rdfgen, which was mentioned (like all the *gens), is for schema conversion. However, I think it's actually about data conversion (linkml-convert) - is that right?
Either way: meaning, if present, is used to generate a URI, whether representing PVs in a schema or in data.
Ok. Let's leave things as they are then. I have slept on the matter for a few nights now, and am fine with not making a change. I guess I will see what that means for the JSON serialisation, because I have long forgotten whether it is based on JSON-LD (in which case I would expect it to round-trip with RDF) or some other standard representation.
Currently the subject_type and object_type fields are defined using an enum. When I added this, I intended to be able to use this enum like this:
Now I learn that, according to the LinkML model, this is how it would look at the data level:
I still need to find out how to define an enum that takes exactly owl:Class, owl:ObjectProperty, owl:AnnotationProperty as values, but in a way that makes it understood that these are CURIEs.
So that when I translate, say, this dataset:
into RDF, I get:
Help @sierra-moxon @hrshdhgd