Closed matentzn closed 3 months ago
Two more questions:
1) Do we want to represent unmappable literal mappings? If yes, how?
This is basically a question of how the “literal profile” and sssom:NoTermFound
should play together.
There are two distinct cases.
1.a) One side is a literal, the other side is sssom:NoTermFound
.
Unless I am missing something, this case should be straightforward and should not require any special treatment. That’s a good thing, because it’s the case that is most likely to be useful: if you have to map a list of literals (say, a list of scRNAseq cluster names) to a target vocabulary (say, the CL ontology), you want to be able to say that a given literal could not be mapped to any term in the target vocabulary.
1.b) The literal and the “term not found” are on the same side.
Say you want to map a vocabulary (again, let’s say CL) to a list of literals. You want to say that a given term in CL is not represented in the list of literals. We have a problem here, because the way to represent a “term not found” is to use sssom:NoTermFound
as the object_id
(assuming the literals are on the object side – it would not change anything if they were on the subject side), but object_id
is not used for a literal mapping (literals have no IDs, almost by definition).
So what to do here? I can imagine 3 possibilities:
1.b.1) We don’t support that case. Seems quite reasonable to me a priori, as I don’t think it is a very relevant use case (contrary to the 1.a case). That would be the current situation (with the profile as defined in this PR).
1.b.2) Use sssom:NoTermFound
as the object_id
, as we would do for non-literal mappings. This would mean that, even for literal mappings (where we are normally concerned only with object_label
), we would need to lookup object_id
just to check whether it contains the special sssom:NoTermFound
value. It leaves open the question of what to put in the object_label
slot in that case (we have to put something in that slot: it cannot be empty).
1.b.3) Put in the object_label
slot a special value similar to sssom:NoTermFound
, intended to indicate that there is no mapped literal. It could even be the value sssom:NoTermFound
itself (probably the easiest way).
2) Computing mapping_cardinality
for literal mappings
Not really a question, but when computing the mapping_cardinality
, if we are dealing with literal mappings, we must deal with the _label
slots rather than the _id
slots.
That is, for example: if we are dealing with a set of mappings with literals on the subject side and entities on the object side, then a cardinality of 1:n
means that one subject_label
(NOT one subject_id
) maps to several object_id
.
That is, I think, quite obvious, but it should still be mentioned explicitly somewhere.
Ok, I will fold that in (think about it first). As an aside, I like it if actionable comments to be added as comments on the diffs so that they show up as "resolveable", so that they also block PRs being merged unless they are explicitly resolved. If there is no good place for such a comment you can just put in on the Changelog diff or on any other file that makes a little bit of sense.
As an aside, I like it if actionable comments to be added as comments on the diffs so that they show up as "resolveable"
Duly noted. :)
The SSSOM-Py test suite fails on three tests:
test_utils.py::TestUtils::test_get_dict_from_mapping
fails because it expects a semantic_similarity_score
slot. This has nothing to do with this PR and is instead obviously the result of #386, which was not tested against SSSOM-Py before being merged.
test_parsers.py::TestParse::test_parse_alignment_xml
fails when parsing this:
<Cell>"
<entity1 rdf:resource="http://randomurlwithnochancetobeinprefixmap.org/ID_123"/>
<entity2 rdf:resource="http://purl.obolibrary.org/obo/WBbt_0005815"/>
<measure rdf:datatype="xsd:float">0.5</measure>
<relation>=</relation>
</Cell>
My understanding is that, after some checking against a prefix map, the entity1
ends up being being empty/None/null, and prior to that PR SSSOM-Py could count on the Mapping
constructor to automatically reject the mapping as invalid (since the Python generated code included a non-null check because of the required=true
attribute on subject_id
). With this PR, subject_id
is no longer required, so the mapping is “successfully” constructed – which is wrong, and the test catches that.
It looks like SSSOM-Py cannot simply rely on the LinkML runtime implementing the rules
set forth in this PR, so it will have to implement its own validation of subject_id
/ object_id
.
test_parsers.py::TestParse::test_parse_obographs
: I have not looked in details, but it’s most likely a similar problem as above.The issues above are putatively fixed by https://github.com/mapping-commons/sssom-py/pull/552.
Resolves #234
docs/
have been added/updated if necessarymake test
has been run locallymain
If you are proposing a change to the SSSOM metadata model, you must
examples/
see_also
field of the linkml modelsee_also
field of the linkml modelThis PR introduces an alternative model to the "literal mapping" proposal we have previously added. It is built on the following principles:
I do understand that there are various opposing views on the need of a "literal" profile, but I think this super minimal intervention will satisfy both sides.
Huge thanks to @gouttegd who managed to steer this massive carrier ship after it has left the harbour. This is rarely successful and I am very happy we managed to make it!