Closed gouttegd closed 4 months ago
Given that the “reorganisation of the docs” part does not actually change any content (it merely puts the doc in a shape that will make it easier to work on), I am fine with that part not being reviewed by a second reviewer.
For the spec-formats-tsv.md part, however, I don’t think it is fitting that the first real formal specification of the SSSOM/TSV format has been written by the developer of only one of the two “major” implementations. I’d like an SSSOM-Py developer to have at least a cursory look at it.
I don’t foresee any problem since the new spec should be fully compatible with existing behaviours in SSSOM-Py. What the new spec does add:
I feel responsible for the SSSOM-Py implementation, even though the majority of the work has been done by @hrshdhgd.
@hrshdhgd - feel free to review the file called spec-formats-tsv.md in this PR, with a specific emphasis on the points @gouttegd made in his last comment above. We will have to implement condensation and propagation at some point soon after, so it is in any case good if you are familiar with it. Let us know if you have any major qualms!
Thanks!
@hrshdhgd
I don't think I follow the condensation and propagation concepts though. Could either of you provide examples so I understand what to implement?
Let’s consider the following set:
#curie_map:
# COMENT: https://example.com/entities/
# ORGENT: https://example.org/entities/
#mapping_provider: https://example.org/provider
#mapping_tool: foo mapper
subject_id subject_label predicate_id object_id object_label mapping_justification mapping_tool
ORGENT:0001 alice skos:closeMatch COMENT:0011 alpha semapv:ManualMappingCuration
ORGENT:0002 bob skos:closeMatch COMENT:0012 beta semapv:ManualMappingCuration bar mapper
ORGENT:0004 daphne skos:closeMatch COMENT:0014 delta semapv:ManualMappingCuration
ORGENT:0005 eve skos:closeMatch COMENT:0015 epsilon semapv:ManualMappingCuration
The set-level metadata contain a value for the mapping_provider and mapping_tool slots. These slots are considered “propagatable”, which means that they really apply to individual mappings, and that putting them at the level of the set is just a “shortcut” to avoid repeating the same value for all mappings.
So in this example, all mappings should be considered to have a mapping_provider of https://example.org/provider.
Propagation is the act of taking the values of propagatable slots at the set level, and filling the corresponding slots in each individual mapping.
After propagation, the above set should look like this:
#curie_map:
# COMENT: https://example.com/entities/
# ORGENT: https://example.org/entities/
#mapping_tool: foo mapper
subject_id subject_label predicate_id object_id object_label mapping_justification mapping_tool mapping_provider
ORGENT:0001 alice skos:closeMatch COMENT:0011 alpha semapv:ManualMappingCuration https://example.org/provider
ORGENT:0002 bob skos:closeMatch COMENT:0012 beta semapv:ManualMappingCuration bar mapper https://example.org/provider
ORGENT:0004 daphne skos:closeMatch COMENT:0014 delta semapv:ManualMappingCuration https://example.org/provider
ORGENT:0005 eve skos:closeMatch COMENT:0015 epsilon semapv:ManualMappingCuration https://example.org/provider
Notice that the set no longer has a mapping_provider value, and conversely that all mappings have one.
Also note that the value of the mapping_tool slot at the set level (“foo mapper”) has not been propagated, even though mapping_tool is a propagatable slot. This is because one of the mappings already had a value for that slot (mapping #2, which has the value “bar mapper”), and propagation is only allowed when no mappings at all have a value for the propagatable slot.
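To make the rule concrete, here is a minimal Python sketch of propagation applied to a shortened version of the example above. It is not SSSOM-Py code: the `propagate` function, the dict-based data layout, and the `PROPAGATABLE_SLOTS` set (limited to the two slots used in this example) are all assumptions made for illustration.

```python
# Hypothetical sketch, not the SSSOM-Py API.
# Assumption: only the two slots from the example are listed as propagatable.
PROPAGATABLE_SLOTS = {"mapping_provider", "mapping_tool"}


def propagate(set_metadata, mappings, propagatable=PROPAGATABLE_SLOTS):
    """Return copies of (set_metadata, mappings) with set-level values pushed down."""
    new_metadata = dict(set_metadata)
    new_mappings = [dict(m) for m in mappings]
    for slot in sorted(propagatable):
        if slot not in new_metadata:
            continue
        # Propagation is only allowed when no mapping at all has a value
        # for the slot (an empty TSV cell counts as "no value").
        if any(m.get(slot) for m in new_mappings):
            continue
        # Move the value from the set to every mapping.
        value = new_metadata.pop(slot)
        for m in new_mappings:
            m[slot] = value
    return new_metadata, new_mappings


metadata = {
    "mapping_provider": "https://example.org/provider",
    "mapping_tool": "foo mapper",
}
mappings = [
    {"subject_id": "ORGENT:0001", "object_id": "COMENT:0011"},
    {"subject_id": "ORGENT:0002", "object_id": "COMENT:0012",
     "mapping_tool": "bar mapper"},
]

new_metadata, new_mappings = propagate(metadata, mappings)
print(new_metadata)
print(new_mappings)
```

In this sketch mapping_provider moves from the set to every mapping, while mapping_tool stays at the set level because the second mapping already carries its own value.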
Condensation is the exact opposite of propagation. It’s taking the values of “propagatable slots” that are set on the mappings, and moving them (if possible, that is if all mappings have the same value) to the level of the set instead.
For example, to condense the second example from my previous message, you would observe that all mappings have the same value for the mapping_provider slot, so you would set a single mapping_provider slot at the level of the set and remove the entire mapping_provider column. You would also observe that not all mappings have the same value for mapping_tool (one mapping has the value “bar mapper”, whereas other mappings have no value), so you would not do anything special for that slot (it is not condensable).
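The same steps can be sketched in Python. Again, this is only an illustration under stated assumptions, not SSSOM-Py code: the `condense` function, the dict-based layout, and the `PROPAGATABLE_SLOTS` set are hypothetical, slot values are assumed to be plain strings, and leaving a slot untouched when the set already holds a different value is a guess that is not confirmed by the discussion above.

```python
# Hypothetical sketch, not the SSSOM-Py API.
# Assumption: only the two slots from the example are listed as propagatable.
PROPAGATABLE_SLOTS = {"mapping_provider", "mapping_tool"}


def condense(set_metadata, mappings, propagatable=PROPAGATABLE_SLOTS):
    """Return copies of (set_metadata, mappings) with shared values moved to the set."""
    new_metadata = dict(set_metadata)
    new_mappings = [dict(m) for m in mappings]
    for slot in sorted(propagatable):
        values = [m.get(slot) for m in new_mappings]
        # Condensable only if every mapping has a value and all values are equal.
        if not values or not all(values) or len(set(values)) != 1:
            continue
        # Assumption: do not overwrite a different value already set on the set.
        if slot in new_metadata and new_metadata[slot] != values[0]:
            continue
        new_metadata[slot] = values[0]
        # Remove the now-redundant "column" from every mapping.
        for m in new_mappings:
            del m[slot]
    return new_metadata, new_mappings


metadata = {"mapping_tool": "foo mapper"}
mappings = [
    {"subject_id": "ORGENT:0001", "object_id": "COMENT:0011",
     "mapping_provider": "https://example.org/provider"},
    {"subject_id": "ORGENT:0002", "object_id": "COMENT:0012",
     "mapping_tool": "bar mapper",
     "mapping_provider": "https://example.org/provider"},
]

new_metadata, new_mappings = condense(metadata, mappings)
print(new_metadata)
print(new_mappings)
```

Here mapping_provider ends up on the set and disappears from the mappings, while mapping_tool is left exactly as it was because only one mapping has a value for it.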
That makes perfect sense! Thank you for explaining this patiently and perfectly @gouttegd ! I truly appreciate it.
Resolves [#330]
docs/ have been added/updated if necessary
make test has been run locally
This PR reorganises the documentation, especially the specification part, as suggested in #330.
More precisely:
spec.md document)
The “resources for users” section is left untouched for now. The urgent part was reorganising the specification, so that we can start enriching it to make it ready for 1.0.