mapping-commons / sssom

Simple Standard for Sharing Ontology Mappings
https://mapping-commons.github.io/sssom/
BSD 3-Clause "New" or "Revised" License

Reorganise docs #368

Closed gouttegd closed 4 months ago

gouttegd commented 5 months ago

Resolves [#330]

This PR reorganises the documentation, especially the specification part, as suggested in #330.

More precisely:

The “resources for users” section is left untouched for now. The urgent part was reorganising the specification, so that we can start enriching it to make it ready for 1.0.

gouttegd commented 4 months ago

Given that the “reorganisation of the docs” part does not actually change any content (it merely puts the docs in a shape that will make them easier to work on), I am fine with that part not being reviewed by a second reviewer.

For the spec-formats-tsv.md part, however, I don’t think it is fitting that the first real formal specification of the SSSOM/TSV format has been written by the developer of only one of the two “major” implementations. I’d like an SSSOM-Py developer to have at least a cursory look at it.

I don’t foresee any problem since the new spec should be fully compatible with existing behaviours in SSSOM-Py. What the new spec does add:

matentzn commented 4 months ago

I consider myself responsible for the SSSOM-Py implementation, even though the majority of the work has been done by @hrshdhgd.

@hrshdhgd - feel free to review the file called spec-formats-tsv.md in this PR, with a specific emphasis on the points @gouttegd made in his last comment above. We will have to implement condensation and propagation at some point soon after, so it is in any case good if you are familiar with it. Let us know if you have any major qualms!

Thanks!

gouttegd commented 4 months ago

@hrshdhgd

> I don't think I follow the condensation and propagation concepts though. Could either of you provide examples so I understand what to implement?

Let’s consider the following set:

#curie_map:
#  COMENT: https://example.com/entities/
#  ORGENT: https://example.org/entities/
#mapping_provider: https://example.org/provider
#mapping_tool: foo mapper
subject_id    subject_label   predicate_id      object_id     object_label   mapping_justification          mapping_tool
ORGENT:0001   alice           skos:closeMatch   COMENT:0011   alpha          semapv:ManualMappingCuration   
ORGENT:0002   bob             skos:closeMatch   COMENT:0012   beta           semapv:ManualMappingCuration   bar mapper
ORGENT:0004   daphne          skos:closeMatch   COMENT:0014   delta          semapv:ManualMappingCuration   
ORGENT:0005   eve             skos:closeMatch   COMENT:0015   epsilon        semapv:ManualMappingCuration   

The set-level metadata contain values for the mapping_provider and mapping_tool slots. These slots are considered “propagatable”, which means that they really apply to individual mappings, and that putting them at the level of the set is just a “shortcut” to avoid repeating the same value for all mappings.

So in this example, all mappings should be considered to have a mapping_provider of https://example.org/provider.

Propagation is the act of taking the values of propagatable slots at the set level, and filling the corresponding slots in each individual mapping.

After propagation, the above set should look like this:

#curie_map:
#  COMENT: https://example.com/entities/
#  ORGENT: https://example.org/entities/
#mapping_tool: foo mapper
subject_id    subject_label   predicate_id      object_id     object_label   mapping_justification          mapping_tool   mapping_provider
ORGENT:0001   alice           skos:closeMatch   COMENT:0011   alpha          semapv:ManualMappingCuration                  https://example.org/provider
ORGENT:0002   bob             skos:closeMatch   COMENT:0012   beta           semapv:ManualMappingCuration   bar mapper     https://example.org/provider
ORGENT:0004   daphne          skos:closeMatch   COMENT:0014   delta          semapv:ManualMappingCuration                  https://example.org/provider
ORGENT:0005   eve             skos:closeMatch   COMENT:0015   epsilon        semapv:ManualMappingCuration                  https://example.org/provider

Notice that the set no longer has a mapping_provider value, and conversely that all mappings have one.

Also note that the value of the mapping_tool slot at the set level (“foo mapper”) has not been propagated, even though mapping_tool is a propagatable slot. This is because one of the mappings already had a value for that slot (the second mapping, which has the value “bar mapper”), and propagation is only allowed when no mappings at all have a value for the propagatable slot.
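To make the rule concrete, here is a minimal Python sketch of propagation on the example above. It is only an illustration, not the SSSOM-Py API or the wording of the spec: the function name `propagate`, the representation of mappings as plain dicts, and the two-entry `PROPAGATABLE_SLOTS` tuple (limited to the slots used in this example) are all assumptions.

```python
from typing import Dict, List

# Hypothetical subset of propagatable slots, limited to this example.
PROPAGATABLE_SLOTS = ("mapping_provider", "mapping_tool")


def propagate(set_metadata: Dict[str, str], mappings: List[Dict[str, str]]) -> None:
    """Move propagatable set-level values down to every mapping, in place."""
    for slot in PROPAGATABLE_SLOTS:
        if slot not in set_metadata:
            continue
        # Propagation is only allowed when no mapping at all has a value.
        if any(m.get(slot) for m in mappings):
            continue
        value = set_metadata.pop(slot)  # the set no longer carries the slot
        for m in mappings:
            m[slot] = value


set_metadata = {
    "mapping_provider": "https://example.org/provider",
    "mapping_tool": "foo mapper",
}
mappings = [
    {"subject_id": "ORGENT:0001", "object_id": "COMENT:0011"},
    {"subject_id": "ORGENT:0002", "object_id": "COMENT:0012",
     "mapping_tool": "bar mapper"},
    {"subject_id": "ORGENT:0004", "object_id": "COMENT:0014"},
    {"subject_id": "ORGENT:0005", "object_id": "COMENT:0015"},
]

propagate(set_metadata, mappings)
print(set_metadata)
print(mappings)
```

With this input, mapping_provider ends up on every mapping and disappears from the set, while mapping_tool stays at the set level because one mapping already carries its own value.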

gouttegd commented 4 months ago

Condensation is the exact opposite of propagation. It takes the values of “propagatable slots” that are set on the mappings, and moves them (if possible, that is, if all mappings have the same value) to the level of the set instead.

For example, to condense the second example from my previous message, you would observe that all mappings have the same value for the mapping_provider slot, so you would set a single mapping_provider slot at the level of the set and remove the entire mapping_provider column. You would also observe that not all mappings have the same value for mapping_tool (one mapping has the value “bar mapper”, whereas other mappings have no value), so you would not do anything special for that slot (it is not condensable).
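The same example can be sketched in Python as the reverse operation. Again, this is a rough illustration under assumptions rather than the SSSOM-Py API: `condense`, the dict-based representation, and the `PROPAGATABLE_SLOTS` tuple are hypothetical names. What to do when the set already holds a different value for the slot is not covered in this thread, so the sketch simply leaves such a slot alone.

```python
from typing import Dict, List

# Hypothetical subset of propagatable slots, limited to this example.
PROPAGATABLE_SLOTS = ("mapping_provider", "mapping_tool")


def condense(set_metadata: Dict[str, str], mappings: List[Dict[str, str]]) -> None:
    """Move values shared by all mappings up to the set level, in place."""
    for slot in PROPAGATABLE_SLOTS:
        # Collect the distinct values; a missing value counts as None.
        values = {m.get(slot) for m in mappings}
        if len(values) != 1:
            continue  # mappings disagree (or there are none): not condensable
        (value,) = values
        if not value:
            continue  # no mapping has a value: nothing to condense
        # Assumption (not stated in the thread): skip on a set-level conflict.
        if set_metadata.get(slot, value) != value:
            continue
        set_metadata[slot] = value
        for m in mappings:
            del m[slot]  # equivalent to dropping the whole column


provider = "https://example.org/provider"
set_metadata = {"mapping_tool": "foo mapper"}
mappings = [
    {"subject_id": "ORGENT:0001", "mapping_provider": provider},
    {"subject_id": "ORGENT:0002", "mapping_provider": provider,
     "mapping_tool": "bar mapper"},
    {"subject_id": "ORGENT:0004", "mapping_provider": provider},
    {"subject_id": "ORGENT:0005", "mapping_provider": provider},
]

condense(set_metadata, mappings)
print(set_metadata)
print(mappings)
```

Here mapping_provider moves back to the set and its column disappears, while mapping_tool is left exactly as it was, since only one mapping has a value for it.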

hrshdhgd commented 4 months ago

That makes perfect sense! Thank you for explaining this patiently and perfectly @gouttegd ! I truly appreciate it.