Using mappings to replace obsolete terms

gouttegd commented 8 months ago

One possible use of a SSSOM mapping set is to perform mass renaming in a given database, ontology or other data vault. For example, given an ontology and a mapping set, if the IRI of an entity in the ontology matches the subject ID of a mapping in the set, then replace that IRI with the corresponding object ID.
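
As a minimal sketch of that renaming rule (not an existing tool), assume a mapping is reduced to just its subject ID, predicate ID and object ID. The `Mapping` type, the `replace_ids` helper and the `EX:` identifiers are hypothetical and only serve the illustration:

```python
from typing import Iterable, NamedTuple


class Mapping(NamedTuple):
    """One mapping, reduced to the three slots needed for renaming."""
    subject_id: str
    predicate_id: str
    object_id: str


def replace_ids(ids: Iterable[str], mappings: Iterable[Mapping]) -> list[str]:
    """Replace each ID that is the subject of a mapping with that mapping's object."""
    # If several mappings share a subject, the last one wins in this sketch.
    table = {m.subject_id: m.object_id for m in mappings}
    # IDs that are not the subject of any mapping are left unchanged.
    return [table.get(i, i) for i in ids]


# Hypothetical identifiers, for illustration only.
example_mappings = [Mapping("EX:0001", "IAO:0100001", "EX:0002")]
renamed = replace_ids(["EX:0001", "EX:0003"], example_mappings)
```

Note that this sketch ignores the predicate entirely; whether it should is the question raised in the rest of this comment.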

Should we have a way to explicitly indicate that a mapping is intended to be used for this kind of replacement? That is, instead of a mapping that merely indicates that the subject and the object are an “exact match”, we would have a mapping that explicitly indicates that the subject is to be replaced by the object – which is slightly different from saying that the subject and the object can be used interchangeably (the normal meaning of an exact match).

I can see three ways of making such a statement explicit:

a) Using IAO:0100001 (“term replaced by”) as the mapping predicate. That’s the easiest way as it does not require anything that does not already exist.

b) Having a new dedicated mapping relation in SEMAPV, such as semapv:ReplacementTerm or similar. It would probably be a subproperty of skos:exactMatch.

c) Instead of using the mapping predicate, we use another field, probably sssom:MappingJustification. That is, the mapping predicate would remain skos:exactMatch, but the mapping justification would be a new value like semapv:TermReplacement or similar.

I have no strong opinion on which way would be better (though I slightly dislike c as I feel this is overloading the meaning of sssom:MappingJustification somehow). But I think it would be nice to have one recommended way of doing replacements with SSSOM, otherwise I am concerned that all three methods (and possibly other methods I have not thought of!) will end up being used in the wild.

matentzn commented 8 months ago

Very timely. My thoughts on this right now are as follows, but I am very easily convinced that I am wrong:

"Term replacement" is a use case of a mapping set, rather than a fundamental relationship between terms.
I would therefore export a SSSOM mapping set for the purpose of "term replacement", with no particular changes to the "mapping-level" metadata

That said, I would be open to arguments against the above. The main ones I can see are:

we deliberately map to "obsolete" terms, which could confuse people
we map between terms in the same identifier space, which is also strange.

From your suggestions above, I would prefer we go with a) (no need to change anything), but still have a comment on mapping set level which explains the use case of the mapping somehow?

gouttegd commented 8 months ago

"Term replacement" is a use case of a mapping set, rather than a fundamental relationship between terms.

In principle I agree, but still it would be nice if a tool could consume a mapping set to perform replacement without requiring the users to first filter out the mappings that do not represent a term replacement (in case they somehow obtain a mapping set that contains more than what they need).

Practically I envision a tool (or a pluggable ROBOT command) that can take a mapping set as input and would automatically perform term replacements, using only the mappings that have been explicitly marked somehow as being intended for such a purpose.

Of course the behaviour would be user-configurable (users would be able to say, “perform replacements for all the mappings, regardless of which predicate or mapping justification they use”, or on the contrary, “perform replacements only for the mappings that are using the predicate custom:MyCustomPredicate”), but the default behaviour would be to automatically use the mappings that use whichever of the three solutions above we agree on (e.g., only the mappings that use an IAO:0100001 predicate if we go for option a).
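
A rough sketch of that selection behaviour, assuming option a) is the agreed default. The names `Mapping`, `DEFAULT_PREDICATES` and `select_for_replacement` are hypothetical, as are the `EX:` identifiers; only the predicates themselves come from this discussion:

```python
from typing import Iterable, NamedTuple, Optional


class Mapping(NamedTuple):
    subject_id: str
    predicate_id: str
    object_id: str


# Assumed default: only mappings using "term replaced by" (option a).
DEFAULT_PREDICATES = frozenset({"IAO:0100001"})


def select_for_replacement(
    mappings: Iterable[Mapping],
    predicates: Optional[Iterable[str]] = DEFAULT_PREDICATES,
) -> list[Mapping]:
    """Keep only the mappings that should drive a replacement.

    Passing predicates=None means "use every mapping, whatever its predicate".
    """
    if predicates is None:
        return list(mappings)
    wanted = set(predicates)
    return [m for m in mappings if m.predicate_id in wanted]


# Hypothetical mapping set mixing three kinds of predicates.
mixed = [
    Mapping("EX:0001", "IAO:0100001", "EX:0002"),
    Mapping("EX:0003", "skos:exactMatch", "EX:0004"),
    Mapping("EX:0005", "custom:MyCustomPredicate", "EX:0006"),
]
by_default = select_for_replacement(mixed)
everything = select_for_replacement(mixed, predicates=None)
custom_only = select_for_replacement(mixed, predicates=["custom:MyCustomPredicate"])
```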

matentzn commented 8 months ago

Practically I envision a tool (or a pluggable ROBOT command) that can take a mapping set as input and would automatically perform term replacements, using only the mappings that have been explicitly marked somehow as being intended for such a purpose.

Replacements due to term deprecation are only a small sub-area of the problem space. Often, we want to replace IDs in raw data described with one vocabulary (say MA) with IDs from another (say Uberon). These cases are at least as frequent as, if not more frequent than, replacements due to deprecation. I am not saying "yes" or "no" or anything at all; just that the "term replacement" use case is divided into two categories:

(1) Replacement due to deprecation
(2) Replacement for the purpose of data integration

And since (2) is certainly going to use regular mapping relations, I am not 100% sure about the marginal gain of having (1) using a different "system".

My personal sense is still that the client / user should know what they are doing when passing a mapping set as input to a replacement problem.

gouttegd commented 8 months ago

And since (2) is certainly going to use regular mapping relations

I am not fully convinced that “regular mapping relations” (I am assuming you’re referring to things such as skos:exactMatch and the like?) should be used here, but OK.

No special system of any kind to mark mappings as being used for replacement purposes, then. Up to the users to decide which relations they want to use depending on what they are actually doing (replacing obsolete terms or integrating data from different vocabularies), and to pass that to whatever command or script they use to actually perform the replacement. Fine with me.

I’ll still make my command use IAO:0100001 by default though, because that covers the use case I care about. Users can change that if that doesn’t suit them.

matentzn commented 8 months ago

I am not fully convinced that “regular mapping relations” (I am assuming you’re referring to things such as skos:exactMatch and the like?) should be used here, but OK.

For KG integration, having exact matches between, say, the Orphanet Ontology (ORDO) and Mondo is the main use case of mapping sets. A mapping set between ORDO and Mondo would be used to "replace" all mentions of, say, ORDO identifiers with Mondo identifiers in a dataset imported during ETL.

As always I am certainly not trying to impose my will, just collecting arguments and hopefully zoning in on the right path together.

gouttegd commented 8 months ago

For KG integration, having exact matches between, say, the Orphanet Ontology (ORDO) and Mondo is the main use case of mapping sets.

I have two (admittedly small) concerns with using skos:exactMatch when the mappings are intended for replacement purposes (regardless of the reason for the replacement: whether it is to replace obsoleted terms or for data integration or any other reason).

A philosophical one (aka “the unimportant one”): saying that “A must be replaced by B” (again, regardless of the reason: “because A is obsolete”, “because my application only accepts entities from the vocabulary of B”, “because I prefer B”, etc.) is not the same thing as saying “A and B refer to the same thing and can be used interchangeably”, which is the definition of an exact match according to the SKOS vocabulary. Using skos:exactMatch for “A must be replaced by B” seems to me like an overloading of the meaning of that relation, and is basically the same mistake that we made with oboInOwl:hasDbXref (which ended up being used for everything beyond “database cross-references”, thereby losing any meaning).

A practical one: skos:exactMatch is directly “invertible”, meaning that inverting a mapping requires nothing more than inverting the subject and the object (and the associated metadata of course, but we can leave that aside for this discussion), without having to change the predicate. In other words, A skos:exactMatch B implies B skos:exactMatch A. That doesn’t seem desirable when the purpose of the mapping is to specify replacements. “A must be replaced by B” quite obviously does not imply that “B must be replaced by A”; instead, it implies that “B must replace A” – the predicate must change when you invert the mapping, if you don’t want to lose the meaning.
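
To make the inversion point concrete, here is a small sketch in which the inverse predicate is looked up explicitly. The symmetric and inverse SKOS pairs are as defined by SKOS; `EX:replaces` is a made-up placeholder for whatever “B must replace A” relation would be chosen, and `Mapping`, `INVERSE_PREDICATE` and `invert` are hypothetical names:

```python
from typing import NamedTuple


class Mapping(NamedTuple):
    subject_id: str
    predicate_id: str
    object_id: str


# Predicate to use once subject and object have been swapped.
INVERSE_PREDICATE = {
    # Symmetric SKOS mapping relations: the predicate is unchanged.
    "skos:exactMatch": "skos:exactMatch",
    "skos:closeMatch": "skos:closeMatch",
    "skos:relatedMatch": "skos:relatedMatch",
    # broadMatch and narrowMatch are inverses of each other.
    "skos:broadMatch": "skos:narrowMatch",
    "skos:narrowMatch": "skos:broadMatch",
    # "term replaced by" is not symmetric; EX:replaces is a hypothetical placeholder.
    "IAO:0100001": "EX:replaces",
}


def invert(mapping: Mapping) -> Mapping:
    """Swap subject and object, changing the predicate where the relation requires it."""
    try:
        inverse = INVERSE_PREDICATE[mapping.predicate_id]
    except KeyError:
        # Refuse to guess: silently keeping the predicate could change the meaning.
        raise ValueError(f"no known inverse for {mapping.predicate_id}") from None
    return Mapping(mapping.object_id, inverse, mapping.subject_id)


exact_inverted = invert(Mapping("EX:A", "skos:exactMatch", "EX:B"))
replaced_inverted = invert(Mapping("EX:A", "IAO:0100001", "EX:B"))
```

With an exact match the inverted mapping still says the same thing; with a replacement relation, keeping the predicate unchanged would wrongly state that B is to be replaced by A.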

Now you would tell me that it is up to the users to know what they are doing, and to be careful when inverting mapping sets if they know that the mapping set will ultimately be used for replacement purposes – and you would clearly have a point!

But I still think it’d be safer (and semantically more precise) to have a dedicated relation to represent mappings intended for entity replacement. Or several dedicated relations, if we want to distinguish between “replacements due to deprecation” (where IAO:0100001 could be used) and “replacements for the purpose of data integration” (for which we could craft a new mapping relation in SEMAPV).

I am certainly not trying to impose my will

Neither am I, and sorry if I gave that impression.

matentzn commented 8 months ago

A practical one: skos:exactMatch is directly “invertible”, meaning that inverting a mapping requires nothing more than inverting the subject and the object (and the associated metadata of course, but we can leave that aside for this discussion), without having to change the predicate. In other words, A skos:exactMatch B implies B skos:exactMatch A. That doesn’t seem desirable when the purpose of the mapping is to specify replacements. “A must be replaced by B” quite obviously does not imply that “B must be replaced by A”; instead, it implies that “B must replace A” – the predicate must change when you invert the mapping, if you don’t want to lose the meaning.

This is a 100% valid reason and a very good argument to use IAO:0100001 for replacements that are not intended to be used in a bidirectional manner! Great thinking. This has edged me further towards promoting the use of dedicated relationships, at least in the case where the replacement is not supposed to be bidirectional!

gouttegd commented 8 months ago

This is a 100% valid reason

So you don’t consider my “philosophical” argument to be 100% valid? 😢

( :D )

matentzn commented 8 months ago

Hmm, I can see where you are coming from, but I find the view impractical personally, even if it is conceptually justifiable. "Replacement" is a key technique in data integration, and "data integration" is such a fundamental use case for "mapping sets" that I don't really think it's "overloading" to think of the match relation as a "permission to replace".

For the xref argument: It's not that overloading skos:exactMatch this way will cause the same problem. I agree that skos:exactMatch should only be used in the sense "they refer to the same real world concept", and any other use, no matter for what use case, I would consider invalid. So if you use it to mean: you can replace X with Y, but X and Y do not "refer to the same real world concept", I would say "it's wrong".

You can use a "formally correctly defined mapping set" for the purpose of replacement, this is all I am trying to say. I am not trying to say that "the purpose of replacement" can redefine the semantics of the predicate.

gouttegd commented 8 months ago

I don't really think it's "overloading" to think of the match relation as a "permission to replace"

Fair enough.

I am not convinced enough to make my tool automatically and silently replace any entity that is the subject of a skos:exactMatch mapping, though. If people want to do that, they will need to explicitly ask for it with a --predicate skos:exactMatch option.
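
For illustration, the option handling could look roughly like the sketch below. Only the `--predicate` option and the IAO:0100001 default come from this thread; the program name `replace-terms` and the helper names are hypothetical, and this is not the actual command:

```python
import argparse
from typing import Optional, Sequence

# Default from this discussion: only "term replaced by" mappings trigger a replacement.
DEFAULT_PREDICATE = "IAO:0100001"


def build_parser() -> argparse.ArgumentParser:
    # "replace-terms" is a hypothetical program name.
    parser = argparse.ArgumentParser(
        prog="replace-terms",
        description="Replace entities using the mappings of a mapping set.",
    )
    parser.add_argument(
        "--predicate",
        action="append",
        dest="predicates",
        metavar="CURIE",
        help="use mappings with this predicate (repeatable); "
        f"defaults to {DEFAULT_PREDICATE}",
    )
    return parser


def resolve_predicates(argv: Optional[Sequence[str]] = None) -> list[str]:
    """Return the predicates to use, falling back to the default when none is given."""
    args = build_parser().parse_args(argv)
    # With action="append" and no explicit default, the attribute is None
    # unless the option was passed at least once.
    return args.predicates or [DEFAULT_PREDICATE]


implicit = resolve_predicates([])
explicit = resolve_predicates(["--predicate", "skos:exactMatch"])
```

Leaving the argparse default unset and applying the fallback afterwards means an explicit `--predicate` replaces the default instead of being appended to it, so exact matches are never used unless the user asks for them.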

matentzn commented 8 months ago

Yep, fair!

mapping-commons / semantic-mapping-vocabulary

Using mappings to replace obsolete terms #30