Add new term MappingDerivation

matentzn commented 2 years ago

DerivedMapping:

Def: A matching process based on interpreting an existing mapping provided without an explicit semantic mapping predicate.

Example: An ad-hoc two column mapping provided by a research paper is used as a source to provide a semantic mapping (skos:exactMatch).

Motivation:

This happens often when translating mappings from non-SSSOM formats into SSSOM. We should recommend adding a comment to the mapping sets that describes how the predicate decision was made.

joeflack4 commented 1 year ago

@matentzn The motivation for adding MappingDerivation seems to come from situations where MappingJustification is ambiguous, as in your example.

Just touching on semapv:UnspecifiedMatching, using Stephanie's use case as an example, I think we don't have a choice but to use this one until we do some research and find out how OMOP created its mappings.

I just had one comment on semapv:UnspecifiedMatching. I looked at some docs, and it is a subclass of semapv:matching. I don't know 100% for sure, but I do think we can likely say that the OMOP matches were done algorithmically. Are you involved at all in the Semantic Mapping Vocabulary? I would suggest to them that they add another term, semapv:UnspecifiedAlgorithmicMatching, to disambiguate from curated situations, but IDK how useful this is.

As for MappingDerivation, I can provide my thoughts if you like, but it would be useful to see an example of how it would be used in code. I understand the problem of unknown mapping justifications, but I don't know what MappingDerivation is supposed to do to help / add more information than would currently be provided by MappingJustification.

matentzn commented 1 year ago

> Just touching on semapv:UnspecifiedMatching, using https://github.com/jhu-bids/TermHub/issues/140 as an example, I think we don't have a choice but to use this one until we do some research and find out how OMOP created its mappings.

With OMOP cooperating, this is not worth it. They should do it. Stick with semapv:UnspecifiedMatching for now.

> but I do think we can likely say that the OMOP matches were done algorithmically.

You don't know that - in their presentations they claim they do quite a bit of manual curation! In any case, the distinction is moot. Chris is already annoyed that semapv:UnspecifiedMatching even exists. He would have preferred mapping_justification to be optional instead. But at least this way, we have to admit our cluelessness!

matentzn commented 1 year ago

> As for MappingDerivation, I can provide my thoughts if you like, but it would be useful to see an example of how it would be used in code. I understand the problem of unknown mapping justifications, but I don't know what MappingDerivation is supposed to do to help / add more information than would currently be provided by MappingJustification.

The point is that you can say: this mapping set was derived from OMOP (mapping_provider). The only reason this is interesting is that you could go and try to figure out how the mapping source (provider) has built the mappings, if you so wanted to.

joeflack4 commented 1 year ago

I hear you, sounds good. Agree w/ Chris that it is annoying, but semapv:UnspecifiedMatching at least shows what we know we don't know, whereas if we leave it blank, it could be that the justification is known, but we just didn't add it. More information is better; looks like you agree.

Still not getting a good grasp of MappingDerivation. Based on what you said, it sounds to me like a boolean field. If it's derived, then mapping_provider can say 'OMOP', and mapping_derivation would be true. But that sounds redundant, so I think I don't understand MappingDerivation yet.

matentzn commented 1 year ago

mapping_provider can be used even if the justification is something other than MappingDerivation.
The main use case for mapping derivation is this:
- Someone publishes a non-sssom mapping (e.g. a two column spreadsheet)
- You migrate it to SSSOM, but need to fill in a bunch of blanks, e.g. "predicate_id"
- MappingDerivation tells the user that the current mapping was derived from such a non-SSSOM source, and that some level of interpretation had to happen to fill in the blanks for predicate_id (and potentially other metadata fields)
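
The three steps above could be sketched roughly as follows. This is only an illustration under assumptions: the CURIE `semapv:MappingDerivation` is the term proposed in this issue and not an accepted one, the function name `derive_sssom_rows`, the sample identifiers and the provider URL are hypothetical, and `skos:exactMatch` stands in for whatever predicate the person doing the migration decides on. The keys are the SSSOM slot names `subject_id`, `predicate_id`, `object_id`, `mapping_justification`, `mapping_provider` and `comment`.

```python
import csv
import io

# Assumption: the term proposed in this issue, written as a CURIE.
DERIVATION = "semapv:MappingDerivation"


def derive_sssom_rows(two_column_text, predicate_id, provider, note):
    """Turn tab-separated 'subject, object' lines into SSSOM-style row dicts."""
    rows = []
    reader = csv.reader(io.StringIO(two_column_text), delimiter="\t")
    for subject_id, object_id in reader:
        rows.append({
            "subject_id": subject_id,
            "predicate_id": predicate_id,  # filled in by interpretation
            "object_id": object_id,
            "mapping_justification": DERIVATION,
            "mapping_provider": provider,
            "comment": note,  # records how the predicate decision was made
        })
    return rows


# Hypothetical two-column source, e.g. copied from a paper's supplement.
source = "EX:0001\tOTHER:9001\nEX:0002\tOTHER:9002\n"
rows = derive_sssom_rows(
    source,
    predicate_id="skos:exactMatch",
    provider="https://example.org/source-paper",
    note="Predicate assumed from inspecting the source; it gave none.",
)

# Write the result as a TSV table.
out = io.StringIO()
writer = csv.DictWriter(out, fieldnames=list(rows[0]), delimiter="\t")
writer.writeheader()
writer.writerows(rows)
print(out.getvalue())
```

The `comment` value is where the recommendation from the issue description (describe how the predicate decision was made) would end up.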

matentzn commented 1 year ago

@saubin78, if you have the time and energy, I would appreciate your thoughts about this idea. I like this because it not only implies that we may not know how each mapping is justified, but also clearly states that the mapping is the consequence of a translation of some non-SSSOM mapping file into SSSOM, and people who want to increase their confidence in the mapping can look at the source. But, as always, I am not sure how useful this category is compared to UnspecifiedMatching, which is simple to understand.

nichtich commented 1 year ago

Names are difficult. How about DerivedMatching or TransformedMatching to state that mappings were derived from or transformed from other mappings? This would also subsume the case of logically implied matching (e.g. if A broader B and B broader C then A broader C).
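
To make the logically implied case concrete, here is a small sketch that treats "broader" as transitive, as in the example above, and lists the pairs that follow from the asserted ones. The function name `implied_broader` is hypothetical. Note that SKOS itself does not declare skos:broader transitive (it provides skos:broaderTransitive for that), so the transitivity here is an assumption of the example.

```python
def implied_broader(asserted):
    """Return (narrower, broader) pairs implied by transitivity only."""
    closure = set(asserted)
    changed = True
    while changed:
        changed = False
        for a, b in list(closure):
            for c, d in list(closure):
                # a broader b and b broader d implies a broader d
                if b == c and (a, d) not in closure:
                    closure.add((a, d))
                    changed = True
    # Keep only what was derived, not what was asserted.
    return closure - set(asserted)


asserted = {("A", "B"), ("B", "C")}
derived = implied_broader(asserted)
print(derived)
```

Only the pairs in `derived` would carry a derivation-style justification; the asserted pairs keep whatever justification they already had.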

matentzn commented 1 year ago

@nichtich This is a good direction - I have a bit of an issue with the grammar here, because "matching" refers to the "process of determining the mapping", and the way you state it, it sounds like the "matching process" is "derived"/"transformed". Any other suggestion in this direction?

nichtich commented 1 year ago

I'm not a native speaker but DerivedMapping (for individual mappings) or DerivationMatching (for the process) might fit. The existing LogicalReasoning could also be relevant, maybe reasoning is a special case of derivation.

matentzn commented 1 year ago

DerivedMapping sounds good to me!

matentzn commented 1 year ago

What about the definition @saubin78, @nichtich:

A matching process based on interpreting an existing mapping provided without an explicit semantic mapping predicate.

saubin78 commented 1 year ago

Hi. DerivedMapping is fine but the definition may better start with "a mapping that..." or "the result of a matching process..." - for a mapping is not a matching process ;-)

"UnspecifiedMatching" can be used to state that the matching process is not known, which is a bit different and can be complementary to DerivedMapping.

matentzn commented 1 year ago

Here is a bit of an ontology modelling thing we should consider. I would like to continue to describe these justifications as "activities" that contribute to the establishment of a mapping. This makes it much more straightforward to map SSSOM to PROV (a justification is a prov:activity that contributes to the establishment of a mapping, which is a prov:entity). Any other suggestion for a label maybe?

matentzn commented 1 year ago

Copy of a chat on Slack in the sssom channel:

@ernestojimenezruiz:

If no predicate is given, the process can be seen as "SemanticMatching" prediction itself, especially if there are false positives in the list (e.g., related terms, but not strong relationships).

@matentzn:

this is true, but Semantic matching is more broad than what we need here. I think we should make it explicit that we made a concrete guess about the correct mapping predicate - this is not the same as trying to run another round of matching in my opinion.

@ernestojimenezruiz :

I agree, but the problem of guessing may be as complicated as the matching itself.

@matentzn:

The problem of guessing “correctly” is certainly as complicated as matching, but I think what happens in practice everywhere I look is that you just make a blanket assumption from eyeballing the mapping set, like “closeMatch” or “exactMatch” for everything, and then apply the mapping to your specific use case. Again, I am not too concerned if this is good or bad, hard or easy, or done quickly or with a proper matching pipeline in mind - all of these could merit additional justification modelling work. I want to be able to say: this mapping is from NCIT, and we just assume the mapping is exact based on what we saw. But because we clearly say “this mapping was derived making an assumption”, the user can easily reject the mapping from their pipelines as “too unsafe”. I want to avoid people just adding “skos:exactMatch” everywhere with no way for downstream tooling to be able to weed such cases out.
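
The kind of downstream weeding-out described in the last message might look like the sketch below. It assumes the proposed term ends up as the CURIE `semapv:MappingDerivation` in the `mapping_justification` slot. The function name `split_by_derivation` and the sample identifiers are hypothetical, and `semapv:ManualMappingCuration` is used only as a contrasting justification.

```python
# Assumption: the term proposed in this issue, written as a CURIE.
DERIVATION = "semapv:MappingDerivation"


def split_by_derivation(mappings):
    """Separate mappings whose predicate was assumed from all the others."""
    kept, rejected = [], []
    for mapping in mappings:
        if mapping.get("mapping_justification") == DERIVATION:
            rejected.append(mapping)  # "too unsafe" for this pipeline
        else:
            kept.append(mapping)
    return kept, rejected


mappings = [
    {
        "subject_id": "EX:0001",
        "predicate_id": "skos:exactMatch",
        "object_id": "OTHER:9001",
        "mapping_justification": DERIVATION,
    },
    {
        "subject_id": "EX:0002",
        "predicate_id": "skos:exactMatch",
        "object_id": "OTHER:9002",
        "mapping_justification": "semapv:ManualMappingCuration",
    },
]

kept, rejected = split_by_derivation(mappings)
print(len(kept), "kept;", len(rejected), "rejected")
```

Both rows carry the same predicate, so without the justification a pipeline would have no way to tell the assumed mapping from the curated one.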

mapping-commons / semantic-mapping-vocabulary

Add new term MappingDerivation #1