Closed matentzn closed 2 years ago
In Ontology Matching, we typically use the term alignment for the set of mappings, and mapping for the individual element (in this case, a term-predicate-term tuple).
Ok great thanks!
So a "match" and a "mapping" are synonymous then?
I prefer mapping, but I would say they are synonyms.
Thanks Ernesto!
Hey, yes what I proposed is in line with Ernesto (and other papers). Alignment instead of set of mappings (which is the context for 'global')
Alignment in the way I understand it is a more specific term than mapping set; an alignment is a set of mappings with the goal of aligning two sources (two ontologies, two data dictionaries; two databases); a mapping set could be any combination of mappings, even if the goal is not alignment (this sort of implies terminological mappings like exact, broad; narrow), for example: phenotype to disease, disease to anatomy. These kinds of mappings are explicitly in scope for sssom; what do you think?
In "On Evaluating Schema Matching and Mapping", a highly cited paper in the schema matching community, these distinctions are made:
I don't know yet whether I want to factor this into the thought process here, but I am starting to think that it would have been more accurate to pitch SSSOM as a matching rather than a mapping vocabulary, but that ship has sailed. It remains to determine how best to communicate this. The way the SSSOM data model works, a single entry, despite the name "mapping", in essence describes a "match" in the above sense. A match can have certain properties like confidence and provenance.
This is very relevant to decide what the content of a single row in an SSSOM file is: this is clearly a match, if we want to provide multiple lines of evidence for a single mapping in the same table.
Language choices are overdetermined. So the terminology should not drive the decision about what functions we need to support.
As someone who has used the word 'mapping' for 20 years when what I am doing is creating maps between terms (not always exact matches between terms), I am relatively immune to one paper as an 'authority' about definitions of commonly used terms. That paper may be highly cited in 'schema matching' (that tries to automatically match the whole pattern), but that's not the practical work that I've worked on.
So I fully agree with @matentzn's comment above, quoted below. Multiple lines of evidence can be relevant to a single (what I call) mapping between two terms, whatever resources they are in.
> Alignment in the way I understand it is a more specific term than mapping set; an alignment is a set of mappings with the goal of aligning two sources (two ontologies, two data dictionaries; two databases); a mapping set could be any combination of mappings, even if the goal is not alignment (this sort of implies terminological mappings like exact, broad; narrow), for example: phenotype to disease, disease to anatomy. These kinds of mappings are explicitly in scope for sssom; what do you think?
Thank you @graybeal. One conceptual problem I am wrestling with is the fact that in SSSOM, we call the first class citizen a "mapping", while in fact, the first class citizen is a "mapping evidence". I understand from your comment that you may find it a bit academic to distinguish, but it has some ramifications for what a single row record contains. For me:
Basically just for my own peace of mind, since a record in an SSSOM mapping set is a tuple:
<subject, predicate, object, match_type>
What is the best way to call this tuple? In my understanding this is more precisely referred to as a match than a map. If you think I am overthinking this, I am fine to leave my doubts behind, but if so, I need an answer to the following question (just for my communication toolbox):
How many mappings are in the following mapping set:
subject_id | object_id | predicate_id | match_type | mapping_provider |
---|---|---|---|---|
A:1 | B:2 | skos:exactMatch | HumanCurated | MONDO |
A:1 | B:2 | skos:exactMatch | Lexical | Bioportal |
If 1, would you still call a single row a mapping?
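To make the question concrete, here is a minimal Python sketch (not part of any SSSOM tooling; the variable names are made up for illustration) that counts the example both ways: once per row, and once per distinct <subject, predicate, object> triple.

```python
# Rows of the example mapping set above, in the table's column order:
# (subject_id, object_id, predicate_id, match_type, mapping_provider)
rows = [
    ("A:1", "B:2", "skos:exactMatch", "HumanCurated", "MONDO"),
    ("A:1", "B:2", "skos:exactMatch", "Lexical", "Bioportal"),
]

# Reading 1: every row is one mapping.
row_count = len(rows)

# Reading 2: only distinct <subject, predicate, object> triples count,
# ignoring match_type and mapping_provider.
distinct_triples = {(s, p, o) for (s, o, p, _match_type, _provider) in rows}
triple_count = len(distinct_triples)

print(row_count, triple_count)  # prints: 2 1
```

Whichever of the two numbers we decide to call "the number of mappings" fixes what a single row should be called.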
We did a tuple like that 15 years ago on the MMI ORR project, and we called it a mapping. (The 4th item in our case was a confidence percentage, but same idea.) Now, at the time I was not at all versed in the literature, but no one ever argued with calling it a mapping. I would call the 4th entity a mapping_type, by the way, not a match_type, because it is the type assigned to the first three things, which formed a mapping. (Either is fine, just be consistent.)
I would call your tabular example 2 mappings, because there are two activities that took place. Hence sentences will result like "I'm sure that mapping is correct because there are 6 mappings that support it!" And the way I would fix that (just because you'll get a laugh out of it) is to say "I'm sure that match is correct because there are 6 mappings that support it!" Yes, I really do think of the mapping as the atomic thing, and the match as the final result of all those mappings.
As an aside, or maybe not: Imagine you have a tool called
If I had to distinguish between your two options without saying match, I might say there are atomic mappings on the one hand, and final mappings, or consensus mappings, or concluded mappings, or declared mappings, on the other. Because of course given different criteria, two such tools might reach different conclusions. There is no end to the chain, alas.
@graybeal While I do not share your distinctions between mappings and matches (I have the opposite notion in my head), I think everything you say makes so much sense (and makes so much less work for me) that I am inclined to simply adopt your way of thinking and discard mine. Let me think this over and I will come back to you.
On the subject of match_type: I also prefer, if we go your way, the idea of renaming this to mapping_type. However, I have recently started a major effort to align SSSOM with PROV, and I was wondering whether to reconceptualise this column as a "prov activity" that "confirms" the match <s,p,o> (as you say). What do you think of that? If we go that way, we can basically say that there is one match and one or more activities that confirm it, so we would have this terminology:
match
mapping
m is a mapping activity. This mapping activity can be anything from lexical match, human curation, logical match, aggregation (a combination of multiple lines of evidence into one, like your consensus idea), or derived match (if it was obtained from some non-SSSOM source). I don't like the name mapping activity, obviously, but there are other, better ones like mapping rule, or even mapping type as you say, that could express the same logic. But maybe you have a better idea? What do you think of this idea? I was worried to mention it but I really like this logic.
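A rough Python sketch of that logic, under the assumption that each row contributes one activity m to a shared <s,p,o>. The names `Match` and `activities_by_match` are hypothetical and appear in neither SSSOM nor PROV; the data is the two-row example from earlier in this thread.

```python
from collections import defaultdict
from typing import NamedTuple


class Match(NamedTuple):
    """The bare <s, p, o> triple (hypothetical name, for illustration only)."""
    subject_id: str
    predicate_id: str
    object_id: str


# (subject_id, predicate_id, object_id, m, mapping_provider),
# where m is the activity recorded in the fourth position of the tuple.
rows = [
    ("A:1", "skos:exactMatch", "B:2", "HumanCurated", "MONDO"),
    ("A:1", "skos:exactMatch", "B:2", "Lexical", "Bioportal"),
]

# One entry per match; each row adds one confirming activity to it.
activities_by_match = defaultdict(list)
for s, p, o, m, provider in rows:
    activities_by_match[Match(s, p, o)].append((m, provider))

for match, activities in activities_by_match.items():
    print(match, "confirmed by", len(activities), "activities")
```

With this shape, the example yields one match confirmed by two activities, which is the reading proposed above.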
cc @udp
Thanks for the nice words, here are some additional thoughts to the latest as your reward :-)
I don't know for sure, but I think <s,p,o> can include negative assertions as well as positive ones, e.g., these two properties are determined to not be the same, or to be distinct. This makes me uncomfortable calling them a match (which to me implies matched meaning), hence my preference for mapping. (Which also comes from SSSOM's meaning…but I guess you could change the expansion and still have SSSOM :-). )
The addition of 'm' turns the triple into a provenance-enhanced version of its former self. It provides the 'how' about the creation of <s,p,o>. So from a 'meaningful label' standpoint, I'd call it a 'contextualized mapping' / 'contextualized match', or even a 'documented ma…' Whatever you call the triple you could use the same word in the quad, but qualified.
In English I would call m a 'mapping rationale', 'mapping justification', 'mapping method', or 'mapping reason'. It is often an activity but often it is just descriptive not active, so I don't think activity is fully warranted.
Thanks a ton @graybeal, this makes sense. Based on your suggestions, which I hope I have reflected more or less OK, I made this suggestion: #150
Let me know what you think.
I will close this now. We have continued the debate here: https://github.com/mapping-commons/sssom/discussions/169
Which culminated in our first version of https://github.com/mapping-commons/semantic-mapping-vocabulary (not published yet).
@lltommy mentioned that I might have been using the terminology a bit out of step with the community across the spec;
@ernestojimenezruiz @cmungall is this correct:
1) we should call the term-predicate-term tuples matches
2) a mapping is a set of matches from one term set to another. A term set could be, for example, the signature of a branch in the ontology.