Terminology: Match for term-term, mapping for termset-termset

matentzn commented 4 years ago

@lltommy mentioned that I might have been using the terminology across the spec a bit out of step with the community.

@ernestojimenezruiz @cmungall is this correct:

1) We should call the term-predicate-term tuples matches.
2) A mapping is a set of matches from one term set to another. A term set could be, for example, the signature of a branch in the ontology.

ernestojimenezruiz commented 4 years ago

In the Ontology Matching community, we typically use the term alignment for the set of mappings, and mapping for the individual element (in this case, the term-predicate-term tuple).

matentzn commented 4 years ago

Ok great thanks!

So a "match" and a "mapping" are synonymous then?

ernestojimenezruiz commented 4 years ago

I prefer mapping, but I would say they are synonyms.

matentzn commented 4 years ago

Thanks Ernesto!

LLTommy commented 4 years ago

Hey, yes, what I proposed is in line with Ernesto (and other papers): alignment instead of set of mappings (which is the context for 'global').

matentzn commented 4 years ago

Alignment, in the way I understand it, is a more specific term than mapping set: an alignment is a set of mappings with the goal of aligning two sources (two ontologies, two data dictionaries, two databases); a mapping set could be any combination of mappings, even if the goal is not alignment (this sort of implies terminological mappings like exact, broad, narrow), for example phenotype to disease or disease to anatomy. These kinds of mappings are explicitly in scope for SSSOM; what do you think?

matentzn commented 2 years ago

In "On Evaluating Schema Matching and Mapping", a highly cited paper in the schema matching community, these distinctions are made:

Matching is the process that takes as input two schemas, referred to as the source and the target, and produces a number of matches, a.k.a., correspondences, between the elements of these two schemas (Rahm and Bernstein, 2001).
Given a source and a target schema, a mapping is a relationship, i.e., a constraint, that must hold between their respective instances.
In contrast to matches, which specify how instance values of individual source and target schema elements relate to each other, a mapping additionally specifies how the values within the same instance relate to each other.

I don't know yet whether I want to factor this into the thought process here, but I am starting to think that it would have been more accurate to pitch SSSOM as a matching rather than a mapping vocabulary. That ship has sailed, though, so it remains to be determined how best to communicate this. The way the SSSOM data model works, a single entry, despite the name "mapping", in essence describes a "match" in the above sense. A match can have certain properties like confidence and provenance.

This is very relevant for deciding what the content of a single row in an SSSOM file is: if we want to provide multiple lines of evidence for a single mapping in the same table, a row is clearly a match.

graybeal commented 2 years ago

Language choices are overdetermined. So the terminology should not drive the decision about what functions we need to support.

As someone who has used the word 'mapping' for 20 years when what I am doing is creating maps between terms (not always exact matches between terms), I am relatively immune to treating one paper as an 'authority' on the definitions of commonly used terms. That paper may be highly cited in 'schema matching' (which tries to automatically match the whole pattern), but that's not the practical work I've done.

So I fully agree with @matentzn's comment above, quoted below. Multiple lines of evidence can be relevant to a single (what I call) mapping between two terms, whatever resources they are in.

Alignment, in the way I understand it, is a more specific term than mapping set: an alignment is a set of mappings with the goal of aligning two sources (two ontologies, two data dictionaries, two databases); a mapping set could be any combination of mappings, even if the goal is not alignment (this sort of implies terminological mappings like exact, broad, narrow), for example phenotype to disease or disease to anatomy. These kinds of mappings are explicitly in scope for SSSOM; what do you think?

matentzn commented 2 years ago

Thank you @graybeal. One conceptual problem I am wrestling with is the fact that in SSSOM, we call the first-class citizen a "mapping", while in fact the first-class citizen is a "mapping evidence". I understand from your comment that you may find the distinction a bit academic, but it has some ramifications for what a single row record contains. For me:

subject, predicate, object <- is a mapping
match_type <- is a piece of evidence for a mapping (this could be an exact match on labels, a human curated mapping based on review, a complex mapping generated by LogMap).

Basically just for my own peace of mind, since a record in an SSSOM mapping set is a tuple:

<subject, predicate, object, match_type>

What is the best way to refer to this tuple? In my understanding, it is more precisely called a match than a mapping. If you think I am overthinking this, I am happy to leave my doubts behind, but if so, I need an answer to the following question (just for my communication toolbox):

How many mappings are in the following mapping set:

subject_id	object_id	predicate_id	match_type	mapping_provider
A:1	B:2	skos:exactMatch	HumanCurated	MONDO
A:1	B:2	skos:exactMatch	Lexical	Bioportal

1 (with two lines of evidence)
2

If 1, would you still call a single row a mapping?
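One way to make the two readings concrete is to group the rows of the example table by their (subject_id, predicate_id, object_id) triple: counting rows gives answer 2, counting distinct triples gives answer 1. This is a minimal, hypothetical sketch using only the Python standard library; the function name `group_rows_by_triple` is made up for illustration, and this is not how sssom-py processes mapping sets.

```python
import csv
import io
from collections import defaultdict

# The two-row example table from the comment above, as tab-separated text.
EXAMPLE_TSV = (
    "subject_id\tobject_id\tpredicate_id\tmatch_type\tmapping_provider\n"
    "A:1\tB:2\tskos:exactMatch\tHumanCurated\tMONDO\n"
    "A:1\tB:2\tskos:exactMatch\tLexical\tBioportal\n"
)


def group_rows_by_triple(tsv_text):
    """Hypothetical helper: group rows by their (subject, predicate, object) triple.

    Each key is one "mapping" under reading 1; each value lists the
    (match_type, mapping_provider) pairs, i.e. the lines of evidence for it.
    """
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    groups = defaultdict(list)
    for row in reader:
        triple = (row["subject_id"], row["predicate_id"], row["object_id"])
        groups[triple].append((row["match_type"], row["mapping_provider"]))
    return dict(groups)


if __name__ == "__main__":
    groups = group_rows_by_triple(EXAMPLE_TSV)
    row_count = sum(len(evidence) for evidence in groups.values())
    print("rows (reading 2):", row_count)
    print("distinct triples (reading 1):", len(groups))
    for triple, evidence in groups.items():
        print(triple, "supported by", evidence)
```

Under reading 1, a single row would then be a line of evidence rather than a mapping in its own right.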

graybeal commented 2 years ago

We did a tuple like that 15 years ago on the MMI ORR project, and we called it a mapping. (The 4th item in our case was a confidence percentage, but same idea.) At the time I was not at all versed in the literature, but no one ever argued with calling it a mapping. I would call the 4th entity a mapping_type, by the way, not a match_type, because it is the type assigned to the first three things, which together form a mapping. (Either is fine, just be consistent.)

I would call your tabular example 2 mappings, because two activities took place. Hence sentences will result like "I'm sure that mapping is correct because there are 6 mappings that support it!" And the way I would fix that (just because you'll get a laugh out of it) is to say "I'm sure that match is correct because there are 6 mappings that support it!" Yes, I really do think of the mapping as the atomic thing, and the match as the final result of all those mappings.

As an aside, or maybe not: imagine you have a tool called er that goes through collections of individual mappings between 2 terms and tries to be the authority on what, if any, relationship should be declared for a particular purpose. What would we call that thing? I wouldn't call it the Mapper, because something else has done all the mappings. I could call it the Matcher (or the Decider).

If I had to distinguish between your two options without saying match, I might say there are atomic mappings on the one hand, and final mappings, or consensus mappings, or concluded mappings, or declared mappings, on the other. Because of course given different criteria, two such tools might reach different conclusions. There is no end to the chain, alas.

matentzn commented 2 years ago

@graybeal While I do not share your distinctions between mappings and matches (I have the opposite notion in my head), I think everything you say makes so much sense (and makes so much less work for me) that I am inclined to simply adopt your way of thinking and discard mine. Let me think this over and I will come back to you.

On the subject of match_type: if we go your way, I also prefer renaming it to mapping_type. However, I have recently started a major effort to align SSSOM with PROV, and I was wondering whether to reconceptualise this column as a "prov activity" that "confirms" the match <s,p,o> (as you say). What do you think of that? If we go that way, we can basically say that there is one match and one or more activities that confirm it, so we would have this terminology:

<s,p,o> is a match
<s,p,o,m> is a mapping
m is a mapping activity. This mapping activity can be anything from a lexical match, human curation, or logical match aggregation (a combination of multiple lines of evidence into one, like your consensus idea) to a derived match (if it was obtained from some non-SSSOM source). Obviously I don't like the name "mapping activity", but there are other, better ones, like mapping rule, or even mapping type as you say, that could express the same logic. But maybe you have a better idea?

What do you think of this idea? I was hesitant to mention it, but I really like this logic.

cc @udp
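The <s,p,o> / <s,p,o,m> / m terminology proposed above could be sketched as a small data model. This is a hypothetical illustration, not part of the SSSOM schema: the class names `Match`, `Mapping`, and `MappingActivity`, the helper `mappings_by_match`, and the enum values (taken from the examples in the comment) are all assumptions made for the sketch.

```python
from collections import defaultdict
from dataclasses import dataclass
from enum import Enum


class MappingActivity(Enum):
    """Hypothetical activity values, taken from the examples in the comment."""
    LEXICAL_MATCH = "lexical match"
    HUMAN_CURATION = "human curation"
    LOGICAL_MATCH_AGGREGATION = "logical match aggregation"
    DERIVED_MATCH = "derived match"


@dataclass(frozen=True)
class Match:
    """<s,p,o>: the asserted relationship between two terms."""
    subject_id: str
    predicate_id: str
    object_id: str


@dataclass(frozen=True)
class Mapping:
    """<s,p,o,m>: a match plus the activity that confirms it."""
    match: Match
    activity: MappingActivity


def mappings_by_match(mappings):
    """Collect, for each match, the activities that confirm it (in input order)."""
    confirmed = defaultdict(list)
    for mapping in mappings:
        confirmed[mapping.match].append(mapping.activity)
    return dict(confirmed)


if __name__ == "__main__":
    match = Match("A:1", "skos:exactMatch", "B:2")
    mappings = [
        Mapping(match, MappingActivity.HUMAN_CURATION),
        Mapping(match, MappingActivity.LEXICAL_MATCH),
    ]
    for m, activities in mappings_by_match(mappings).items():
        print(m, "confirmed by", [a.value for a in activities])
```

Under this model, the two-row table from earlier in the thread would be one Match confirmed by two Mapping records, i.e. the "one match, one or more activities" reading. Frozen dataclasses are used so that a Match can serve as a dictionary key.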

graybeal commented 2 years ago

Thanks for the nice words, here are some additional thoughts to the latest as your reward :-)

I don't know for sure, but I think <s,p,o> can include negative assertions as well as positive ones, e.g., these two properties are determined not to be the same, or to be distinct. This makes me uncomfortable calling them matches (which to me implies matched meaning), hence my preference for mapping. (Which also comes from SSSOM's meaning…but I guess you could change the expansion and still have SSSOM :-). )

The addition of 'm' turns the triple into a provenance-enhanced version of its former self. It provides the 'how' behind the creation of <s,p,o>. So from a 'meaningful label' standpoint, I'd call it a 'contextualized mapping' / 'contextualized match', or even a 'documented ma…'. Whatever you call the triple, you could use the same word for the quad, but qualified.

In English I would call m a 'mapping rationale', 'mapping justification', 'mapping method', or 'mapping reason'. It is often an activity, but often it is just descriptive rather than active, so I don't think "activity" is fully warranted.

matentzn commented 2 years ago

Thanks a ton @graybeal, this makes sense. Based on your suggestions, and hopefully reflecting them more or less correctly, I made this suggestion: #150

Let me know what you think.

matentzn commented 2 years ago

I will close this now. We have continued the debate here: https://github.com/mapping-commons/sssom/discussions/169

Which culminated in our first version of https://github.com/mapping-commons/semantic-mapping-vocabulary (not published yet).

mapping-commons / sssom

Terminology: Match for term-term, mapping for termset-termset #10