mapping-commons / sssom

Simple Standard for Sharing Ontology Mappings
https://mapping-commons.github.io/sssom/
BSD 3-Clause "New" or "Revised" License

Issues with constraints on the `mapping_justification` slot #316

Open gouttegd opened 1 year ago

gouttegd commented 1 year ago

The mapping_justification slot is defined in the LinkML model as follows:

mapping_justification:
  description: A mapping justification is an action (or the written representation of that action) of showing a mapping to be right or reasonable.
  range: EntityReference
  pattern: "^semapv:(MappingReview|ManualMappingCuration|LogicalReasoning|LexicalMatching|CompositeMatching|UnspecifiedMatching|SemanticSimilarityThresholdMatching|LexicalSimilarityThresholdMatching|MappingChaining)$"
  required: true
  any_of:
    - equals_string: semapv:LexicalMatching
    - equals_string: semapv:LogicalReasoning
    - equals_string: semapv:CompositeMatching
    - equals_string: semapv:UnspecifiedMatching
    - equals_string: semapv:SemanticSimilarityThresholdMatching
    - equals_string: semapv:LexicalSimilarityThresholdMatching
    - equals_string: semapv:MappingChaining
    - equals_string: semapv:MappingReview
    - equals_string: semapv:ManualMappingCuration

There are several issues with this definition:

1) Why both a pattern constraint and an any_of constraint? My understanding is that they are redundant. Expressing the same constraint twice in two different forms creates the risk of the two forms drifting out of sync if someone updates, say, the any_of list but forgets to update the pattern expression to match. The risk is slightly greater still because the allowed values are not listed in the same order in both forms.
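To make the out-of-sync risk concrete, here is a minimal sketch of a check that compares the two forms as sets, so that ordering does not matter. The pattern and the any_of values are copied from the definition quoted above; the helper names `values_from_pattern` and `sync_report` are hypothetical and not part of SSSOM or LinkML.

```python
import re

# Copied from the pattern constraint in the slot definition quoted above.
PATTERN = (
    "^semapv:(MappingReview|ManualMappingCuration|LogicalReasoning|"
    "LexicalMatching|CompositeMatching|UnspecifiedMatching|"
    "SemanticSimilarityThresholdMatching|LexicalSimilarityThresholdMatching|"
    "MappingChaining)$"
)

# Copied from the any_of / equals_string list in the same definition.
ANY_OF = [
    "semapv:LexicalMatching",
    "semapv:LogicalReasoning",
    "semapv:CompositeMatching",
    "semapv:UnspecifiedMatching",
    "semapv:SemanticSimilarityThresholdMatching",
    "semapv:LexicalSimilarityThresholdMatching",
    "semapv:MappingChaining",
    "semapv:MappingReview",
    "semapv:ManualMappingCuration",
]


def values_from_pattern(pattern: str) -> set:
    """Expand a pattern shaped like ^prefix:(A|B|C)$ into the CURIEs it accepts.

    Hypothetical helper: it only understands this one simple shape.
    """
    match = re.fullmatch(r"\^(\w+):\(([^()]*)\)\$", pattern)
    if match is None:
        raise ValueError("pattern is not a simple prefix:(A|B|...) alternation")
    prefix, alternatives = match.groups()
    return {f"{prefix}:{name}" for name in alternatives.split("|")}


def sync_report(pattern: str, any_of: list) -> dict:
    """Return the values accepted by one form of the constraint but not the other."""
    from_pattern = values_from_pattern(pattern)
    from_any_of = set(any_of)
    return {
        "only_in_pattern": from_pattern - from_any_of,
        "only_in_any_of": from_any_of - from_pattern,
    }


if __name__ == "__main__":
    print(sync_report(PATTERN, ANY_OF))
```

With the definition as quoted, both sets in the report come back empty, so the two forms agree today. A check like this only detects drift; it does not remove the duplication itself.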

2) Both lists are already out-of-sync with the Semantic Mapping Vocabulary which, as of today, defines at least three more “matching processes”:

Ultimately, the definition should probably make use of LinkML’s dynamic enums, to avoid having to manually update the constraints in the SSSOM schema every time the semantic mapping vocabulary is enriched.

3) The equals_string constraints force the slot to have the range string. The LinkML specification is explicit:

> the slot must have range string and the value of the slot must equal the specified value

But SSSOM defines mapping_justification as an EntityReference, which is ultimately a uriOrCurie, which in LinkML is a base type unrelated to string.

4) Independently of the typing issue above, both the pattern and the any_of constraints force the value to be in CURIE form, even though the underlying uriOrCurie type allows for either a CURIE or a URI.
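A small sketch of point 4, using plain regular-expression matching rather than any LinkML validator: the same term is accepted in CURIE form and rejected in URI form. The namespace that `semapv:` expands to is an assumption here (check the prefix map actually in use), and `satisfies_pattern` is a hypothetical helper.

```python
import re

# Copied from the pattern constraint in the slot definition quoted above.
PATTERN = re.compile(
    "^semapv:(MappingReview|ManualMappingCuration|LogicalReasoning|"
    "LexicalMatching|CompositeMatching|UnspecifiedMatching|"
    "SemanticSimilarityThresholdMatching|LexicalSimilarityThresholdMatching|"
    "MappingChaining)$"
)

# Assumption: the semapv prefix expands to this namespace.
SEMAPV_NAMESPACE = "https://w3id.org/semapv/vocab/"


def satisfies_pattern(value: str) -> bool:
    """Hypothetical helper: True if the value matches the pattern constraint."""
    return PATTERN.search(value) is not None


curie_form = "semapv:LexicalMatching"
uri_form = SEMAPV_NAMESPACE + "LexicalMatching"

if __name__ == "__main__":
    print(curie_form, satisfies_pattern(curie_form))  # accepted
    print(uri_form, satisfies_pattern(uri_form))      # rejected
```

Both strings denote the same term under the assumed prefix expansion, yet only the CURIE spelling passes, because the pattern is anchored on the literal `semapv:` prefix.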

matentzn commented 1 year ago

Thank you @gouttegd, your analysis is spot on. Back when this was proposed, dynamic enums did not exist yet, and there was no great way to constrain a field like this, so we resorted to regex. I think I was originally advised to use any_of, but the validation framework back then did not process it, so I added the regex afterwards.

In any case you are 💯 correct that we should switch to dynamic enums.