Open gouttegd opened 1 year ago
Thank you @gouttegd, your analysis is spot on. Back when this was proposed, dynamic enums did not exist yet, and there was no great way to constrain a field like this, so we resorted to regex. I think I was originally advised to use `any_of`, but the validation framework back then did not process it, so I added the regex afterwards.
In any case you are 💯 correct that we should switch to dynamic enums.
The `mapping_justification` slot is defined in the LinkML model as follows:

There are several issues with this definition:
1) Why both a `pattern` constraint and an `any_of` constraint? My understanding is that they are redundant. Expressing the same constraint twice in two different forms creates the risk of the two forms becoming out of sync if someone updates, say, the `any_of` list but forgets to update the `pattern` expression accordingly (a risk made slightly greater by the fact that the allowed values are not listed in the same order in both forms).

2) Both lists are already out of sync with the Semantic Mapping Vocabulary which, as of today, defines at least three more “matching processes”:
https://w3id.org/semapv/vocab/BackgroundKnowledgeBasedMatching
https://w3id.org/semapv/vocab/InstanceBasedMatching
https://w3id.org/semapv/vocab/MappingInversion
Ultimately, the definition should probably make use of LinkML’s dynamic enums, to avoid having to manually update the constraints in the SSSOM schema every time the semantic mapping vocabulary is enriched.
3) The `equals_string` constraints force the slot to have the range `string`. The LinkML specification is explicit:

But SSSOM defines `mapping_justification` as an `EntityReference`, which is ultimately a `uriOrCurie`, which in LinkML is a base type unrelated to `string`.

4) Independently of the typing issue above, both the `pattern` and the `any_of` constraints force the value to be in CURIE form, even though the underlying `uriOrCurie` type allows for either a CURIE or a URI.
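
To make points 1 and 4 concrete, here is a minimal Python sketch. The `PATTERN` and `ANY_OF` values below are shortened, hypothetical stand-ins for the two constraints (not the actual content of the SSSOM schema), and the helper names are invented for illustration. It shows how a list and a regex that are meant to say the same thing can drift apart, and how a CURIE-only pattern rejects the equivalent full IRI.

```python
import re

# Hypothetical, shortened stand-ins for the two constraints discussed above;
# the real schema lists more values.
PATTERN = r"^semapv:(LexicalMatching|LogicalReasoning|ManualMappingCuration)$"
ANY_OF = [
    "semapv:ManualMappingCuration",
    "semapv:LexicalMatching",
    "semapv:LogicalReasoning",
    "semapv:MappingInversion",  # added to the list only, to simulate drift
]

SEMAPV_PREFIX = "https://w3id.org/semapv/vocab/"


def values_from_pattern(pattern: str) -> set:
    """Expand a pattern of the form ^prefix:(A|B|C)$ into the CURIEs it accepts."""
    m = re.fullmatch(r"\^(\w+):\(([^()]*)\)\$", pattern)
    if m is None:
        raise ValueError("pattern is not of the form ^prefix:(A|B|C)$")
    prefix, alternatives = m.groups()
    return {f"{prefix}:{name}" for name in alternatives.split("|")}


def drift(pattern: str, any_of: list) -> tuple:
    """Return (values only in the pattern, values only in the any_of list)."""
    from_pattern = values_from_pattern(pattern)
    from_list = set(any_of)
    return from_pattern - from_list, from_list - from_pattern


# Point 1: the two forms disagree as soon as only one of them is updated.
only_pattern, only_list = drift(PATTERN, ANY_OF)
print("only in pattern:", only_pattern)
print("only in any_of:", only_list)

# Point 4: the pattern accepts the CURIE but not the equivalent full IRI.
curie = "semapv:LexicalMatching"
iri = SEMAPV_PREFIX + "LexicalMatching"
print("CURIE accepted:", bool(re.match(PATTERN, curie)))
print("IRI accepted:", bool(re.match(PATTERN, iri)))
```

A check like `drift` would only be a stopgap; a dynamic enum would remove the duplicated list altogether.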