linkml / prefixmaps

Semantic prefix map registry
https://linkml.io/prefixmaps/
Apache License 2.0

Potential bug in the merge algorithm #50

Open cthoyt opened 8 months ago

cthoyt commented 8 months ago

In #48, I incorporated prefix synonyms from the Bioregistry. Since it links all of the many Wikidata CURIE prefix variants together (wd, wikidata, WD_Entity), it's surprising that there are still disconnected prefix expansions:

https://github.com/linkml/prefixmaps/blob/b8a2bbdf56cca5dec9d12e7cc4a9cf408736abeb/src/prefixmaps/data/merged.csv#L4541

https://github.com/linkml/prefixmaps/blob/b8a2bbdf56cca5dec9d12e7cc4a9cf408736abeb/src/prefixmaps/data/merged.csv#L4574

I think this is caused by how the merging algorithm works. The merge algorithm might not have a way to stitch two previously disjoint canonical CURIE/URI prefix records together when it's given synonyms (or it does stitch them, but doesn't get the optimal result).
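To illustrate what I mean by stitching, here is a minimal sketch using union-find. It is not the prefixmaps merge implementation: the `stitch` function, the `(prefixes, uri_prefixes)` record shape, and the example URI prefixes are all hypothetical and not copied from merged.csv. Only the three Wikidata CURIE prefixes come from this issue.

```python
from collections import defaultdict


def stitch(records):
    """Merge records that share any CURIE prefix or URI prefix, directly or transitively.

    Hypothetical helper: each record is a (prefixes, uri_prefixes) pair of sets.
    """
    parent = {}

    def find(key):
        # Return the representative of the group that `key` belongs to.
        parent.setdefault(key, key)
        while parent[key] != key:
            parent[key] = parent[parent[key]]  # path halving
            key = parent[key]
        return key

    def keys_of(record):
        prefixes, uri_prefixes = record
        # Tag each key so a CURIE prefix never collides with a URI prefix.
        return [("prefix", p) for p in sorted(prefixes)] + [
            ("uri", u) for u in sorted(uri_prefixes)
        ]

    # Pass 1: link every key inside a record to that record's first key.
    for record in records:
        keys = keys_of(record)
        for key in keys[1:]:
            parent[find(key)] = find(keys[0])

    # Pass 2: collect records by the representative of their first key.
    merged = defaultdict(lambda: (set(), set()))
    for prefixes, uri_prefixes in records:
        keys = keys_of((prefixes, uri_prefixes))
        if not keys:
            continue
        group = merged[find(keys[0])]
        group[0].update(prefixes)
        group[1].update(uri_prefixes)
    return list(merged.values())


# Two records that start out disjoint (illustrative URI prefixes).
disjoint = [
    ({"wikidata"}, {"http://www.wikidata.org/entity/"}),
    ({"WD_Entity"}, {"https://www.wikidata.org/wiki/"}),
]
# A synonym-only record, like the one the Bioregistry would contribute.
synonyms = ({"wd", "wikidata", "WD_Entity"}, set())

before = stitch(disjoint)
after = stitch(disjoint + [synonyms])
print(len(before), len(after))
```

Under these assumptions, the synonym record is what bridges the two groups. Because linking is transitive, the order in which the records arrive doesn't matter, which is the property I'd expect the merge to have.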

Maybe an alternative is to just fix #49 directly.