monarch-initiative / mondo-curation-analysis


test improved lexical matching methods #1

Open twhetzel opened 2 months ago

twhetzel commented 2 months ago

lexmatch from OAK does not match all concepts between Mondo and another resource. In some cases this is due to British vs. American spelling, "type 1" vs. "type I", or the tokens in the term names appearing in a different order. This ticket is to explore additional lexical matching strategies that identify these matches.
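As a starting point for discussion, here is a minimal sketch of a normalized "match key" that would make two of the cases above compare equal: Roman vs. Arabic numerals after the word "type", and different token order. It is plain Python and does not use or reflect the OAK lexmatch API; the names `ROMAN_TO_ARABIC` and `match_key` are hypothetical, and the numeral table is only illustrative.

```python
import re

# Illustrative table only (assumption); extend it if higher numerals occur.
ROMAN_TO_ARABIC = {
    "i": "1", "ii": "2", "iii": "3", "iv": "4", "v": "5",
    "vi": "6", "vii": "7", "viii": "8", "ix": "9", "x": "10",
}


def match_key(label: str) -> str:
    """Return an order-insensitive, numeral-normalized key for a term label."""
    # Lowercase and split on anything that is not a letter or digit.
    tokens = re.findall(r"[a-z0-9]+", label.lower())
    normalized = []
    for i, tok in enumerate(tokens):
        # Only rewrite a Roman numeral directly after "type", so that
        # tokens such as the "x" in "X-linked" are left untouched.
        if i > 0 and tokens[i - 1] == "type" and tok in ROMAN_TO_ARABIC:
            tok = ROMAN_TO_ARABIC[tok]
        normalized.append(tok)
    # Sorting makes the key independent of the original token order.
    return " ".join(sorted(normalized))


if __name__ == "__main__":
    a = match_key("Diabetes mellitus, type I")
    b = match_key("type 1 diabetes mellitus")
    print(a == b)
```

Two labels with the same key are only candidate matches. Sorting tokens discards word order, so a key like this can produce false positives and the candidates would still need review.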

matentzn commented 2 months ago

A bit unrelated: I have implemented a British English -> American English syncing pipeline for Mondo and HPO, which uses a dictionary approach:

https://github.com/monarch-initiative/mondo/blob/6d30c5acb9b68cb07bbe34ef0ac9d374acf9b2fd/src/ontology/mondo.Makefile#L873

I would be interested if there is a more modern, better way to do this.
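For readers unfamiliar with the idea, a dictionary approach to spelling normalization can be sketched roughly as follows. This is not the code from the linked Makefile; the word list, `BRITISH_TO_AMERICAN`, and `americanize` are hypothetical names and data chosen for illustration.

```python
import re

# Tiny illustrative dictionary (assumption); a real list would be much larger.
BRITISH_TO_AMERICAN = {
    "anaemia": "anemia",
    "tumour": "tumor",
    "oesophagus": "esophagus",
    "haemorrhage": "hemorrhage",
    "leukaemia": "leukemia",
}

# One alternation over all dictionary keys, matched on whole words only.
_PATTERN = re.compile(
    r"\b(" + "|".join(map(re.escape, BRITISH_TO_AMERICAN)) + r")\b",
    re.IGNORECASE,
)


def americanize(label: str) -> str:
    """Replace known British spellings in a label with American ones."""

    def _swap(match: re.Match) -> str:
        word = match.group(0)
        replacement = BRITISH_TO_AMERICAN[word.lower()]
        # Preserve a leading capital letter from the original word.
        if word[0].isupper():
            replacement = replacement[0].upper() + replacement[1:]
        return replacement

    return _PATTERN.sub(_swap, label)


if __name__ == "__main__":
    print(americanize("Fanconi anaemia"))
    print(americanize("Tumour of oesophagus"))
```

Whole-word matching keeps the substitution from touching longer words that happen to contain a dictionary entry, at the cost of needing explicit entries for plurals and derived forms.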