monarch-initiative / mondo-ingest

Coordinating the mondo-ingest with external sources
https://monarch-initiative.github.io/mondo-ingest/

Mapping: notes on mappings from 230330 #262

Open sabrinatoro opened 1 year ago

sabrinatoro commented 1 year ago

First of all, the lexical matches are great!!! Thank you for all your work.

Here are the small things I have found. I think these are mostly related to the data itself, and therefore it is mostly for curators to be aware of when reviewing.

unmapped_doid_lex_exact.tsv: The exact lexical matches look fine (I haven’t seen any issues). Some capitalization needs to be reviewed, but this is very minimal: I found only 1 term that needed a capitalization fix, though I cannot tell how many already had the correct capitalization.

unmapped_doid_lex.tsv

In summary: I think that all the issues I have found are due to errors in the data itself. Therefore, curators should review the unmapped_..._lex.tsv files carefully and check for multiple source terms mapping to the same Mondo term, and vice versa.
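The check described above (several source terms pointing at one Mondo term, or one source term pointing at several) could be sketched roughly as follows. This is only an illustration: it assumes the lex files are tab-separated with SSSOM-style `subject_id` and `object_id` columns and optional `#`-prefixed metadata lines, and the function name and the CURIEs in the sample are made up.

```python
import csv
import io
from collections import defaultdict


def find_ambiguous_mappings(tsv_text, subject_col="subject_id", object_col="object_id"):
    """Return (objects with more than one subject, subjects with more than one object).

    Column names are assumptions; adjust them to the actual file header.
    """
    # Skip blank lines and '#'-prefixed metadata lines before parsing the table.
    lines = [ln for ln in tsv_text.splitlines() if ln and not ln.startswith("#")]
    reader = csv.DictReader(io.StringIO("\n".join(lines)), delimiter="\t")

    subjects_by_object = defaultdict(set)
    objects_by_subject = defaultdict(set)
    for row in reader:
        subjects_by_object[row[object_col]].add(row[subject_col])
        objects_by_subject[row[subject_col]].add(row[object_col])

    # Keep only the entries that a curator would want to look at.
    many_subjects = {o: sorted(s) for o, s in subjects_by_object.items() if len(s) > 1}
    many_objects = {s: sorted(o) for s, o in objects_by_subject.items() if len(o) > 1}
    return many_subjects, many_objects


# Made-up sample rows, for illustration only.
SAMPLE = (
    "# hypothetical metadata line\n"
    "subject_id\tobject_id\n"
    "DOID:1\tMONDO:1\n"
    "DOID:2\tMONDO:1\n"
    "DOID:3\tMONDO:2\n"
    "DOID:3\tMONDO:3\n"
)
many_subjects, many_objects = find_ambiguous_mappings(SAMPLE)
```

With the sample rows, `many_subjects` flags `MONDO:1` (matched by two source terms) and `many_objects` flags `DOID:3` (matched to two Mondo terms). These are the rows a curator would review first.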

Question/request: How should we keep track of the mappings to be ignored because of incorrect data? (i.e., the mappings are "correct" according to the mapping rules, but "incorrect" because the underlying data is incorrect)

sabrinatoro commented 1 year ago

@matentzn @hrshdhgd @joeflack4 @nicolevasilevsky

joeflack4 commented 1 year ago

Maybe something like a new file, config/mapping_exclusions.tsv? I suppose it would have the Mondo CURIE in one column, the native CURIE in another, and maybe an exclusion reason in a third. I don't know whether it makes sense to include the code for that reason, like MONDO:badData, within the normal exclusions enum, or to make a new enum just for mappings.
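To make the idea concrete, here is a minimal sketch of how a file like config/mapping_exclusions.tsv might be loaded and applied to candidate mappings. Nothing here exists in the repo: the column names (`mondo_id`, `source_id`, `exclusion_reason`), the function names, and the sample CURIEs are all hypothetical.

```python
import csv
import io


def load_exclusions(tsv_text):
    """Parse the exclusion table into {(mondo_id, source_id): exclusion_reason}."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    return {(r["mondo_id"], r["source_id"]): r["exclusion_reason"] for r in reader}


def filter_candidates(candidates, exclusions):
    """Split (mondo_id, source_id) pairs into kept pairs and dropped pairs with a reason."""
    kept, dropped = [], []
    for mondo_id, source_id in candidates:
        reason = exclusions.get((mondo_id, source_id))
        if reason is None:
            kept.append((mondo_id, source_id))
        else:
            dropped.append((mondo_id, source_id, reason))
    return kept, dropped


# Hypothetical file contents; the reason code reuses the MONDO:badData idea above.
EXCLUSIONS_TSV = (
    "mondo_id\tsource_id\texclusion_reason\n"
    "MONDO:1\tDOID:2\tMONDO:badData\n"
)
exclusions = load_exclusions(EXCLUSIONS_TSV)
kept, dropped = filter_candidates(
    [("MONDO:1", "DOID:1"), ("MONDO:1", "DOID:2")], exclusions
)
```

Keying on the (Mondo CURIE, native CURIE) pair rather than on either term alone means a rejected mapping does not block other, valid mappings for the same terms. Keeping the reason alongside each dropped pair preserves the record of false mappings.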

@matentzn It seems like the focus here is on lexmatch, but if for some reason you think my scripts will need any updates, I'll stay posted.

matentzn commented 1 year ago

It is not necessarily just lexmatch; I answered here: https://github.com/monarch-initiative/mondo-ingest/issues/261

We should most definitely keep a record of false mappings.

matentzn commented 1 year ago

@sabrinatoro we need a paragraph on your "handles syn. differently in DO" observation for the paper. Could you write 3 sentences here that explain exactly what is going on, with an example?