monarch-initiative / phenopacket2prompt

GA4GH Phenopacket to LLM prompt
https://monarch-initiative.github.io/phenopacket2prompt/
MIT License

Missing translations of HPO terms omitted in prompts #22

Closed (leokim-l closed 1 week ago)

leokim-l commented 3 months ago

Make sure HPO terms that lack a translation are not silently dropped from non-English prompts. Currently, for instance, a phenopacket with "Short Eyelashes", for which we don't have an Italian translation, generates an Italian prompt whose list of observed phenotypes omits "Short Eyelashes". See:

PMID_37179472_Family_3_II_7_en-prompt.txt PMID_37179472_Family_3_II_7_it-prompt.txt

Maybe have it default to the English term (seems safer) and then add a test that checks this never actually happens?
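A minimal sketch of what that fallback could look like, written in Python purely for illustration. The function name, the dictionary-based lookup keyed by English label, and the sample Italian entry are all assumptions and are not taken from the phenopacket2prompt code base. The function returns the fallbacks alongside the rendered labels so that a test can assert the fallback list is empty.

```python
from typing import Dict, List, Tuple


def labels_with_english_fallback(
    english_labels: List[str], translations: Dict[str, str]
) -> Tuple[List[str], List[str]]:
    """Translate each label, keeping the English label when no translation exists.

    Hypothetical helper: returns (rendered_labels, labels_that_fell_back).
    """
    rendered: List[str] = []
    fallbacks: List[str] = []
    for label in english_labels:
        translated = translations.get(label)
        if translated is None:
            # No translation available: keep the English term instead of dropping it.
            rendered.append(label)
            fallbacks.append(label)
        else:
            rendered.append(translated)
    return rendered, fallbacks


# Illustrative data only; the real translation source is not shown in this issue.
ITALIAN = {"Microcephaly": "Microcefalia"}
observed = ["Microcephaly", "Short eyelashes"]

rendered, fallbacks = labels_with_english_fallback(observed, ITALIAN)
print(rendered)
print(fallbacks)
```

A test could then check both that no term is lost (`len(rendered) == len(observed)`) and, for the curated cohort, that `fallbacks` is empty.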

pnrobinson commented 3 months ago

I think we probably need the program to crash if a translation is missing. We can then either exclude genes/phenopackets with missing data or update the translations. I do not think we can use the English term, because this could change the performance of GPT in hard-to-predict ways!
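For comparison, a fail-fast variant might look roughly like the sketch below. It is written in Python for illustration only; the exception class, function name, and sample data are hypothetical and do not reflect how the fix was actually implemented. It collects every missing term before raising, so a single run reports all translations that need updating.

```python
from typing import Dict, List


class MissingTranslationError(ValueError):
    """Hypothetical error raised when an HPO label has no translation."""


def translate_labels_strict(
    english_labels: List[str], translations: Dict[str, str], language: str
) -> List[str]:
    """Translate every label or raise, never omitting or substituting a term."""
    missing = [label for label in english_labels if label not in translations]
    if missing:
        # Report all gaps at once rather than stopping at the first one.
        raise MissingTranslationError(
            f"No {language} translation for: {', '.join(missing)}"
        )
    return [translations[label] for label in english_labels]


# Illustrative data only.
ITALIAN_TERMS = {"Microcephaly": "Microcefalia"}

try:
    translate_labels_strict(["Microcephaly", "Short eyelashes"], ITALIAN_TERMS, "it")
    strict_error = None
except MissingTranslationError as exc:
    strict_error = str(exc)

print(strict_error)
```

With this approach the caller decides what to do with the failure: skip the phenopacket, or stop and add the missing translation.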

leokim-l commented 1 week ago

Fixed by #54 (related to #51); partially reopened as a check in #62.