monarch-initiative / phenol

phenol: Phenotype ontology library
https://phenol.readthedocs.io/en/latest/
BSD 3-Clause "New" or "Revised" License

Modify OrphaGeneToDiseaseParser to avoid associating genes with grouping diseases #445

Open cmungall opened 12 months ago

cmungall commented 12 months ago

From @pnrobinson on email

In the Exomiser, HPO website, Monarch Web app, we find the same medical error.

Orphanet associates some diseases with a large number of genes, for example:

ORPHA:91387 Familial thoracic aortic aneurysm and aortic dissection

MFAP5 [8076], FOXE3 [2301], LOX [4015], MYH11 [4629], PRKG1 [5592], TGFB2 [7042], MAT2A [4144], TGFB3 [7043], ACTA2 [59], SMAD3 [4088], TGFBR2 [7048], TGFBR1 [7046], HEY2 [23493], SMAD4 [4089], SMAD2 [4087], THSD4 [79875], ELN [2006], MYLK [4638], FBN1 [2200]

@cmungall's comments:

Yes, monarch-kg gets these associations from phenol via http://purl.obolibrary.org/obo/hp/hpoa/genes_to_disease.txt

I suggest we modify this code as per Peter's suggestion:

https://github.com/monarch-initiative/phenol/blob/master/phenol-annotations/src/main/java/org/monarchinitiative/phenol/annotations/assoc/OrphaGeneToDiseaseParser.java

We would skip the gene-to-disease (g2d) associations for a disease if there are multiple causal genes for that disease.

We could explore other approaches, e.g. using the Mondo mappings, which tell us when an ORDO class subsumes OMIM entries, but the g2d cardinality approach is simple and standalone and will leave us in a much better place than we are now.
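
A minimal sketch of what the cardinality check could look like, independent of the real parser. The class `G2dCardinalityFilter`, the `Assoc` type and the method `skipMultiGeneDiseases` are hypothetical names made up for illustration and are not part of phenol; the sketch also assumes the input list already contains only causal gene-to-disease associations.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical helper, not part of phenol: drops g2d rows of multi-gene diseases. */
public class G2dCardinalityFilter {

    /** Hypothetical stand-in for one gene-to-disease row. */
    public static final class Assoc {
        public final String diseaseId;
        public final String geneSymbol;

        public Assoc(String diseaseId, String geneSymbol) {
            this.diseaseId = diseaseId;
            this.geneSymbol = geneSymbol;
        }
    }

    /**
     * Keeps only associations whose disease has exactly one distinct gene.
     * Assumption: every input row is a causal association.
     */
    public static List<Assoc> skipMultiGeneDiseases(List<Assoc> associations) {
        // First pass: collect the distinct genes seen for each disease.
        Map<String, Set<String>> genesByDisease = new LinkedHashMap<>();
        for (Assoc a : associations) {
            genesByDisease
                .computeIfAbsent(a.diseaseId, k -> new LinkedHashSet<>())
                .add(a.geneSymbol);
        }
        // Second pass: keep rows of single-gene diseases, preserving input order.
        List<Assoc> kept = new ArrayList<>();
        for (Assoc a : associations) {
            if (genesByDisease.get(a.diseaseId).size() == 1) {
                kept.add(a);
            }
        }
        return kept;
    }
}
```

With this filter, all rows for ORPHA:91387 would be dropped because the disease maps to more than one gene, while a disease with a single gene would pass through unchanged.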

pnrobinson commented 2 months ago

@ielis @iimpulse I will take this on if nobody is working on this?