monarch-initiative / phenol

phenol: Phenotype ontology library
https://phenol.readthedocs.io/en/latest/
BSD 3-Clause "New" or "Revised" License

Orphanet Inheritance annotations #208

Closed pnrobinson closed 5 years ago

pnrobinson commented 5 years ago

cf: http://www.orphadata.org/data/xml/en_product9_ages.xml

pnrobinson commented 5 years ago

This is working now. We need to extend the tests to cover more modes of inheritance (MoIs). Also, we cannot currently map semidominant inheritance.

pnrobinson commented 5 years ago

see https://www.ncbi.nlm.nih.gov/pubmed/23604102

julesjacobsen commented 5 years ago

Do you have a commit for this? I was looking at this file as part of the Exomiser ingest and it threw exceptions and all sorts when trying to parse using Jackson due to empty fields not being terminated in the expected manner.

pnrobinson commented 5 years ago

Funny you should ask: https://github.com/monarch-initiative/phenol/pull/214 I wrote a hand-crafted SAX parser for this file which seems to work. There are a few quirks that require some filtering... @iimpulse I will add the inheritance fields to phenotype.hpoa by extending hpoannotqc (the app that makes this file), so that the HPO website will display the Orphanet inheritance annotations the next time we update phenotype.hpoa (I will need another few days to update that code, so we could do this at the next HPO web release).
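
For readers who have not seen the PR, here is a minimal sketch of what a SAX-based reader for this kind of file might look like. It is not the code from the pull request. The element names (`Disorder`, `OrphaNumber`, `TypeOfInheritance`, `Name`), the inline sample XML, and the class name `OrphaInheritanceSaxSketch` are assumptions made for illustration; only the JDK's built-in SAX classes are used. Empty inheritance names are skipped, as one example of the kind of filtering mentioned above.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

public class OrphaInheritanceSaxSketch {

    // Made-up sample data; the element layout is an assumption, not the real file.
    static final String SAMPLE =
          "<DisorderList>"
        + "<Disorder><OrphaNumber>1</OrphaNumber><Name lang=\"en\">Example disorder A</Name>"
        + "<TypeOfInheritanceList><TypeOfInheritance>"
        + "<Name lang=\"en\">Autosomal recessive</Name>"
        + "</TypeOfInheritance></TypeOfInheritanceList></Disorder>"
        + "<Disorder><OrphaNumber>2</OrphaNumber><Name lang=\"en\">Example disorder B</Name>"
        + "<TypeOfInheritanceList>"
        + "<TypeOfInheritance><Name lang=\"en\">Autosomal dominant</Name></TypeOfInheritance>"
        + "<TypeOfInheritance><Name lang=\"en\"/></TypeOfInheritance>"
        + "</TypeOfInheritanceList></Disorder>"
        + "</DisorderList>";

    /** Collects inheritance labels per disorder while the parser streams through the file. */
    static class Handler extends DefaultHandler {
        final Map<String, List<String>> inheritanceByDisorder = new LinkedHashMap<>();
        private final StringBuilder text = new StringBuilder();
        private String currentOrphaNumber;
        private boolean inInheritance;

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts) {
            text.setLength(0); // start collecting text for the new element
            if (qName.equals("Disorder")) currentOrphaNumber = null;
            if (qName.equals("TypeOfInheritance")) inInheritance = true;
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            text.append(ch, start, length);
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            String value = text.toString().trim();
            if (qName.equals("OrphaNumber") && currentOrphaNumber == null && !inInheritance) {
                // First OrphaNumber inside a Disorder identifies the disorder itself.
                currentOrphaNumber = value;
            } else if (qName.equals("Name") && inInheritance
                    && currentOrphaNumber != null && !value.isEmpty()) {
                // Skip empty names rather than failing on them.
                inheritanceByDisorder
                    .computeIfAbsent("ORPHA:" + currentOrphaNumber, k -> new ArrayList<>())
                    .add(value);
            } else if (qName.equals("TypeOfInheritance")) {
                inInheritance = false;
            }
            text.setLength(0);
        }
    }

    /** Parses the XML string and returns disorder id -> list of inheritance labels. */
    static Map<String, List<String>> parse(String xml) {
        try {
            Handler handler = new Handler();
            SAXParserFactory.newInstance().newSAXParser()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)), handler);
            return handler.inheritanceByDisorder;
        } catch (Exception e) {
            throw new RuntimeException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(parse(SAMPLE));
    }
}
```

Mapping the collected labels to HPO mode-of-inheritance terms would be a separate step and is not shown here.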

julesjacobsen commented 5 years ago

Ahh - I ended up coming to the same conclusion and started with a SAX parser solution too. Never managed to finish though, so this could be a handy thing to use and save code duplication. Ah no! Even better! If you're adding this to phenotype.hpoa, will this end up in the phenotype_annotation.tab file from here: https://hpo.jax.org/app/download/annotation? If so I'll just wait and the data will magically appear, which will make me very happy.

pnrobinson commented 5 years ago

Expect to see the orphanet inheritance data in phenotype_annotation.tab within the next 1-2 weeks -- I need to update hpoannotqc to use this and make sure that we only export annotations where we have phenotype data.

julesjacobsen commented 5 years ago

Fantastic! I owe you that beer you owe me.

julesjacobsen commented 5 years ago

Actually Peter, do you have any information on how we can address the Orphanet inheritance annotations for specific disease-gene associations?

pnrobinson commented 5 years ago

I have refactored and simplified the interface. A first test now shows:

$ grep ^ORPHA phenotype.hpoa | grep -c 'HP:0000006'
989 # autosomal dominant
$ grep ^ORPHA phenotype.hpoa | grep -c 'HP:0000007'
1265 # autosomal recessive
$ grep ^ORPHA phenotype.hpoa | grep -c 'HP:0001423'
62 # X dominant
$ grep ^ORPHA phenotype.hpoa | grep -c 'HP:0001419'
240 # X recessive
$ grep -c ^ORPHA phenotype.hpoa 
54568
$ grep -c ^DECIPHER phenotype.hpoa 
297
$ grep -c ^OMIM phenotype.hpoa 
103616
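
The same kind of count can be sketched in Java. This is only an illustration of the `grep ^ORPHA ... | grep -c 'HP:...'` idea above, not code from hpoannotqc or phenol. The class name `HpoaInheritanceCount` and the sample lines are made up, and the sample lines do not follow the real phenotype.hpoa column layout.

```java
import java.util.List;

public class HpoaInheritanceCount {

    // Made-up lines for illustration; not the real phenotype.hpoa format or data.
    static final List<String> SAMPLE = List.of(
        "ORPHA:1\tExample disorder A\tHP:0000007",
        "ORPHA:2\tExample disorder B\tHP:0000006",
        "ORPHA:2\tExample disorder B\tHP:0001250",
        "OMIM:100000\tExample disorder C\tHP:0000006",
        "DECIPHER:1\tExample disorder D\tHP:0001419");

    /** Counts lines that start with dbPrefix and contain hpoId anywhere in the line. */
    static long count(List<String> lines, String dbPrefix, String hpoId) {
        return lines.stream()
            .filter(line -> line.startsWith(dbPrefix)) // like grep ^ORPHA
            .filter(line -> line.contains(hpoId))      // like grep -c 'HP:0000006'
            .count();
    }

    public static void main(String[] args) {
        System.out.println("ORPHA autosomal dominant:  " + count(SAMPLE, "ORPHA", "HP:0000006"));
        System.out.println("ORPHA autosomal recessive: " + count(SAMPLE, "ORPHA", "HP:0000007"));
    }
}
```

To run this against a real file, the sample list could be replaced with `java.nio.file.Files.readAllLines(java.nio.file.Path.of("phenotype.hpoa"))`.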

The analysis was done via hpoannotqc, but the code would be something like this:

PhenotypeDotHpoaFileWriter writer = PhenotypeDotHpoaFileWriter.factory(Ontology ont,
                                                   String smallFileDirectoryPath,
                                                   String orphaPhenotypeXMLpath,
                                                   String orphaInheritanceXMLpath,
                                                   String outpath)  ....
writer.write();

@julesjacobsen @iimpulse after the new release of phenol I will upload an updated version of phenotype.hpoa to the Jenkins server that will have the Orphanet inheritance annots.

pnrobinson commented 5 years ago

This is now integrated and working