molgenis / projects-rd-cosas

Fix: changes to the diagnoses table #3

Closed davidruvolo51 closed 3 years ago

davidruvolo51 commented 3 years ago

Proposed fixes and changes

In preparation for Cosas v1, a few fixes and changes are needed to the `cosasrefs_diagnoses` table.

Methodology

Cineas to HPO Mappings

The SORTA tool can be used to map Cineas codes to HPO codes. The HPO ontology can be downloaded from https://www.ebi.ac.uk/ols/ontologies/hp. Zip the file and upload it to your Molgenis database using the "Advanced Data Import" plugin. Molgenis will handle the rest of the import.

Transform the Cineas codes into a SORTA input file and rename the column to Name. Then create a new SORTA task and run it. The results will be saved in the default location.

Name
adrenal insufficiency
chronic lymphocytic leukemia
...
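
The input preparation could look roughly like the sketch below, written in R with data.table. It assumes the Cineas reference data is already loaded as a table with a `description` column; that column name, the clean-up steps (trimming, lowercasing, de-duplicating), the output file name, and the comma delimiter are all assumptions for illustration, so check them against your own data and the SORTA upload form.

```r
library(data.table)

# Toy stand-in for the Cineas reference data (hypothetical column names)
cineas <- data.table(
    code = c("C001", "C002", "C003"),
    description = c(
        " Adrenal insufficiency",
        "Chronic lymphocytic leukemia",
        "adrenal insufficiency"
    )
)

# Assumed clean-up: trim whitespace, lowercase, keep a single column called Name
sorta_input <- cineas[, .(Name = tolower(trimws(description)))]

# Drop duplicate labels so each term is only matched once
sorta_input <- unique(sorta_input, by = "Name")

# Write the file to upload (file name and location are assumptions)
outfile <- file.path(tempdir(), "cineas_sorta_input.csv")
fwrite(sorta_input, outfile)
```

The toy table stands in for the real export; only the `Name` header is taken from the steps described above.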

For COSAS, we are only interested in terms that have a similarity score of 70% or higher. Everything else isn't necessary at this point. We also want pure HPO matches only, since some matches returned GO and other ontologies. The codes also need to be extracted from the IRI.

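# requires data.table (%like% and := come from that package)
# keep HPO matches scoring 70 or higher, then strip the IRI prefix to get the code
sorta <- sorta[
    score >= 70 & ontologyTermIRI %like% "purl.obolibrary.org/obo/HP",
][,
    code := gsub("http://purl.obolibrary.org/obo/HP_", "", ontologyTermIRI, fixed = TRUE)
]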

Since the mappings are a bonus rather than a requirement, remove any HPO terms that do not exist in the phenotypes reference table.
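
As a rough sketch of that last step, the mappings can be restricted to codes found in the reference table. The table name `phenotypes`, its `code` column, and the toy values below are assumptions for illustration; the real reference table may store codes in a different format (for example with an `HP:` prefix), in which case the codes would need to be aligned first.

```r
library(data.table)

# Toy stand-in for the filtered SORTA results (illustrative values only)
sorta <- data.table(
    Name = c("adrenal insufficiency", "chronic lymphocytic leukemia"),
    code = c("0000846", "0005550"),
    score = c(100, 85.5)
)

# Toy stand-in for the phenotypes reference table (hypothetical name and column)
phenotypes <- data.table(code = c("0000846", "0001250"))

# Keep only the mappings whose HPO code exists in the reference table
sorta_valid <- sorta[code %in% phenotypes$code]
```

Comparing `nrow(sorta)` with `nrow(sorta_valid)` afterwards gives a quick count of how many mappings were dropped.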

davidruvolo51 commented 3 years ago

Samples db: Finalize EMX, process RFQ results