In preparation for COSAS v1, a few fixes and changes are needed to the `cosasrefs_diagnoses` table.
[x] Change the variable names and labels to match `cosasrefs_phenotype`
[x] Add the attribute `dateLastUpdated`
[x] Integrate the Cineas to HPO mappings
[x] Create a section that processes the HPO phenotype reference dataset (see #5)
[x] Move the build of the cosasrefs datasets into `build_emx`
[x] Verify that labels display properly
Methodology
Cineas to HPO Mappings
The SORTA tool can be used to map Cineas codes to HPO codes. The ontology can be downloaded from https://www.ebi.ac.uk/ols/ontologies/hp. Zip the file and upload it into your Molgenis database using the "Advanced Data Import" plugin. Molgenis will handle the rest.
Transform the Cineas codes and rename the column to `Name`. Create a new SORTA task and run it. The results will be saved in the default location.
Name
adrenal insufficiency
chronic lymphocytic leukemia
...
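A minimal sketch of preparing the input file shown above, using only the Python standard library. The function name `build_sorta_input` is hypothetical, and the "transform" step is assumed here to mean trimming whitespace, lowercasing, and dropping duplicates; adjust it to whatever the Cineas export actually needs. The delimiter SORTA expects is not stated in this issue, so the sketch uses the `csv` default.

```python
import csv
import io


def build_sorta_input(descriptions):
    """Return CSV text with a single 'Name' column, one unique term per row.

    Assumption: terms are trimmed, lowercased, and de-duplicated in input order.
    """
    seen = set()
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["Name"])  # SORTA input column, as named in this issue
    for text in descriptions:
        term = text.strip().lower()
        if term and term not in seen:
            seen.add(term)
            writer.writerow([term])
    return buffer.getvalue()


# Illustrative input only; real values come from the Cineas export.
sorta_input = build_sorta_input(
    ["Adrenal insufficiency ", "chronic lymphocytic leukemia", "adrenal insufficiency"]
)
```

The returned text can be written to a file and uploaded when creating the task.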
For COSAS, we are only interested in matches that have a similarity score of 70% or higher; lower-scoring matches are not needed at this point. We also want HPO matches only, since some of the returned matches come from GO and other ontologies. The HPO codes also need to be extracted from the IRI.
Since the mappings are a bonus rather than a requirement, remove any HPO terms that do not exist in the phenotypes reference table.
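The filtering rules described above could be sketched roughly as follows. The column names `ontologyTermIRI` and `score`, the function names, and the `HP:0000000` output format are assumptions for illustration, not confirmed by the SORTA output or the COSAS model; the sample rows are made up. The sketch assumes HPO IRIs end in `/obo/HP_` followed by seven digits.

```python
import re

# Assumption: HPO term IRIs end with "/obo/HP_" followed by seven digits.
HPO_IRI_PATTERN = re.compile(r"/obo/HP_(\d{7})$")


def extract_hpo_code(iri):
    """Return 'HP:0000000'-style code for an HPO IRI, or None for other ontologies."""
    match = HPO_IRI_PATTERN.search(iri.strip())
    return f"HP:{match.group(1)}" if match else None


def filter_mappings(matches, known_hpo_codes, min_score=70.0):
    """Keep HPO-only matches scoring >= min_score that exist in the reference table."""
    kept = []
    for row in matches:
        code = extract_hpo_code(row["ontologyTermIRI"])  # hypothetical column name
        if code is None:
            continue  # GO or another ontology
        score = float(row["score"])  # hypothetical column name
        if score < min_score:
            continue  # below the 70% similarity threshold
        if code not in known_hpo_codes:
            continue  # not present in the phenotypes reference table
        kept.append({"name": row["Name"], "hpo": code, "score": score})
    return kept


# Illustrative rows only; they are not real SORTA results.
rows = [
    {"Name": "term a", "ontologyTermIRI": "http://purl.obolibrary.org/obo/HP_0000846", "score": "100.0"},
    {"Name": "term b", "ontologyTermIRI": "http://purl.obolibrary.org/obo/GO_0008150", "score": "95"},
    {"Name": "term c", "ontologyTermIRI": "http://purl.obolibrary.org/obo/HP_0005550", "score": "65.2"},
    {"Name": "term d", "ontologyTermIRI": "http://purl.obolibrary.org/obo/HP_0001234", "score": "88"},
]
reference_codes = {"HP:0000846", "HP:0005550"}
mappings = filter_mappings(rows, reference_codes)
```

Only `term a` survives here: `term b` is a GO match, `term c` scores below 70, and `term d` is not in the reference set.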