spiros / hdr-caliber-phenotype-library

HDR UK National Phenotype Library
https://portal.caliberresearch.org/
Creative Commons Attribution 4.0 International

Issues found with source data when scraping & importing to Phenotype Library #35

Closed: ieuans closed this issue 2 years ago

ieuans commented 3 years ago

Hi, whilst scraping phenotypes for the Phenotype Library import, the following issues were found:

  1. Phenotype Paige_CardiovascularDisease_ZAe8LVnjUBE6urqrGVJeRw.md references site.data.codelists.Paige_CardiovascularDisease_ZAe8LVnjUBE6urqrGVJeRw_Read2.csv, but that file is missing.

  2. Parisi_CardiovascularDisease_chY4tvNZ8UrR4qDXrZaKuC.md lists ICD codes under the Primary Care tab. Is this correct?

  3. The UK Biobank tab of Axson_pneumonia_GL46c8PLQMdPuj7QuPtEh9.md references site.data.codelists.axson_pneumonia_GL46c8PLQMdPuj7QuPtEh9_SNOMEDCT.csv instead of site.data.codelists.axson_pneumonia_GL46c8PLQMdPuj7QuPtEh9_UKBIOBANK.csv.

  4. Ethnic-status.md stores its codelist in block text instead of a table. I can add an edge case to provide this data, but I'm not sure which table structure is desired.

  5. Gender-specific diseases have a metadata 'sex' tag that references both sexes or the incorrect sex. Please see: https://github.com/SwanseaUniversityMedical/concept-library/issues/368
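
Issues like 1 and 3 above (a page referencing a codelist that does not exist, or the wrong one) could probably be caught before import with a small check. The following is only a rough sketch, not the scraper actually used for the import. It assumes phenotype pages are .md files in one directory and codelists are .csv files in another. The function names and directory arguments are hypothetical, and the match on file names is exact and case-sensitive.

```python
import re
from pathlib import Path

# Matches references such as site.data.codelists.Some_Name_Read2
# (a trailing ".csv", if present, is not part of the captured name).
REF_PATTERN = re.compile(r"site\.data\.codelists\.([A-Za-z0-9_\-]+)")


def find_codelist_refs(text):
    """Return codelist names referenced in a page, in order, without duplicates."""
    seen = []
    for name in REF_PATTERN.findall(text):
        if name not in seen:
            seen.append(name)
    return seen


def missing_codelists(phenotype_dir, codelist_dir):
    """Map each page file name to the referenced codelists that have no CSV file.

    Both directory arguments are assumptions about the repository layout.
    """
    # File names without the .csv extension, compared case-sensitively.
    available = {p.stem for p in Path(codelist_dir).glob("*.csv")}
    report = {}
    for page in sorted(Path(phenotype_dir).glob("*.md")):
        refs = find_codelist_refs(page.read_text(encoding="utf-8"))
        missing = [ref for ref in refs if ref not in available]
        if missing:
            report[page.name] = missing
    return report
```

Calling missing_codelists with the two directories returns a dictionary of page names to unresolved references. An empty dictionary means every reference resolved. This would flag a missing file as in issue 1, and it would flag issue 3 only if the case mismatch between "axson" and "Axson" also applies to the CSV file names. A wrong but existing codelist under a tab would still need a separate check.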