Closed tskir closed 2 years ago
103580
MIM 615553
MIM# 615249
(tab character used as a separator)OMIM 610549
OMIM #612073
OMIM:612186
OMIM: 617236
OMIM #617767
OMIM# 618658
ORPHA90636
Orphanet:453499
OrphaNet: ORPHA404454
#107480:Townes-Brocks syndrome
#601869: Deafness, autosomal recessive 15
Cenani-Lenz syndactyly syndrome 212780
Cenani-Lenz syndactyly syndrome, 212780
Centronuclear myopathy 5 ( 615959)
Cerebellar ataxia, areflexia, pes cavus, optic atrophy and sensorineural hearing loss (CAPOS, #601338)
{Amyotrophic lateral sclerosis, susceptibility to, 13}, 183190
"Usher syndrome, type 1D, 601067;Deafness, autosomal recessive; 12, 601386;Usher syndrome, type 1D/F digenic, 601067" - Disease name is using a semicolon, which breaks phenotype splitting
Combined oxidative phosphorylation deficiency 6, 300816Cowchock syndrome, 310490 - Two diseases are glued to each other
17,20-lyase deficiency, isolated 202110 and 17-alpha-hydroxylase/17,20-lyase deficiency 202110 - Two diseases in one, separated by “and”
Meier-Gorlin syndrome with craniosynostosis (from PMID 27374770)
Jervell and Lange-Nielsen syndrome [Congenital sensorineural hearing loss, Prolonged QT interval on EKG, SyncopeTorsades de pointes, Sudden cardiac death, Caused by mutation in the potassium voltage-gated channel, KQT-like subfamily, member 1 gene (KCNQ1), Caused by mutation in the potassium voltage-gated channel, Isk-related subfamily, member 1 gene (KCNE1)]
22q11.2 deletion syndrome, Orphanet:567 (includes developmental delay)
? with unclear meaning:
616553 ?Dyskeratosis congenita 6 and 7
3- ?-hydroxysterol ?5-oxidoreductase/isomerase deficiency (Disorders of bile acid biosynthesis)
Epileptic encephalopathy, intellectual disability, no OMIM# yet
; (with and without leading/trailing spaces)
The list of everything I could find:
98(6):1193-207. doi: 10.1016/j.ajhg.2016.05.004. PubMed PMID: 27259053, PubMed Central PMCID: PMC4908191.
doi:10.1007/s12265-016-9673-5
DOI: https://doi.org/10.1016/j.xhgg.2021.100033
https://doi.org/10.1101/797787
https://doi-org.ezproxy.library.qmul.ac.uk/10.1093/brain/awaa085
PMID: 26933893
PMID: 27078007 (full text not available to confirm findings).
Aldahmesh (2012) Genet Med 14(12):955-962, PMID: 22935719
12702164
15985586 (two siblings)
16060907 (Camilot et al., 2005 report subclinical hypothyroid subjects with heterozygous substitutions
25674101 - review from the same authors as PMID:23972370
2194867118000911 — but displayed as two separate ones: https://panelapp.genomicsengland.co.uk/panels/81/gene/FAM20C/#!details
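The identifier formats listed above can be checked against candidate regular expressions with a short Python sketch. The patterns and function names below are assumptions written only from the example strings in this issue; they are not the expressions used by the actual parser. The sketch treats any standalone six-digit number as an OMIM identifier, strips Orphanet identifiers first so that they are not mistaken for OMIM ones, and treats standalone runs of seven or eight digits as PMIDs.

```python
import re

# Hypothetical patterns, derived only from the examples in this issue.
# Orphanet: "ORPHA90636", "Orphanet:453499", "OrphaNet: ORPHA404454".
ORPHA_RE = re.compile(r"(?:Orphanet:\s*(?:ORPHA)?|ORPHA:?)\s*(\d+)", re.IGNORECASE)
# OMIM: exactly six digits, not embedded in a longer run of digits.
OMIM_RE = re.compile(r"(?<!\d)\d{6}(?!\d)")
# PMID: 7 or 8 digits, not preceded by a letter, digit, "." or "/"
# (this skips PMC identifiers and DOI fragments).
PMID_RE = re.compile(r"(?<![\w./])\d{7,8}(?!\d)")


def extract_phenotype_ids(text):
    """Return OMIM and Orphanet identifiers found in a phenotype string."""
    orphanet = ORPHA_RE.findall(text)
    # Remove Orphanet mentions so six-digit Orphanet IDs are not read as OMIM.
    remainder = ORPHA_RE.sub(" ", text)
    return {"omim": OMIM_RE.findall(remainder), "orphanet": orphanet}


def extract_pmids(text):
    """Return PMID-like numbers found in a publication string."""
    return PMID_RE.findall(text)


if __name__ == "__main__":
    print(extract_phenotype_ids("OrphaNet: ORPHA404454"))
    print(extract_phenotype_ids("Centronuclear myopathy 5 ( 615959)"))
    print(extract_pmids("25674101 - review from the same authors as PMID:23972370"))
```

Under these assumptions, a glued value such as 2194867118000911 yields no PMID at all, so it would still need dedicated handling.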
A spreadsheet with the data and metrics is available here: https://docs.google.com/spreadsheets/d/1VBykrN6iyEqBuGJgOJYNOYefINe3dmJ2NjEKCuXmHcE/edit#gid=654888619. The following report contains some excerpts and analysis.
The files being compared are:
All_genes_20200928-1959.tsv (2020-09-28, MD5 024bbad3685a0a9797e63314e6e7c77a)
All_genes_20220804-1350_green_amber_red_public_v1plus_no_superpanels.xlsx (2022-08-04, MD5 6a52bece16f49891f8b9aa7135d0e476)

Compared to the old file, the new file has significantly fewer:
The complete list of the 29 panels which are missing from the new file:

Panel ID | Panel name | Number of genes |
---|---|---|
8 | Refuted genes | 5 |
14 | Multiple bowel polyps | 14 |
28 | Congenital neutropaenia | 17 |
32 | Kyphoscoliotic Ehlers-Danlos syndrome | 3 |
58 | Ehlers-Danlos syndrome type 3 | 55 |
64 | ClinGen Gene Validity Curations | 47 |
67 | Epileptic encephalopathy | 183 |
121 | A- or hypo-gammaglobulinaemia | 28 |
124 | Combined B and T cell defect | 24 |
135 | Dilated Cardiomyopathy (DCM) | 75 |
137 | Familial colon cancer | 26 |
160 | Genetic Epilepsies with Febrile Seizures Plus (GEFS+) | 6 |
161 | Epilepsy Plus | 142 |
170 | SCID | 25 |
203 | Agranulocytosis | 2 |
204 | Bilateral microtia | 46 |
210 | ClinGen_Familial thoracic aortic aneurysm and aortic dissection | 53 |
240 | Familial Genetic Generalised Epilepsies | 25 |
252 | Familial Focal Epilepsies | 10 |
268 | Meiges disease | 14 |
289 | Multiple Tumours | 129 |
297 | Bardet-Biedl Syndrome | 22 |
399 | Additional findings health related | 14 |
412 | Gene therapy clinical trials | 21 |
657 | Autism | 735 |
720 | Groopman et al 2019 - Genes with diagnostic variants | 66 |
745 | CHARGE syndrome | 1 |
880 | Nephrolithiasis and Nephrocalcinosis_KidGen_VCGS | 30 |
928 | Viral resistance | 24 |
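
A list like the one above can be produced by a set difference over the panel IDs in the two exports. The following is a minimal sketch, assuming both files are available as tab-separated text with one row per gene; the column names `Panel Id` and `Panel Name`, the function names, and the toy rows are hypothetical and would need to be adjusted to the real files.

```python
import csv
import io
from collections import Counter


def genes_per_panel(tsv_text, id_col="Panel Id", name_col="Panel Name"):
    """Count rows (genes) per panel in a tab-separated export.

    The column names are assumptions, not taken from the real files.
    """
    counts = Counter()
    names = {}
    for row in csv.DictReader(io.StringIO(tsv_text), delimiter="\t"):
        panel_id = int(row[id_col])
        counts[panel_id] += 1
        names[panel_id] = row[name_col]
    return {pid: (names[pid], counts[pid]) for pid in counts}


def missing_panels(old_tsv, new_tsv):
    """Return (panel ID, name, gene count) for panels only in the old file."""
    old = genes_per_panel(old_tsv)
    new = genes_per_panel(new_tsv)
    return [(pid, *old[pid]) for pid in sorted(old.keys() - new.keys())]


# Toy data for illustration only; not real panel content.
OLD = (
    "Panel Id\tPanel Name\tGene\n"
    "1\tToy panel A\tGENE_1\n"
    "8\tToy panel B\tGENE_2\n"
    "8\tToy panel B\tGENE_3\n"
)
NEW = "Panel Id\tPanel Name\tGene\n1\tToy panel A\tGENE_1\n"

if __name__ == "__main__":
    for panel_id, name, n_genes in missing_panels(OLD, NEW):
        print(f"{panel_id} | {name} | {n_genes} |")
```

The gene counts are taken from the old file, since a missing panel has no rows in the new one.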
Finally, our parser runs successfully on the new file and generates evidence which validates against the schema. However, since several important-looking panels are missing, I do not recommend migrating to it right away.
Followed up with Eleanor regarding data normalisation
Re-ran the parser on the new data (All_genes_20220804-1350_green_amber_red_public_v1plus_no_superpanels.tsv) with debug tables and made sure that the preprocessing regular expressions still hold up. As far as I can see, everything still gets processed correctly. The debug tables can be found here, sheets "Phenotypes" and "PMIDs".
There are no more actions on our side. Closing this issue.
Follow-up issue from https://github.com/opentargets/platform/issues/1636.
Final set of suggestions will need to be compiled after EFO mapping is implemented as well.