ValWood closed this issue 3 years ago.
I only have these two errors to fix, which will bring the count to 1331:
needed 6 columns, got 1 - ignoring line 50
can't load annotation for SPCC1183.03c, line 1327 - MONDO:0009245 not found in database
> the number of disease genes was 1346.
That doesn't match what I see when I look at older database versions.
I get:
I've attached a zip file with the disease gene lists from the 6th, 7th and 9th. Those are the dates when the nightly load started, so the main site would have shown those changes on the mornings of the 7th, 8th and 10th. If you closed that issue around 7 June, you would have been looking at the data from the 6th. I've temporarily switched http://dev.pombase.kmr.nz/ to the update from the 6th in case you need to dig deeper.
I'm a bit baffled.
This is the difference from the 7th to the current list: https://www.pombase.org/results/from/id/71ef7397-3093-47e1-8063-21b26c3ed684
The first two, phs1 and rps20, do not have a disease in OMIM, only 'variant of unknown significance', so I am not sure which disease they were mapped to. And I don't remember removing anything.
> I'm a bit baffled.
Me too.
> I am not sure which disease they were mapped to
You can look them up on the dev site, which is currently on the version from 2021-06-07: http://dev.pombase.kmr.nz/
In case it helps, here are the genes and disease associations, straight from Chado, from 2021-06-07 and 2021-06-23. Let me know if you'd like a different format. The lines in both files are sorted, so you should be able to diff them.
gene_and_disease-2021-06-07.tsv.txt
gene_and_disease-2021-06-23.tsv.txt
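Because both files are sorted, the associations that disappeared between the two dates can be found in a single merge-style pass, the same idea as the Unix `comm` tool. A minimal Python sketch, assuming each line is one gene/disease record and treating lines as opaque strings; the function names are hypothetical. `diff_files` re-sorts the input because a locale-aware `sort` can order lines differently from Python's plain string comparison, which would confuse the merge.

```python
def removed_and_added(old_lines, new_lines):
    """Compare two sorted lists of lines in a single merge-style pass.

    Returns (removed, added): lines only in old_lines, and lines only in
    new_lines. Both inputs must be sorted by Python's default string order.
    """
    removed, added = [], []
    i = j = 0
    while i < len(old_lines) and j < len(new_lines):
        if old_lines[i] == new_lines[j]:
            i += 1  # unchanged record, present in both files
            j += 1
        elif old_lines[i] < new_lines[j]:
            removed.append(old_lines[i])  # present on the older date only
            i += 1
        else:
            added.append(new_lines[j])  # present on the newer date only
            j += 1
    removed.extend(old_lines[i:])  # leftovers can only be in one of the files
    added.extend(new_lines[j:])
    return removed, added


def diff_files(old_path, new_path):
    """Read both files and re-sort them before merging (hypothetical helper)."""
    def read_sorted(path):
        with open(path, encoding="utf-8") as f:
            return sorted(line.rstrip("\n") for line in f)

    return removed_and_added(read_sorted(old_path), read_sorted(new_path))
```

For example, `removed, added = diff_files("gene_and_disease-2021-06-07.tsv.txt", "gene_and_disease-2021-06-23.tsv.txt")` would list the records that were lost and gained between the two loads.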
These have gone, but I didn't see a log error for them:
congenital fiber-type disproportion myopathy | act1, cdc8, myo2, myp2, phs1, rlc1
rps20 MONDO:0018604 | familial colorectal cancer type X | MalaCards | Rappaport N et al. (2017)
chz1, jmj2, lid2, msc1, rps401, rps402, rps403, sum3 MONDO:0010767 | spermatogenic failure, Y-linked, 2 | MalaCards | Rappaport N et al. (2017)
Diamond-Blackfan anemia | rpl1801, rpl1802, rpl35, rps2201, rps2202
cryptogenic multifocal ulcerous stenosing enteritis | plb1, SPAC1786.02, SPAC1A6.03c, SPAC977.09c, SPBC1348.10c, SPCC1450.09c
dentin dysplasia type I | vps4
acetyl-coa carboxylase deficiency | cut6
Mobius syndrome | rev3
gliosarcoma | alp7 (this one looks familiar, I think I fixed that one last night)
autosomal recessive spastic paraplegia type 60 | bun107
split hand or/and split foot malformation | ede1, end3, irs4, ucp8
Leigh syndrome with leukodystrophy | aim22, coq11, etp1, fmt1, pda1, sdh1, shy1, tac1, SPAC11E3.12, SPBC18E5.10, SPCC417.16
spinocerebellar ataxia type 18 | SPBC20F10.03
myelodysplastic syndrome | git5, idp1, ras1, uaf2
Charcot-Marie-Tooth disease (MONDO:0015626) direct annotations: ala1, atp6, grs1, jnm1, sac3, sac32
familial Alzheimer disease | brc1, hob1, iph1, map1, mbx1, pef1, yap18
precursor T-cell acute lymphoblastic leukemia | ccp1, not3, nup146, sum3, yap18
velocardiofacial syndrome | bis1
autosomal dominant non-syndromic intellectual disability (MONDO:0015802): a large list; only these two probably need remapping:
SPAC1851.03 | ckb1 | CK2 family regulatory subunit Ckb1
SPBC2G5.02c | ckb2 | CK2 regulatory subunit beta isoform 2, Ckb2
cytochrome-c oxidase deficiency disease (MONDO:0009068) | cfh4, coa3, cox1, cox10, cox12, cox14, cox2, cox20, cox3, cox6, pet117, sco1, shy1, tac1 (only pet117 is not remapping)
fatal infantile hypertrophic cardiomyopathy due to mitochondrial complex I deficiency | SPAC9E9.15
thrombocytopenia 2 | ppk18, although this mapping looks incorrect because the disease is described as 'An autosomal dominant disorder caused by mutation(s) in the ANKRD26 gene, encoding ANKRD26 protein.' and ppk18 isn't the ortholog.
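Lists like the one above can be produced mechanically once the lost gene/disease associations are known. A minimal Python sketch, assuming the removed records have already been reduced to (gene, disease) pairs; the column layout of the Chado export isn't shown in this thread, so the parsing step is left out, and `group_by_disease` is a hypothetical name.

```python
from collections import defaultdict


def group_by_disease(pairs):
    """Turn (gene, disease) pairs into 'disease | gene1, gene2' summary lines."""
    genes_by_disease = defaultdict(set)  # a set drops duplicate gene/disease pairs
    for gene, disease in pairs:
        genes_by_disease[disease].add(gene)
    return [
        f"{disease} | {', '.join(sorted(genes))}"
        for disease, genes in sorted(genes_by_disease.items())
    ]
```

Sorting both the diseases and the genes keeps the output stable from one run to the next, so two such summaries can themselves be diffed.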
I think I've tracked down the problem but I don't understand it.
I noticed that all the missing genes I looked at were towards the end of the MalaCards data file. After a bit of debugging, I found that if I remove this line (which has a non-ASCII character), the whole file loads without a problem:
attenuated chédiak-higashi syndrome attenuated_chediak_higashi_syndrome Attenuated Chediak-Higashi Syndrome LYST
I don't know why this has happened when it's been working well for months. I must have changed something, but I can't think what. I'll dig in tomorrow. I'll also add better error checking to the MalaCards loader.
In the meantime, I've removed that line as it doesn't have a MONDO ID. Hopefully tonight's load will be better.
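To check whether other lines in the MalaCards file contain non-ASCII characters, a short scan is enough. A minimal Python sketch; the function name and the file name in the usage comment are hypothetical. It works on raw bytes, so the scan itself doesn't depend on guessing the file's encoding, and it decodes as UTF-8 (an assumption) only to display the offending lines.

```python
def non_ascii_lines(raw_lines):
    """Yield (line_number, text) for each byte line containing a non-ASCII byte.

    raw_lines: an iterable of bytes, e.g. a file opened with open(path, "rb").
    Decoding assumes UTF-8 and uses errors="replace" so display never fails.
    """
    for number, raw in enumerate(raw_lines, start=1):
        if not raw.isascii():  # bytes.isascii() needs Python 3.7+
            yield number, raw.decode("utf-8", errors="replace").rstrip("\r\n")


# Usage (placeholder file name):
# with open("malacards_data.tsv", "rb") as f:
#     for number, text in non_ascii_lines(f):
#         print(number, text)
```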
These are the rest:
SPBC1709.11c | png2 | ING family histone acetyltransferase complex PHD-type zinc finger subunit Png2
SPAC607.04 | arg82 | inositol polyphosphate kinase Arg82 (predicted)
SPBP4H10.11c | lcf2 | long-chain-fatty-acid-CoA ligase
SPCC794.12c | mae2 | malic enzyme, malate dehydrogenase (oxaloacetate decarboxylating), Mae2
SPAC24H6.01c | gup1 | membrane bound O-acyltransferase, MBOAT Gup1 (predicted)
SPBC21D10.11c | nfs1 | mitochondrial [2Fe-2S] cluster assembly and tRNA modification cysteine desulfurase Nfs1
SPBC17A3.07 | pgr1 | mitochondrial glutathione reductase Pgr1
SPAC23H3.08c | bub3 | mitotic spindle checkpoint WD repeat protein Bub3
SPAC750.08c | NAD-dependent malic enzyme (predicted), partial
SPBC337.08c | ubi4 | protein modifier, ubiquitin
SPNCRNA.82 | mrp1 | RNAse MRP
SPNCRNA.214 | ter1 | telomerase RNA
SPCC1919.14c | bdp1 | transcription factor TFIIIB complex subunit Bdp1 (predicted)
SPBC25H2.07 | tif11 | translation initiation factor eIF1A
SPAP8A3.12c | tpp2 | tripeptidyl-peptidase II Tpp2
Once the troubleshooting is done, could you transfer this ticket to the curation tracker? I suspect a few of these really do need remapping.
Much better! There are now 1383 disease genes: https://www.pombase.org/results/from/id/58ea50d0-d07d-4933-89d8-fca9adf2f2cf
I've fixed the code so this won't happen again. It was caused by changes I made two weeks ago to get the loading working on my desktop after an upgrade.
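The thread doesn't describe the actual fix. As an illustration of the stricter error checking mentioned for the MalaCards loader, here is a minimal Python sketch (hypothetical, not PomBase's loader code) that decodes every line explicitly as UTF-8 instead of relying on a platform or locale default, and reports each unusable line with its number rather than silently dropping it. It assumes a tab-separated file with a fixed column count; the error text loosely mirrors the "needed 6 columns, got 1" message quoted earlier.

```python
def load_rows(raw_lines, expected_columns):
    """Parse tab-separated byte lines, collecting problems instead of hiding them.

    raw_lines: an iterable of bytes, e.g. a file opened with open(path, "rb").
    Returns (rows, errors). Every non-blank line ends up in exactly one of
    the two lists, so nothing disappears without a trace.
    """
    rows, errors = [], []
    for number, raw in enumerate(raw_lines, start=1):
        try:
            # Explicit decoding: the result doesn't change with the machine's locale.
            text = raw.decode("utf-8").rstrip("\r\n")
        except UnicodeDecodeError as exc:
            errors.append(f"line {number}: not valid UTF-8 ({exc.reason})")
            continue
        if not text:
            continue  # blank lines are skipped, not reported
        fields = text.split("\t")
        if len(fields) != expected_columns:
            errors.append(
                f"line {number}: needed {expected_columns} columns, got {len(fields)}"
            )
            continue
        rows.append(fields)
    return rows, errors
```

Returning the errors instead of printing them and carrying on lets the caller abort the nightly load whenever the list is non-empty.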
Perfect! Not far from 1400, and I have a few more up my sleeve ;) (I have been working through a list from Alliance/cerevisiae; most are not causal, but the hit rate is about 20%, and I have about 40 left to check...)
All present and correct.....
After I closed this ticket (~7 June), https://github.com/pombase/curation/issues/3018, the number of disease genes was 1346. It has now dropped to 1329. Could I get the list of disease genes from 9/10/11 June so I can see what is missing?