Closed raymond301 closed 6 years ago
If you are looking for all the reported significances, you can look at the columns pathogenic, likely_pathogenic, uncertain_significance, likely_benign and benign. Each records the number of submissions with the corresponding clinical significance.
If you are looking for a date-ordered list of clinical significances, the current pipeline does not have this feature.
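For reference, here is a minimal sketch of reading those count columns from a tab-separated row. The column names are assumed to match the ones listed above, and the sample row is made up for illustration; it is not taken from the real output files.

```python
import csv
import io

# Assumed names of the per-significance count columns.
COUNT_COLUMNS = [
    "pathogenic",
    "likely_pathogenic",
    "uncertain_significance",
    "likely_benign",
    "benign",
]

# Hypothetical two-line TSV standing in for the real grouped table.
SAMPLE_TSV = (
    "\t".join(COUNT_COLUMNS) + "\n"
    + "\t".join(["0", "0", "0", "1", "3"]) + "\n"
)

def reported_significances(row):
    """Return {column: count} for every significance with at least one submission."""
    counts = {col: int(row[col]) for col in COUNT_COLUMNS}
    return {col: n for col, n in counts.items() if n > 0}

rows = list(csv.DictReader(io.StringIO(SAMPLE_TSV), delimiter="\t"))
print(reported_significances(rows[0]))
```

This only tells you how many submissions reported each significance, not who reported what or in which order.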
https://github.com/macarthur-lab/clinvar/pull/33 was meant to address this. I thought you had incorporated my fixes, @XiaoleiZ?
See e.g. https://github.com/macarthur-lab/clinvar/pull/33/files#diff-7e4b0936672060588ac6388eac4f2992
Thanks for pointing that out, @kristjaneerik. I found that I did not include this part. But the order would not be preserved this way. We should add the time info in the XML parsing step.
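To illustrate the idea of carrying time info through from the parsing step, here is a rough sketch. It assumes each parsed submission already has a date attached; the field name date_last_evaluated, the submitter names and the dates are all hypothetical and not taken from the pipeline.

```python
from datetime import date

# Hypothetical parsed submissions; in the pipeline these would come from the XML parsing step.
SUBMISSIONS = [
    {"submitter": "submitter_B", "significance": "benign",
     "date_last_evaluated": date(2015, 6, 1)},
    {"submitter": "submitter_A", "significance": "likely benign",
     "date_last_evaluated": date(2013, 2, 10)},
    {"submitter": "submitter_C", "significance": "benign",
     "date_last_evaluated": date(2016, 9, 30)},
]

def significances_by_date(submissions):
    """Join clinical significances in ascending date order, one entry per submission."""
    # sorted() is stable, so submissions sharing a date keep their input order.
    ordered = sorted(submissions, key=lambda s: s["date_last_evaluated"])
    return ";".join(s["significance"] for s in ordered)

print(significances_by_date(SUBMISSIONS))
```

Sorting on an explicit date means the result does not depend on the order in which records happen to be read or grouped.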
If you look at the rest of the diff, I did that too, e.g. https://github.com/macarthur-lab/clinvar/pull/33/files#diff-850079ba25065febf15fcf8c34207f57L135
Is this being worked on? Putting ordered fields back into the result set?
Unfortunately I don't have the time to pick this up again, but the code is all there in my PR #33. It was basically good to go; I just didn't have time to do a thorough comparison of the results to make sure no bugs were introduced.
PR #33 has been merged so I'm closing this issue.
@bw2 yep, #33 was merged, but it looks like @XiaoleiZ reverted the changes that fixed this bug in #41, if you look at e.g. group_by_allele.py in https://github.com/macarthur-lab/clinvar/commit/03aa390d4c79176ca6ed36b65eea638edee5eb05#diff-7e4b0936672060588ac6388eac4f2992L75
The per-submitter clinical significance is lost in group_by_allele.py.
For example, https://www.ncbi.nlm.nih.gov/clinvar/variation/92428/ has 4 submitters: 1 calls it Likely Benign and 3 call it Benign.
Running zcat clinvar_allele_trait_pairs.single.tsv.gz | grep 39073 | grep 53676401 | less gives 3 lines, with the correct clinical significance order.
But zcat clinvar_alleles_grouped.single.tsv.gz | grep 39073 | grep 53676401 | less reduces it to "benign;likely benign", i.e. 2 entries for 4 submitters. The order is lost.
However, submitters_ordered is still correct: EGL Genetic Diagnostics,Eurofins Clinical Diagnostics;GeneReviews;Illumina Clinical Services Laboratory,Illumina;Center for Pediatric Genomic Medicine,Children's Mercy Hospital and Clinics
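To make the difference concrete, here is a small sketch contrasting a grouping that collapses significances into a set with one that keeps a significance per submission, aligned with the submitters. This is not the code in group_by_allele.py; the function names, the allele key and the submitter names are hypothetical, and the set-based version is only one way the order could get lost.

```python
from collections import defaultdict

# Hypothetical per-submission rows: (allele key, submitter, clinical significance).
ROWS = [
    ("allele_1", "submitter_A", "likely benign"),
    ("allele_1", "submitter_B", "benign"),
    ("allele_1", "submitter_C", "benign"),
    ("allele_1", "submitter_D", "benign"),
]

def group_lossy(rows):
    """Collapse significances into a sorted set per allele (loses order and counts)."""
    grouped = defaultdict(set)
    for allele, _submitter, significance in rows:
        grouped[allele].add(significance)
    return {allele: ";".join(sorted(sigs)) for allele, sigs in grouped.items()}

def group_aligned(rows):
    """Keep one significance per submission, in the same order as the submitters."""
    grouped = defaultdict(lambda: {"submitters": [], "significances": []})
    for allele, submitter, significance in rows:
        grouped[allele]["submitters"].append(submitter)
        grouped[allele]["significances"].append(significance)
    return {
        allele: {name: ";".join(values) for name, values in fields.items()}
        for allele, fields in grouped.items()
    }

print(group_lossy(ROWS))
print(group_aligned(ROWS))
```

With the aligned version, the n-th significance always belongs to the n-th submitter, so a field like submitters_ordered can be read side by side with the significances.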