iobio / gene.iobio


Use eutils for clinvar annotations until new clinvar vcf can be correctly parsed #181

Closed tonydisera closed 3 years ago

tonydisera commented 6 years ago

Differences

CLNSIG=5; vs CLNSIG=Pathogenic; (code vs text)
CLNACC=RCV000132731.1 vs missing (missing)
CLNDBN=Smith-Magenis_syndrome; vs CLNDN=Smith-Magenis_syndrome; (different tag)

Latest clinvar

ALLELEID=152915; CLNDISDB=MedGen:C0795864,OMIM:182290,Orphanet:ORPHA819,SNOMED_CT:401315004; CLNDN=Smith-Magenis_syndrome; CLNHGVS=NC_000017.10:g.17698535G>A; CLNREVSTAT=no_assertion_criteria_provided; CLNSIG=Pathogenic; CLNVC=single_nucleotide_variant; CLNVCSO=SO:0001483; GENEINFO=RAI1:10743; MC=SO:0001587|nonsense; ORIGIN=32; RS=527236033

Working (on production) clinvar

RS=527236033; RSPOS=17698535; dbSNPBuildID=141; SSR=0; SAO=3; VP=0x050060000605000002110100; GENEINFO=RAI1:10743; WGT=1; VC=SNV; PM; NSN; REF; ASP; LSD; OM; CLNALLE=1; CLNHGVS=NC_000017.10:g.17698535G>A; CLNSRC=.; CLNORIGIN=32; CLNSRCID=.; CLNSIG=5; CLNDSDB=MedGen:OMIM:Orphanet:SNOMED_CT; CLNDSDBID=C0795864:182290:ORPHA819:401315004; CLNDBN=Smith-Magenis_syndrome; CLNREVSTAT=no_criteria; CLNACC=RCV000132731.1
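One way to bridge the two formats is to parse the INFO field into a dictionary and translate the legacy tags into the new ones. The following is only a sketch: `parse_info`, `normalize_legacy`, `LEGACY_CLNSIG`, and `RENAMED_TAGS` are hypothetical names, not code from gene.iobio. The records above confirm only that code 5 means Pathogenic; the other codes in the table are an assumption and should be checked against the ClinVar README.

```python
# Legacy numeric CLNSIG codes mapped to the text used by the new VCF.
# Only "5" -> "Pathogenic" is confirmed by the records in this issue;
# the other entries are assumptions to verify against the ClinVar README.
LEGACY_CLNSIG = {
    "0": "Uncertain_significance",
    "1": "not_provided",
    "2": "Benign",
    "3": "Likely_benign",
    "4": "Likely_pathogenic",
    "5": "Pathogenic",
}

# Legacy tag name -> new tag name (from the differences listed above).
RENAMED_TAGS = {"CLNDBN": "CLNDN"}


def parse_info(info):
    """Split an INFO string such as 'A=1; B=2; FLAG' into a dict.

    Entries without '=' (flags such as PM or NSN) map to True.
    """
    fields = {}
    for item in info.split(";"):
        item = item.strip()
        if not item:
            continue
        key, sep, value = item.partition("=")
        fields[key] = value if sep else True
    return fields


def normalize_legacy(fields):
    """Return a copy of legacy INFO fields using the new tag names and values."""
    out = {}
    for key, value in fields.items():
        key = RENAMED_TAGS.get(key, key)
        if key == "CLNSIG":
            # Unknown codes pass through unchanged rather than being guessed.
            value = LEGACY_CLNSIG.get(value, value)
        out[key] = value
    return out


legacy = parse_info(
    "CLNSIG=5; CLNDBN=Smith-Magenis_syndrome; CLNACC=RCV000132731.1; PM"
)
latest = parse_info("CLNSIG=Pathogenic; CLNDN=Smith-Magenis_syndrome")

normalized = normalize_legacy(legacy)
print({k: normalized[k] for k in latest} == latest)  # True
print(sorted(set(normalized) - set(latest)))         # ['CLNACC', 'PM']
```

Going the other way (new to legacy) is not possible for CLNACC, since the new record does not carry the RCV accession at all.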

tonydisera commented 6 years ago

ftp://ftp.ncbi.nlm.nih.gov/pub/clinvar/vcf_GRCh38/archive_2.0/2017/README_VCF.txt

tonydisera commented 6 years ago

As a first step, the app now gets the most up-to-date annotations from eutils rather than from the weekly clinvar vcf. This doesn't seem to hurt performance, so we will go this route for now. Here are the elapsed times running ACMG genes:

Elapsed times, ACMG genes

clinvar source   data   run 1        run 2        run 3
eutils           wes    48 seconds   50 seconds   49 seconds
eutils           wgs    69 seconds   59 seconds
vcf              wes    54 seconds   57 seconds   54 seconds
vcf              wgs    68 seconds   64 seconds
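For readers unfamiliar with the eutils route mentioned above, here is a rough sketch of what a ClinVar lookup through NCBI E-utilities can look like: an `esearch` call to get ClinVar record ids, followed by an `esummary` call to get their summaries. This is not the gene.iobio implementation. The function names, the `retmax` value, and the choice to search by gene symbol with the `[gene]` field are assumptions made for illustration; the app may well query by region instead.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def esearch_url(gene, retmax=200):
    """Build an esearch URL listing ClinVar record ids for a gene symbol."""
    params = {
        "db": "clinvar",
        "term": f"{gene}[gene]",  # assumption: search by gene symbol
        "retmax": retmax,
        "retmode": "json",
    }
    return f"{EUTILS_BASE}/esearch.fcgi?{urlencode(params)}"


def esummary_url(ids):
    """Build an esummary URL for a list of ClinVar record ids."""
    params = {"db": "clinvar", "id": ",".join(ids), "retmode": "json"}
    return f"{EUTILS_BASE}/esummary.fcgi?{urlencode(params)}"


def fetch_json(url):
    """GET a URL and decode the JSON body."""
    with urlopen(url) as response:
        return json.load(response)


def clinvar_summaries(gene):
    """Return {uid: summary} for the ClinVar records found for a gene."""
    ids = fetch_json(esearch_url(gene))["esearchresult"]["idlist"]
    if not ids:
        return {}
    result = fetch_json(esummary_url(ids))["result"]
    return {uid: result[uid] for uid in result.get("uids", [])}


# Building the URL needs no network access; clinvar_summaries("RAI1") does.
print(esearch_url("RAI1"))
```

Because the summaries come straight from the live ClinVar database, this path avoids depending on the INFO tag layout of any particular VCF release. NCBI applies request rate limits, so a real client would likely need throttling and batching of ids.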