hall-lab / svtyper

Bayesian genotyper for structural variants
MIT License

header parsing can't handle ">" in description #105

Open brentp opened 5 years ago

brentp commented 5 years ago

For example, this header from the Genome in a Bottle truth set fails:

##INFO=<ID=DistPASSHG2gt49Minlt1000,Number=1,Type=String,Description="TRUE if Distance to the closest non-matching PASS variant >49bp in HG002 in either direction is less than 1000bp, suggesting possible complex or compound heterozygous variant or inaccurate call">
brentp commented 5 years ago

With this header as input, svtyper truncates the description and writes e.g. Description="" (note the doubled double quote):

##INFO=<ID=SVTYPE,Number=1,Type=String,Description="Type of SV:DEL=Deletion, CON=Contraction, INS=Insertion, DUP=Duplication, INV=Inversion">
brentp commented 5 years ago

It also seems to fail if there are "=" characters in the description, e.g.:

##INFO=<ID=REPTYPE,Number=1,Type=String,Description="Type of SV, with designation of uniqueness of new or deleted sequence:SIMPLEDEL=Deletion of at least some unique sequence, SIMPLEINS=Insertion of at least some unique sequence, CONTRAC=Contraction, or deletion of sequence entirely similar to remaining sequence, DUP=Duplication, or insertion of sequence entirely similar to pre-existing sequence, INV=Inversion, SUBSINS=Insertion of new sequence with alteration of some pre-existing sequence, SUBSDEL=Deletion of sequence with alteration of some remaining sequence">
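For reference, here is a minimal sketch of a parse that would tolerate all three headers above. It is not svtyper's code: the names parse_meta_line and _PAIR are hypothetical, and it assumes the line is a structured meta line of the form ##KEY=<k=v,...>. The idea is to treat a double-quoted value as one unit, so that ",", "=" and ">" inside the Description are not read as delimiters, and to strip only the final ">" of the line.

```python
import re

# One key=value pair. The value is either a double-quoted string (which may
# contain ',', '=', '>' and backslash-escaped characters) or a bare token that
# runs up to the next comma. The quoted alternative is tried first.
_PAIR = re.compile(r'(\w+)=("(?:[^"\\]|\\.)*"|[^,]*)')


def parse_meta_line(line):
    """Split a structured VCF meta line into (kind, fields).

    Hypothetical helper for illustration; not part of svtyper.
    """
    line = line.rstrip("\r\n")
    if not (line.startswith("##") and line.endswith(">")):
        raise ValueError("not a structured meta line: %r" % line)

    # 'INFO=<ID=...' -> kind='INFO', body='ID=...>'
    kind, sep, body = line[2:].partition("=<")
    if not sep:
        raise ValueError("missing '=<' in meta line: %r" % line)
    body = body[:-1]  # drop only the closing '>' at the very end

    fields = {}
    for match in _PAIR.finditer(body):
        key, value = match.group(1), match.group(2)
        # Remove the surrounding quotes; escapes inside are left untouched.
        if len(value) >= 2 and value.startswith('"') and value.endswith('"'):
            value = value[1:-1]
        fields[key] = value
    return kind, fields


header = ('##INFO=<ID=SVTYPE,Number=1,Type=String,Description="Type of SV:'
          'DEL=Deletion, CON=Contraction, INS=Insertion, DUP=Duplication, '
          'INV=Inversion">')
kind, fields = parse_meta_line(header)
print(fields["Description"])
```

Because the regex consumes a quoted value in one match, the scan resumes after the closing quote and never looks for key=value pairs inside the description text.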
brentp commented 5 years ago

The VCF file that is the source of all of these headers is available here: ftp://ftp-trace.ncbi.nlm.nih.gov/giab/ftp/data/AshkenazimTrio/analysis/NIST_SVs_Integration_v0.6/