hall-lab / svtools

Tools for processing and analyzing structural variants.
MIT License

vcftobedpe's output has one-based START coordinates instead of zero-based #301

Open ManavalanG opened 4 years ago

ManavalanG commented 4 years ago

Svtools' vcftobedpe tool writes one-based START coordinates to its BEDPE output instead of the zero-based coordinates that the BEDPE format requires. I came across this (major) bug because bedtools' pairtopair did not work as expected, even when the same BEDPE file was given as input to both -a and -b. The output is simply not valid BEDPE, and this is a major problem especially for SV calls whose CIPOS/CIEND is 0,0, because START can then equal END, leaving a zero-length breakend interval.
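To see why identical inputs can fail to match, here is a minimal Python sketch of a half-open [start, end) overlap test. It assumes, as the report suggests but without confirming it against svtools' code, that a breakend with CIPOS=0,0 comes out as START == END == POS. The helper name `overlaps` is hypothetical, and bedtools' own rules for zero-length features may differ.

```python
def overlaps(a_start: int, a_end: int, b_start: int, b_end: int) -> bool:
    """True if half-open intervals [a_start, a_end) and [b_start, b_end) share a base."""
    return a_start < b_end and b_start < a_end


pos = 839442  # example POS, taken from the first test record in this report (no CIPOS, i.e. 0,0)

# Assumed buggy output: one-based START, so START == END (a zero-length interval).
buggy = (pos, pos)
# Valid BEDPE: zero-based START, END unchanged, covering exactly one base.
fixed = (pos - 1, pos)

print(overlaps(*buggy, *buggy))  # False: a zero-length interval never overlaps itself
print(overlaps(*fixed, *fixed))  # True
```

Under this test, a file of zero-length breakends cannot match even a copy of itself, which is consistent with the pairtopair behaviour described above.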

From a quick look, the relevant code is here: https://github.com/hall-lab/svtools/blob/6a6a7b059df196ec49de6cd0b8ff816942f9055a/svtools/vcftobedpeconverter.py#L80-L82
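The conversion that BEDPE expects can be sketched as follows. This is a hypothetical helper, `vcf_breakend_to_bedpe`, not svtools' own function. It assumes a breakend is described by a one-based VCF position plus an optional CIPOS/CIEND pair, and maps the one-based inclusive range POS+CI[0] .. POS+CI[1] to zero-based half-open coordinates by subtracting 1 from the start only.

```python
from typing import Tuple


def vcf_breakend_to_bedpe(pos: int, ci: Tuple[int, int] = (0, 0)) -> Tuple[int, int]:
    """Convert a one-based VCF position and its confidence interval into a
    zero-based, half-open BEDPE (start, end) pair.

    One-based positions pos+ci[0] .. pos+ci[1] (inclusive) become
    [pos+ci[0]-1, pos+ci[1]) in zero-based half-open coordinates.
    """
    lo, hi = ci
    start = max(pos + lo - 1, 0)  # shift to zero-based; never go below 0
    end = pos + hi                # one-based inclusive end == zero-based exclusive end
    return start, end


print(vcf_breakend_to_bedpe(839442))              # (839441, 839442)
print(vcf_breakend_to_bedpe(1477835, (0, 34)))    # (1477834, 1477869)
```

Only START moves; END already has the right value because a one-based inclusive end and a zero-based exclusive end are the same number.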

Here are the test files and the command I used. Note that the START coordinates in the BEDPE output are one-based.

##fileformat=VCFv4.1
##INFO=<ID=SVTYPE,Number=1,Type=String,Description="Type of structural variant">
##INFO=<ID=SVLEN,Number=.,Type=Integer,Description="Difference in length between REF and ALT alleles">
##INFO=<ID=END,Number=1,Type=Integer,Description="End position of the variant described in this record">
##INFO=<ID=CIPOS,Number=2,Type=Integer,Description="Confidence interval around POS">
##INFO=<ID=CIEND,Number=2,Type=Integer,Description="Confidence interval around END">
#CHROM  POS ID  REF ALT QUAL    FILTER  INFO    FORMAT  sample
chr1    839442  MantaDEL:123157:0:0:0:0:0   CACCTAGACACACACTCCTGGACACACACACCTAGACACACACACCTGGACAAACACA  CT  139 PASS    END=839499;SVTYPE=DEL;SVLEN=-57;CIGAR=1M1I57D   GT:FT:GQ:PL:PR:SR   0/1:PASS:86:189,0,83:1,0:8,5
chr1    1477835 MantaDEL:123205:0:1:0:1:0   AGCTGGGATTACAGGCACGCGCCACCACGCCTGGCTAATGTTGTATTTTAGTAGAGACGGGGTTTCTCCATGTTGGTCAGGCTGGTCTCTAACTCCCGACCTCAGGTGATCCACCCGCCTCGGCCTCTCAAACT  A   27  PASS    END=1477968;SVTYPE=DEL;SVLEN=-133;CIGAR=1M133D;CIPOS=0,34;HOMLEN=34;HOMSEQ=GCTGGGATTACAGGCACGCGCCACCACGCCTGGC   GT:FT:GQ:PL:PR:SR   0/1:PASS:27:77,0,324:18,0:32,3
chr1    1565630 MantaDUP:TANDEM:123223:0:1:0:0:0    G   <DUP:TANDEM>    94  PASS    END=1565727;SVTYPE=DUP;SVLEN=97;CIPOS=0,1;CIEND=0,1;HOMLEN=1;HOMSEQ=T   GT:FT:GQ:PL:PR:SR   0/1:PASS:94:144,0,772:27,0:53,9
chr1    1595101 MantaDEL:123214:0:0:0:1:0   TGACAGAGAGAGGCAGAGAGAGAGAGAGAGAGACAGACACAGAGAGAGCAGAACAGGGAGAAACAGAGAGACAGAGAGCGAGA T   238 MaxDepth    END=1595183;SVTYPE=DEL;SVLEN=-82;CIGAR=1M82D;CIPOS=0,2;HOMLEN=2;HOMSEQ=GA   GT:FT:GQ:PL:PR:SR   0/1:PASS:238:288,0,341:10,0:41,10
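Applying the same convention to the test records gives the intervals one would expect in a valid BEDPE file. This is a Python sketch under stated assumptions: breakend A is POS widened by CIPOS, breakend B is INFO END widened by CIEND, and an absent CIPOS/CIEND is read as 0,0. Only CHROM, POS and part of INFO are copied from the records above, and only the six coordinate columns are produced. `parse_info`, `parse_ci` and `expected_bedpe` are hypothetical names, not part of svtools.

```python
from typing import Dict, Tuple


def parse_info(info_field: str) -> Dict[str, str]:
    """Split a VCF INFO string into key/value pairs (flags map to an empty string)."""
    info = {}
    for item in info_field.split(";"):
        key, _, value = item.partition("=")
        info[key] = value
    return info


def parse_ci(info: Dict[str, str], key: str) -> Tuple[int, int]:
    """Read CIPOS/CIEND; an absent key means a precise breakend, i.e. 0,0."""
    if key not in info:
        return (0, 0)
    lo, hi = info[key].split(",")
    return int(lo), int(hi)


def expected_bedpe(chrom: str, pos: int, info_field: str) -> Tuple[str, int, int, str, int, int]:
    """Zero-based, half-open breakend intervals for an intrachromosomal DEL/DUP."""
    info = parse_info(info_field)
    end = int(info["END"])
    ci_a, ci_b = parse_ci(info, "CIPOS"), parse_ci(info, "CIEND")
    start1, end1 = max(pos + ci_a[0] - 1, 0), pos + ci_a[1]
    start2, end2 = max(end + ci_b[0] - 1, 0), end + ci_b[1]
    return chrom, start1, end1, chrom, start2, end2


# CHROM, POS and a subset of INFO copied from the four test records above.
records = [
    ("chr1", 839442, "END=839499;SVTYPE=DEL;SVLEN=-57"),
    ("chr1", 1477835, "END=1477968;SVTYPE=DEL;SVLEN=-133;CIPOS=0,34"),
    ("chr1", 1565630, "END=1565727;SVTYPE=DUP;SVLEN=97;CIPOS=0,1;CIEND=0,1"),
    ("chr1", 1595101, "END=1595183;SVTYPE=DEL;SVLEN=-82;CIPOS=0,2"),
]

for rec in records:
    print(*expected_bedpe(*rec), sep="\t")
```

The first record has no CIPOS/CIEND, so under these assumptions its breakends should be [839441, 839442) and [839498, 839499), each exactly one base long rather than zero.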
ManavalanG commented 4 years ago

I realized late that this had already been reported in #275. #46 also appears to be related.