The svtools vcftobedpe tool produces one-based START coordinates in its bedpe output instead of the zero-based coordinates that bedpe requires. I came across this (major) bug because bedtools' pairtopair did not work as expected even when the same bedpe file was given as input to both -a and -b. The cause is that the output is not valid bedpe. This is a major problem especially for SV calls whose CIPOS/CIEND is 0,0, because START then equals END and the interval has zero length.
##fileformat=VCFv4.1
##INFO=<ID=SVTYPE,Number=1,Type=String,Description="Type of structural variant">
##INFO=<ID=SVLEN,Number=.,Type=Integer,Description="Difference in length between REF and ALT alleles">
##INFO=<ID=END,Number=1,Type=Integer,Description="End position of the variant described in this record">
##INFO=<ID=CIPOS,Number=2,Type=Integer,Description="Confidence interval around POS">
##INFO=<ID=CIEND,Number=2,Type=Integer,Description="Confidence interval around END">
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT sample
chr1 839442 MantaDEL:123157:0:0:0:0:0 CACCTAGACACACACTCCTGGACACACACACCTAGACACACACACCTGGACAAACACA CT 139 PASS END=839499;SVTYPE=DEL;SVLEN=-57;CIGAR=1M1I57D GT:FT:GQ:PL:PR:SR 0/1:PASS:86:189,0,83:1,0:8,5
chr1 1477835 MantaDEL:123205:0:1:0:1:0 AGCTGGGATTACAGGCACGCGCCACCACGCCTGGCTAATGTTGTATTTTAGTAGAGACGGGGTTTCTCCATGTTGGTCAGGCTGGTCTCTAACTCCCGACCTCAGGTGATCCACCCGCCTCGGCCTCTCAAACT A 27 PASS END=1477968;SVTYPE=DEL;SVLEN=-133;CIGAR=1M133D;CIPOS=0,34;HOMLEN=34;HOMSEQ=GCTGGGATTACAGGCACGCGCCACCACGCCTGGC GT:FT:GQ:PL:PR:SR 0/1:PASS:27:77,0,324:18,0:32,3
chr1 1565630 MantaDUP:TANDEM:123223:0:1:0:0:0 G <DUP:TANDEM> 94 PASS END=1565727;SVTYPE=DUP;SVLEN=97;CIPOS=0,1;CIEND=0,1;HOMLEN=1;HOMSEQ=T GT:FT:GQ:PL:PR:SR 0/1:PASS:94:144,0,772:27,0:53,9
chr1 1595101 MantaDEL:123214:0:0:0:1:0 TGACAGAGAGAGGCAGAGAGAGAGAGAGAGAGACAGACACAGAGAGAGCAGAACAGGGAGAAACAGAGAGACAGAGAGCGAGA T 238 MaxDepth END=1595183;SVTYPE=DEL;SVLEN=-82;CIGAR=1M82D;CIPOS=0,2;HOMLEN=2;HOMSEQ=GA GT:FT:GQ:PL:PR:SR 0/1:PASS:238:288,0,341:10,0:41,10
Output bedpe file:
##fileformat=BEDPE
##INFO=<ID=SVTYPE,Number=1,Type=String,Description="Type of structural variant">
##INFO=<ID=POS,Number=1,Type=Integer,Description="Position of the variant described in this record">
##INFO=<ID=SVLEN,Number=.,Type=Integer,Description="Difference in length between REF and ALT alleles">
##INFO=<ID=END,Number=1,Type=Integer,Description="End position of the variant described in this record">
##INFO=<ID=CIPOS,Number=2,Type=Integer,Description="Confidence interval around POS">
##INFO=<ID=CIEND,Number=2,Type=Integer,Description="Confidence interval around END">
##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
#CHROM_A START_A END_A CHROM_B START_B END_B ID QUAL STRAND_A STRAND_B TYPE FILTER NAME_A REF_A ALT_A NAME_B REF_B ALT_B INFO_A INFO_B FORMAT sample
chr1 839442 839442 chr1 839499 839499 MantaDEL:123157:0:0:0:0:0 139.0 + - DEL PASS MantaDEL:123157:0:0:0:0:0 CACCTAGACACACACTCCTGGACACACACACCTAGACACACACACCTGGACAAACACA CT . . . SVTYPE=DEL;POS=839442;SVLEN=-57;END=839499 . GT:FT:GQ:PL:PR:SR 0/1:PASS:86:189,0,83:1,0:8,5
chr1 1477835 1477869 chr1 1477968 1477968 MantaDEL:123205:0:1:0:1:0 27.0 + - DEL PASS MantaDEL:123205:0:1:0:1:0 AGCTGGGATTACAGGCACGCGCCACCACGCCTGGCTAATGTTGTATTTTAGTAGAGACGGGGTTTCTCCATGTTGGTCAGGCTGGTCTCTAACTCCCGACCTCAGGTGATCCACCCGCCTCGGCCTCTCAAACT A . . . SVTYPE=DEL;POS=1477835;SVLEN=-133;END=1477968;CIPOS=0,34 . GT:FT:GQ:PL:PR:SR 0/1:PASS:27:77,0,324:18,0:32,3
chr1 1565630 1565631 chr1 1565727 1565728 MantaDUP:TANDEM:123223:0:1:0:0:0 94.0 + - DUP PASS MantaDUP:TANDEM:123223:0:1:0:0:0 G <DUP:TANDEM> . . . SVTYPE=DUP;POS=1565630;SVLEN=97;END=1565727;CIPOS=0,1;CIEND=0,1 . GT:FT:GQ:PL:PR:SR 0/1:PASS:94:144,0,772:27,0:53,9
chr1 1595101 1595103 chr1 1595183 1595183 MantaDEL:123214:0:0:0:1:0 238.0 + - DEL MaxDepth MantaDEL:123214:0:0:0:1:0 TGACAGAGAGAGGCAGAGAGAGAGAGAGAGAGACAGACACAGAGAGAGCAGAACAGGGAGAAACAGAGAGACAGAGAGCGAGA T . . . SVTYPE=DEL;POS=1595101;SVLEN=-82;END=1595183;CIPOS=0,2 . GT:FT:GQ:PL:PR:SR 0/1:PASS:238:288,0,341:10,0:41,10
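One way to see the problem in the output above is to look for breakpoints where START equals END, which is an empty interval under the zero-based, half-open convention of bedpe. The following is a minimal Python sketch and not part of svtools. The function name `zero_length_sides` is hypothetical, and the code assumes whitespace-separated columns in the order given by the #CHROM_A header line.

```python
def zero_length_sides(bedpe_lines):
    """Yield (ID, side) for each breakpoint whose START equals END.

    Assumes whitespace-separated BEDPE columns in the order
    CHROM_A START_A END_A CHROM_B START_B END_B ID ...
    """
    for line in bedpe_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and header lines
        fields = line.split()
        start_a, end_a = int(fields[1]), int(fields[2])
        start_b, end_b = int(fields[4]), int(fields[5])
        if start_a == end_a:
            yield fields[6], "A"
        if start_b == end_b:
            yield fields[6], "B"


# The first six columns plus ID from the four records above.
records = [
    "chr1 839442 839442 chr1 839499 839499 MantaDEL:123157:0:0:0:0:0",
    "chr1 1477835 1477869 chr1 1477968 1477968 MantaDEL:123205:0:1:0:1:0",
    "chr1 1565630 1565631 chr1 1565727 1565728 MantaDUP:TANDEM:123223:0:1:0:0:0",
    "chr1 1595101 1595103 chr1 1595183 1595183 MantaDEL:123214:0:0:0:1:0",
]
empty = list(zero_length_sides(records))
```

On these four records the check reports both sides of the first deletion and the B side of the other two deletions. Only the duplication, which carries both CIPOS and CIEND, has no empty side.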
From a quick look, the code in question is this: https://github.com/hall-lab/svtools/blob/6a6a7b059df196ec49de6cd0b8ff816942f9055a/svtools/vcftobedpeconverter.py#L80-L82
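As a rough illustration of the conversion that valid bedpe would need, here is a short Python sketch. It is written under stated assumptions and is not the svtools code: the helper name `breakpoint_interval` is hypothetical. It assumes START should be the one-based VCF position plus the lower confidence bound, minus one, while END stays at the position plus the upper bound.

```python
def breakpoint_interval(pos, ci=(0, 0)):
    """Convert a one-based VCF position and its confidence interval
    into a zero-based, half-open (start, end) pair.

    Hypothetical helper for illustration; not taken from svtools.
    """
    start = max(pos + ci[0] - 1, 0)  # shift to zero-based, clamp at 0
    end = pos + ci[1]                # half-open end needs no shift
    return start, end


# POS and END of the first test record, which has no CIPOS/CIEND.
side_a = breakpoint_interval(839442)
side_b = breakpoint_interval(839499)

# POS of the second test record, with CIPOS=0,34.
side_a_ci = breakpoint_interval(1477835, (0, 34))
```

Under this convention the first test record would read chr1 839441 839442 chr1 839498 839499, so each side spans exactly one base instead of zero.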
Here are the test files and the command I used. Note that the START coordinates in the output bedpe are one-based.
Command used:
svtools vcftobedpe -i test.vcf -o out.bedpe
Test vcf file: