mkirsche / Jasmine

Jasmine: SV Merging Across Samples

Question regarding overlap when merging VCF of multiple samples #60

Open LucaBertoli opened 4 months ago

LucaBertoli commented 4 months ago

Hi, I am trying to use Jasmine to merge three VCFs from three different samples into a single multi-sample VCF. As a test, I have created two synthetic VCFs containing the following variants:

VCF1 (without header):

#CHROM  POS  ID  REF  ALT  QUAL  FILTER  INFO  FORMAT  1770S
chr1  100  chr1:100-200  A  <DUP>  .  PASS  SVTYPE=<DUP>;END=200;NEXONS=2;BF=4.89;SVLEN=100;RATIO=2.85  GT  0/1
chr1  1000  chr1:1000-1100  A  <DEL>  .  PASS  SVTYPE=<DEL>;END=1100;NEXONS=2;BF=10.4;SVLEN=100;RATIO=0.14  GT  0/1
chr1  10000  chr1:10000-10100  A  <DUP>  .  PASS  SVTYPE=<DUP>;END=10100;NEXONS=5;BF=6.68;SVLEN=100;RATIO=1.56  GT  0/1

VCF2 (without header):

#CHROM  POS  ID  REF  ALT  QUAL  FILTER  INFO  FORMAT  19N1898
chr1  160  chr1:160-260  C  <DUP>  .  PASS  SVTYPE=<DUP>;END=260;NEXONS=4;BF=8.69;SVLEN=100;RATIO=1.93  GT  0/1
chr1  1050  chr1:1050-1150  G  <DEL>  .  PASS  SVTYPE=<DEL>;END=1150;NEXONS=9;BF=7.87;SVLEN=100;RATIO=1.37  GT  0/1
chr1  10040  chr1:10040-10140  C  <DUP>  .  PASS  SVTYPE=<DUP>;END=10140;NEXONS=1;BF=5.48;SVLEN=100;RATIO=0.339  GT  0/1

The first variant in VCF1 overlaps the first variant in VCF2 by 40%, the second pair overlaps by 50%, and the third pair by 60%. We aim to merge only the variants that overlap by at least 50%.
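The overlap percentages above can be checked with a short calculation. The following is only a sketch of that arithmetic, not Jasmine's own merging logic: the function name `reciprocal_overlap` is hypothetical, and it assumes each variant's length is END minus POS (which matches SVLEN=100 in the records).

```python
def reciprocal_overlap(start_a, end_a, start_b, end_b):
    """Return the smaller of the two overlap fractions for intervals A and B.

    Assumption: interval length is end - start, matching SVLEN in the records.
    """
    shared = max(0, min(end_a, end_b) - max(start_a, start_b))
    len_a = end_a - start_a
    len_b = end_b - start_b
    if len_a <= 0 or len_b <= 0:
        return 0.0
    return min(shared / len_a, shared / len_b)


# (POS, END) pairs copied from the VCF1 and VCF2 records
pairs = [
    ((100, 200), (160, 260)),
    ((1000, 1100), (1050, 1150)),
    ((10000, 10100), (10040, 10140)),
]

for (s1, e1), (s2, e2) in pairs:
    frac = reciprocal_overlap(s1, e1, s2, e2)
    # "merge wanted" reflects the 50% goal stated in this issue
    print(f"{s1}-{e1} vs {s2}-{e2}: {frac:.0%} overlap, merge wanted: {frac >= 0.5}")
```

With a 50% threshold, only the second and third pairs should be merged, which is the outcome we are looking for.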

We obtain the correct result with the following command:

jasmine file_list=filelist.txt out_file=merged_test.vcf min_overlap=1.0 --output_genotypes --default_zero_genotype --leave_breakpoints max_dist_linear=0.5 min_dist=-1

This merges the second variant of VCF1 with the second of VCF2, and the third of VCF1 with the third of VCF2.

From the documentation and other issues, it seems that a 50% overlap should be requested with "min_overlap=0.5" and "max_dist_linear=1.0". However, with the following command:

jasmine file_list=filelist.txt out_file=merged_test.vcf min_overlap=1.0 --output_genotypes --default_zero_genotype --leave_breakpoints max_dist_linear=0.5 min_dist=-1

Jasmine merges all three variants of the VCFs.