mkirsche / Jasmine

Jasmine: SV Merging Across Samples
MIT License

Fail to merge SVs when STRANDS not set in one file #41

Open edg1983 opened 2 years ago

edg1983 commented 2 years ago

Hi,

Thanks for the great tool!

I have multiple representations of SVs for the same sample, generated with Manta and Lumpy (through Smoove). When I try to use Jasmine to merge calls from the different tools, no overlapping calls are found and the resulting output file is empty. My command is:

jasmine --normalize_type --normalize_chrs min_support=2 file_list=filelist out_file=intersection.vcf

If I repeat the same run with min_support=1, I can clearly see that there are very similar variants, even identical ones, that Jasmine does not merge. For example, these 2 DEL records are exactly the same, reported by both tools with the same POS, SVTYPE and END annotations, but Jasmine fails to merge them:

  1. Manta

    1   1207339 1_MantaDEL:105478:0:1:0:0:0 GGAGACTGTCCTATGTCTTTCTGAGCCTCAGTTTCCCCTGTGGGCACCGAGGGGTTCTGGGACCCTGCCTCCACCAGGAAGCCTCCCTGGATTGCCCAGCCCTGCTTCTGCGCCGTCCAGCACAGGTGGAGACCCCCATGAATGCTGGGGGTGGGGGCTCTCGGGAACGTGAGCGTGGATGTGGTTCAACACCCTTTTGAGACCTGCAGCCACCGCCTCACCCCGTAAGGCGGTTCCTCCTTTTCCAAGGTAAATGACAGGAATTAGCTGTTTGTGACACCCCGGAGTTCTCAAATCCAAGATGTAGGAGCCTGCCTTGGAGAGGCAGCCCTCAGACACTGCAGAGAAGGAAGGGGTCTCTGCAGCTCCAGGCCGCCCCGACGCTCGGAAGGAAAGGGGTGGGGCCAGCTGGGCCTGGGGGC  G   185 PASS    END=1207760;SVTYPE=DEL;SVLEN=-421;CIGAR=1M421D;STARTVARIANCE=0.000000;ENDVARIANCE=0.000000;AVG_LEN=-421.000000;AVG_START=1207339.000000;AVG_END=1207760.000000;SUPP_VEC_EXT=01;IDLIST_EXT=MantaDEL:105478:0:1:0:0:0;SUPP_EXT=1;SUPP_VEC=01;SUPP=1;SVMETHOD=JASMINE;IDLIST=MantaDEL:105478:0:1:0:0:0 GT:FT:GQ:PL:PR:SR   0/1:PASS:140:235,0,137:9,5:6,5
  2. Lumpy

    1   1207339 0_1 N   <DEL>   206.71  .   SVTYPE=DEL;SVLEN=-421;END=1207760;STRANDS=+-:5;CIPOS=-10,9;CIEND=-10,9;CIPOS95=0,0;CIEND95=0,0;SU=5;PE=0;SR=5;PRPOS=9.80198e-21,9.80198e-19,9.80198e-17,9.80198e-15,9.80198e-13,9.80198e-11,9.80198e-09,9.80198e-07,9.80198e-05,0.00980198,0.980198,0.00980198,9.80198e-05,9.80198e-07,9.80198e-09,9.80198e-11,9.80198e-13,9.80198e-15,9.80198e-17,9.80198e-19;PREND=9.80198e-21,9.80198e-19,9.80198e-17,9.80198e-15,9.80198e-13,9.80198e-11,9.80198e-09,9.80198e-07,9.80198e-05,0.00980198,0.980198,0.00980198,9.80198e-05,9.80198e-07,9.80198e-09,9.80198e-11,9.80198e-13,9.80198e-15,9.80198e-17,9.80198e-19;AC=1;AN=2;STARTVARIANCE=0.000000;ENDVARIANCE=0.000000;AVG_LEN=-421.000000;AVG_START=1207339.000000;AVG_END=1207760.000000;SUPP_VEC_EXT=10;IDLIST_EXT=1;SUPP_EXT=1;SUPP_VEC=10;SUPP=1;SVMETHOD=JASMINE;IDLIST=1  GT:GQ:SQ:GL:DP:RO:AO:QR:QA:RS:AS:ASC:RP:AP:AB   0/1:181:206.71:-24,-3,-21:43:31:12:30:11:14:6:0:16:4:0.27

After some tests, I've realised that the issue is that STRANDS is set to +- by Lumpy and is absent from the Manta INFO fields. I suggest ignoring strand, or raising a warning, when one of the files does not contain the STRANDS annotation and --ignore_strand is not set. I'm not sure how Jasmine treats +- STRANDS values, which mean that no strand is actually determined for the call. Maybe it would be better to ignore strands by default? I'm not familiar with long-read SV callers, but short-read SV callers usually do not output a strand annotation.
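
For anyone hitting the same problem, a pre-merge check along these lines could flag input files that lack the STRANDS annotation while others carry it. This is only a rough sketch and not part of Jasmine: the function names (parse_info, count_strands, files_missing_strands, check_file_list) are hypothetical, and it assumes tab-separated VCFs (plain or gzip-compressed) and a file list with one VCF path per line.

```python
import gzip
from typing import Dict, Iterable, List


def parse_info(info: str) -> Dict[str, str]:
    """Split a VCF INFO column into a key -> value dict (flags map to '')."""
    fields: Dict[str, str] = {}
    for entry in info.split(";"):
        if not entry or entry == ".":
            continue
        key, _, value = entry.partition("=")
        fields[key] = value
    return fields


def count_strands(lines: Iterable[str]) -> Dict[str, int]:
    """Count records with and without a STRANDS key in their INFO column."""
    counts = {"with_strands": 0, "without_strands": 0}
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue  # skip header and blank lines
        columns = line.rstrip("\n").split("\t")
        if len(columns) < 8:
            continue  # not a complete VCF record
        info = parse_info(columns[7])
        key = "with_strands" if "STRANDS" in info else "without_strands"
        counts[key] += 1
    return counts


def files_missing_strands(counts_by_file: Dict[str, Dict[str, int]]) -> List[str]:
    """Return files with no STRANDS at all, if at least one other file has it."""
    any_has = any(c["with_strands"] > 0 for c in counts_by_file.values())
    if not any_has:
        return []  # no file uses STRANDS, so there is nothing to mismatch
    return sorted(
        name
        for name, c in counts_by_file.items()
        if c["with_strands"] == 0 and c["without_strands"] > 0
    )


def check_file_list(file_list_path: str) -> List[str]:
    """Apply the check to a file list (assumed: one VCF path per line)."""
    counts_by_file: Dict[str, Dict[str, int]] = {}
    with open(file_list_path) as handle:
        paths = [line.strip() for line in handle if line.strip()]
    for path in paths:
        opener = gzip.open if path.endswith(".gz") else open
        with opener(path, "rt") as vcf:
            counts_by_file[path] = count_strands(vcf)
    return files_missing_strands(counts_by_file)
```

Calling check_file_list("filelist") on the inputs from my command should return the Manta VCF, since none of its records carry STRANDS while the Lumpy records do. A non-empty result would be the cue to rerun the merge with the strand check disabled.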