mkirsche / Jasmine

Jasmine: SV Merging Across Samples
MIT License

--dup_to_ins #39

Closed: zhqduan closed this issue 2 years ago

zhqduan commented 2 years ago

I have two questions about --dup_to_ins from using Jasmine to merge SVs with the suggested pipeline:

  1. After running jasmine --dup_to_ins --preprocess_only vcf_filelist=<vcf> --comma_filelist, not all duplications were converted to insertions. For example, in one sample 107 duplications were converted to "INS", while 32 were kept as "DUP". What criterion determines which duplications are converted to insertions?
  2. After running jasmine --dup_to_ins --postprocess_only out_file=<mergedvcf>, a VCF file with the suffix _dupToIns.vcf was generated in the output directory. However, _dupToIns.vcf seems to be the VCF before conversion, while <mergedvcf> was overwritten with the VCF after conversion. This is very confusing.
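One way to reproduce per-sample counts like the ones in question 1 is to tally the SVTYPE values in each preprocessed VCF. Below is a minimal Python sketch. It assumes that SVTYPE is stored in the INFO column (the 8th VCF column) as SVTYPE=..., and it is a generic VCF tally rather than a Jasmine feature. The function name count_svtypes is hypothetical.

```python
from collections import Counter


def count_svtypes(vcf_lines):
    """Count SVTYPE values found in the INFO column of VCF data lines."""
    counts = Counter()
    for line in vcf_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and "##" meta / "#CHROM" header lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 8:
            continue  # malformed record without an INFO column
        info = fields[7]  # INFO is the 8th column of a VCF record
        for entry in info.split(";"):
            key, _, value = entry.partition("=")
            if key == "SVTYPE":
                counts[value] += 1
                break
    return counts
```

If you pass an open file handle for each preprocessed VCF, you get that sample's INS and DUP counts. Because the tally is made after conversion, the DUP count here is the number of duplications that were left unconverted.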

Thanks.

Zhongqu

mkirsche commented 2 years ago

Hi Zhongqu,

  1. Very long duplications are not converted to insertions. The default length threshold is 10 kbp, and it can be adjusted with the max_dup_length parameter.
  2. This is because, when post-processing the merged file, the duplications that were converted to insertions are converted back to duplications. So _dupToIns.vcf is the merged file with the duplications converted to insertions, while <mergedvcf> has them converted back to duplications. I think the source of confusion is that the DUP-to-INS conversion is used as an intermediate step to standardize merging and breakpoint refinement, while the final VCF retains the original variant types.
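The following minimal Python sketch illustrates the round trip described in both answers. Duplications up to a length threshold are rewritten as insertions during preprocessing and tagged. After merging, the tagged records are restored to DUP. This is not Jasmine's actual code. The SV record type, the original_type tag, the function names, and the choice of an inclusive (<=) threshold comparison are all hypothetical. The sketch also leaves POS unchanged and does not model how Jasmine places breakpoints.

```python
from dataclasses import dataclass, replace
from typing import Optional

# Hypothetical default mirroring the 10 kbp threshold (max_dup_length) described above
MAX_DUP_LENGTH = 10_000


@dataclass(frozen=True)
class SV:
    chrom: str
    pos: int
    svtype: str
    svlen: int
    original_type: Optional[str] = None  # hypothetical tag marking a DUP->INS conversion


def dup_to_ins(sv: SV, max_dup_length: int = MAX_DUP_LENGTH) -> SV:
    """Preprocessing: convert a DUP no longer than max_dup_length into an INS."""
    if sv.svtype == "DUP" and abs(sv.svlen) <= max_dup_length:
        return replace(sv, svtype="INS", original_type="DUP")
    return sv  # long DUPs and all other SV types pass through unchanged


def ins_to_dup(sv: SV) -> SV:
    """Postprocessing: restore only the records that were converted from DUP."""
    if sv.original_type == "DUP":
        return replace(sv, svtype="DUP", original_type=None)
    return sv
```

Tagging the converted records is the key design choice. Without the tag, postprocessing could not tell a converted duplication from a variant that was called as an insertion in the first place.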

I hope that helps!

Melanie

zhqduan commented 2 years ago

Thank you for your kind reply Melanie. It is very clear to me now.