mkirsche / Jasmine

Jasmine: SV Merging Across Samples
MIT License

Help to set the best jasmine parameters #40

Open lgmgeo opened 2 years ago

lgmgeo commented 2 years ago

Hello Melanie,

Thank you for providing this excellent tool! Love it so much!

I'm currently using Jasmine to evaluate the presence/absence of SVs identified with different SV callers in a small WGS cohort. Ultimately, I would like to obtain a single non-redundant SV VCF containing all the samples of my cohort:

#CHROM  POS     ID      REF     ALT     QUAL    FILTER  INFO    FORMAT  sample1    sample2    sample3      


I'm using Jasmine in 2 steps:

Step 1

For each sample in my cohort, I merge the SVs called by the different SV callers to obtain a single VCF file per sample. The idea is to capture all SVs in a sample, across all SV types, without the redundancy caused by fuzzy coordinates.

=========Jasmine======> sample1_SV_merge.vcf

Desired SV clustering criteria: e.g. a maximum distance of 300 bp between breakpoints and at least 80% reciprocal overlap by size.

These are the parameters I'm using:
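
To make the intended criteria concrete, here is a minimal sketch of the pairwise test described above. It is only an illustration of the rule (300 bp between breakpoints, 80% reciprocal overlap), not a description of how Jasmine clusters variants internally. The tuple layout and the names `reciprocal_overlap` and `same_sv` are hypothetical.

```python
def reciprocal_overlap(start_a, end_a, start_b, end_b):
    """Shared length divided by the longer interval's length.

    Requiring overlap/len_a >= t and overlap/len_b >= t is the same as
    requiring overlap/max(len_a, len_b) >= t.
    """
    shared = min(end_a, end_b) - max(start_a, start_b)
    if shared <= 0:
        return 0.0
    longest = max(end_a - start_a, end_b - start_b)
    return shared / longest


def same_sv(a, b, max_breakpoint_dist=300, min_reciprocal_overlap=0.8):
    """Decide whether two calls describe the same SV under the stated rule.

    a and b are hypothetical (chrom, start, end, svtype) tuples.
    """
    if a[0] != b[0] or a[3] != b[3]:
        return False
    # Both breakpoints must lie within the distance threshold.
    if abs(a[1] - b[1]) > max_breakpoint_dist:
        return False
    if abs(a[2] - b[2]) > max_breakpoint_dist:
        return False
    return reciprocal_overlap(a[1], a[2], b[1], b[2]) >= min_reciprocal_overlap


call_1 = ("chr6", 123605, 123805, "DEL")
call_2 = ("chr6", 123620, 123810, "DEL")
print(same_sv(call_1, call_2))
```

This applies to interval-like SVs (DEL, DUP, INV); insertions and breakends would need a different size comparison.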

jasmine \
  file_list=files_list.txt \
  out_file=sample1_SV_merge.vcf \
  out_dir=jasmine_tmp \
  threads=8 \
  min_dist=-1 \
  max_dist=150 \
  --nonlinear_dist \
  min_overlap=0.8 \
  --allow_intrasample \
  --ignore_strand \
  --normalize_type

I don't use --output_genotypes, so that I obtain only one sample column in the VCF. However, I would like to keep the most frequent GT. Question 1: How can I obtain that? Is it possible, or should I use --output_genotypes and parse the results myself? With --output_genotypes, I obtain several sample columns:

#CHROM  POS     ID      REF     ALT     QUAL    FILTER  INFO    FORMAT  0_sample1       1_sample1   2_sample1     3_sample1     4_sample1     5_sample1  
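
For the "parse the results myself" route, a minimal sketch could look like the following. It assumes a tab-separated VCF in which every sample column comes from one caller for the same sample, and it treats missing genotypes as non-votes (both are assumptions, as are the function names). Ties go to the genotype seen first.

```python
from collections import Counter

MISSING = {".", "./.", ".|."}


def most_frequent_gt(format_field, sample_fields):
    """Return the most common non-missing GT across the sample columns."""
    keys = format_field.split(":")
    if "GT" not in keys:
        return "./."
    gt_index = keys.index("GT")
    gts = []
    for field in sample_fields:
        parts = field.split(":")
        if gt_index < len(parts) and parts[gt_index] not in MISSING:
            gts.append(parts[gt_index])
    if not gts:
        return "./."
    return Counter(gts).most_common(1)[0][0]


def collapse_line(line, sample_name="sample1"):
    """Collapse all sample columns of one VCF line into a single GT column."""
    if line.startswith("##"):
        return line  # meta-information lines are passed through unchanged
    cols = line.rstrip("\n").split("\t")
    if line.startswith("#CHROM"):
        return "\t".join(cols[:9] + [sample_name]) + "\n"
    gt = most_frequent_gt(cols[8], cols[9:])
    # Only GT is kept; other FORMAT keys are dropped in this sketch.
    return "\t".join(cols[:8] + ["GT", gt]) + "\n"


record = "chr6\t123605\t.\t.\t<DEL>\t.\tPASS\tSVTYPE=DEL\tGT\t0/1\t./.\t0/1\t1/1\n"
print(collapse_line(record), end="")
```

Whether a 0/0 from one caller should count as a vote or be ignored like a missing call is a design choice; this sketch counts it.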

Question 2: Is there a way to report a tag associated with the input file each SV comes from? Something like:

#CHROM  POS      ID      REF     ALT     QUAL    FILTER  INFO                  FORMAT    sample1 
chr6    123605   .       .       <DEL>   .       PASS    SVTYPE=DEL;SVLEN=200  GT:TAG    0/1:smoove,delly 

If not, would it be possible to add an option such as --tag, taking a file that lists the tags associated with the VCF files to merge (one tag per line)? For example:


File | Tag
-- | --
sample1_SV_smoove.vcf | smoove
sample1_SV_delly.vcf | delly
sample1_SV_CNVpytor.vcf | CNVpytor
sample1_SV_Manta.vcf | Manta
sample1_SV_Mobster.vcf | Mobster
sample1_SV_ExpansionHunter.vcf | ExpansionHunter

This would be really useful!
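
In the meantime, a possible workaround is to derive the tags after merging. The sketch below assumes the merged VCF carries a `SUPP_VEC` entry in INFO with one 0/1 character per input file, in the same order as the file list (an assumption to check against the actual output), and that the record has a single sample column. The `TAG` FORMAT key and the function names are hypothetical.

```python
def load_tags(path):
    """Read one tag per line, in the same order as the VCFs in the file list."""
    with open(path) as handle:
        return [line.strip() for line in handle if line.strip()]


def tags_from_info(info, tags):
    """Map the assumed SUPP_VEC bit string onto the tag list."""
    for entry in info.split(";"):
        if entry.startswith("SUPP_VEC="):
            vec = entry[len("SUPP_VEC="):]
            return [tag for bit, tag in zip(vec, tags) if bit == "1"]
    return []


def add_tag_column(line, tags):
    """Append a TAG value to FORMAT and to the single sample column."""
    cols = line.rstrip("\n").split("\t")
    if line.startswith("#") or len(cols) < 10:
        return line  # headers and records without a sample column are untouched
    supporting = tags_from_info(cols[7], tags)
    cols[8] += ":TAG"
    cols[9] += ":" + (",".join(supporting) if supporting else ".")
    return "\t".join(cols) + "\n"


tags = ["smoove", "delly", "CNVpytor", "Manta", "Mobster", "ExpansionHunter"]
record = ("chr6\t123605\t.\t.\t<DEL>\t.\tPASS\t"
          "SVTYPE=DEL;SVLEN=200;SUPP_VEC=110000\tGT\t0/1\n")
print(add_tag_column(record, tags), end="")
```

A strictly valid VCF would also need a matching `##FORMAT=<ID=TAG,...>` line in the header, which this sketch does not add.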

Step 2

Then I merge the VCF files from all samples (sample1_SV_merge.vcf, sample2_SV_merge.vcf...). Of course, this time, I add the --output_genotypes option.
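
As a sketch of how step 2 could be scripted, the helper below builds the file list and the argument list for the cohort-level run. It only uses options already mentioned in this post (`file_list=`, `out_file=`, `--output_genotypes`); the file names `cohort_files_list.txt` and `cohort_SV_merge.vcf` and the function names are hypothetical, and any clustering parameters from step 1 would still need to be added.

```python
def file_list_text(sample_vcfs):
    """One per-sample merged VCF path per line, as expected in a file list."""
    return "".join(path + "\n" for path in sample_vcfs)


def step2_command(list_path="cohort_files_list.txt",
                  out_file="cohort_SV_merge.vcf"):
    """Argument list for the cohort-level merge (hypothetical file names)."""
    return [
        "jasmine",
        f"file_list={list_path}",
        f"out_file={out_file}",
        "--output_genotypes",
    ]


samples = ["sample1_SV_merge.vcf", "sample2_SV_merge.vcf"]
print(file_list_text(samples), end="")
print(" ".join(step2_command()))
```

Writing `file_list_text(samples)` to the list path and passing the argument list to `subprocess.run(..., check=True)` would then launch the merge.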

Thank you very much for any advice or thoughts you can offer,

Best regards,

Véronique