mkirsche / Jasmine

Jasmine: SV Merging Across Samples
MIT License

Expected output? #7

Closed by wdecoster 4 years ago

wdecoster commented 4 years ago

Hi,

I expected a VCF with one new genotype column per input variant file ('wide' format?), but what I get is a VCF with just a single "sample" (identifier taken from the first VCF) and a long list of variants that seems to iterate through all the variants in my files. It starts with chr1, chr2, chr3, chr4, chr5, etc. for sample1, then restarts at chr1 for the second sample, and so on. Is the SUPP_VEC the only way to connect variants in the merged file with their original sample?

Or did I do something wrong?

Thanks, Wouter

mkirsche commented 4 years ago

Hi Wouter,

Yes, what you described is the expected output from Jasmine by default. The intent is to avoid extremely large VCFs when there are many samples. As you pointed out, the SUPP_VEC (and IDLIST) fields allow you to trace back to the original VCF entries. If you prefer the output in the more traditional one-sample-per-column format, you can use the --output_genotypes flag, which adds those columns.
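For the default output, tracing a merged record back to its inputs could look roughly like the sketch below. It assumes SUPP_VEC is a string of 0/1 characters with one position per input VCF, in the same order as the input file list, and that IDLIST is a comma-separated list of the original variant IDs. The sample names, the INFO string, and the helper functions parse_info and supporting_samples are made up for illustration and are not part of Jasmine.

```python
def parse_info(info_field):
    """Turn a VCF INFO string into a dict; flag-only entries map to True."""
    info = {}
    for entry in info_field.split(";"):
        if not entry:
            continue
        key, sep, value = entry.partition("=")
        info[key] = value if sep else True
    return info


def supporting_samples(info_field, sample_names):
    """Return the names whose SUPP_VEC position is '1'.

    Assumes one SUPP_VEC character per input VCF, in input order.
    """
    supp_vec = parse_info(info_field).get("SUPP_VEC", "")
    if len(supp_vec) != len(sample_names):
        raise ValueError("SUPP_VEC length does not match the number of inputs")
    return [name for name, bit in zip(sample_names, supp_vec) if bit == "1"]


# Hypothetical sample names (same order as the input file list)
samples = ["sample1", "sample2", "sample3"]
# Hypothetical INFO column from one merged record
info = "SVTYPE=DEL;SUPP=2;SUPP_VEC=101;IDLIST=sv12,sv98"

print(supporting_samples(info, samples))
print(parse_info(info)["IDLIST"].split(","))
```

With the made-up record above, the first call lists the inputs that support the merged variant and the second lists the original variant IDs that were merged into it.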

I hope that helps! Melanie

wdecoster commented 4 years ago

Aha, I'll give that a try. Thanks!