raphael-group / decifer

DeCiFer is an algorithm that simultaneously selects mutation multiplicities and clusters SNVs by their corresponding descendant cell fractions (DCF).

Merging VCFs #14

Open ardydavari opened 2 years ago

ardydavari commented 2 years ago

Hi,

I would like to run decifer on my dataset, but I had some questions about the preprocessing step.

It seems that vcf_2_decifer.py expects each sample to have its own column in a single VCF?

Most of the pipelines I have run produce a single-sample VCF for each tumor, called against the matched normal. Is there a recommended way to merge these, so that the reference read counts in the other samples are calculated correctly for variants that are private to one sample?

Thank you

brian-arnold commented 2 years ago

Hi Ardy, Thanks for your interest in decifer.

That is correct: vcf_2_decifer.py expects each sample to have its own column in a single VCF file. Currently, mutect2 and strelka2 both support multi-sample (or joint) calling of somatic SNVs and produce VCF files with a separate column per sample.
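As a quick check of whether a VCF already has one column per sample, the sample names can be read from its `#CHROM` header line. The following is only a minimal sketch and is not part of decifer; the function name `vcf_sample_names` and the sample names in the example are made up for illustration.

```python
import io

def vcf_sample_names(handle):
    """Return the sample column names from the #CHROM header line of a VCF."""
    for line in handle:
        if line.startswith("#CHROM"):
            fields = line.rstrip("\n").split("\t")
            # The first 9 columns are fixed (CHROM ... INFO, FORMAT);
            # every column after FORMAT is one sample.
            return fields[9:]
        if not line.startswith("#"):
            break  # reached the records without seeing a header line
    raise ValueError("no #CHROM header line found")

# Hypothetical multi-sample VCF with one normal and two tumor samples.
example = io.StringIO(
    "##fileformat=VCFv4.2\n"
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tnormal\ttumor_A\ttumor_B\n"
    "chr1\t12345\t.\tA\tT\t.\tPASS\t.\tGT:AD\t0/0:30,0\t0/1:20,10\t0/0:25,0\n"
)
samples = vcf_sample_names(example)
print(samples)
```

A file from a single tumor-versus-normal run would typically list only the normal and one tumor here, which is the case discussed in this issue.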

However, I can look into options to also allow single-sample somatic VCFs, all from the same patient. This would additionally require mpileup files generated from the BAMs, in order to get read counts for the reference and alternate alleles at every site where a variant was called in at least one sample.
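To make the idea concrete, here is a rough sketch of that merge, under the assumption that the per-sample calls and the pileup base counts have already been parsed into plain dictionaries. It is not an existing decifer feature; `merge_single_sample_calls`, both input layouts, and the example data are hypothetical.

```python
def merge_single_sample_calls(calls, pileup_counts):
    """Fill in ref/alt read counts at the union of sites called in any sample.

    calls:         {sample: {(chrom, pos, ref, alt): (ref_count, alt_count)}}
                   parsed from each single-sample VCF (assumed layout)
    pileup_counts: {sample: {(chrom, pos): {base: count}}}
                   parsed from mpileup output for each BAM (assumed layout)
    """
    # Union of every site called in at least one sample.
    sites = sorted(set().union(*(c.keys() for c in calls.values())))
    merged = {}
    for site in sites:
        chrom, pos, ref, alt = site
        row = {}
        for sample, sample_calls in calls.items():
            if site in sample_calls:
                # Variant was called in this sample: keep the caller's counts.
                row[sample] = sample_calls[site]
            else:
                # Not called here: take the counts from the pileup instead.
                bases = pileup_counts.get(sample, {}).get((chrom, pos))
                if bases is None:
                    row[sample] = None  # no coverage information available
                else:
                    row[sample] = (bases.get(ref, 0), bases.get(alt, 0))
        merged[site] = row
    return merged

# Hypothetical data: each tumor has one private variant.
calls = {
    "tumor_A": {("chr1", 100, "A", "T"): (20, 10)},
    "tumor_B": {("chr1", 200, "G", "C"): (15, 5)},
}
pileup_counts = {
    "tumor_A": {("chr1", 200): {"G": 30}},
    "tumor_B": {("chr1", 100): {"A": 25, "T": 1}},
}
merged = merge_single_sample_calls(calls, pileup_counts)
```

In this example the site private to tumor_A gets counts (25, 1) in tumor_B from the pileup, rather than being treated as missing.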

Sincerely, Brian

ardydavari commented 2 years ago

Hi Brian,

Thank you for your help! I will definitely look into the multi-sample mode for mutect2, as that may be the best option.

I do think that it would be nice to have the second option as well (especially for some alternative callers out there like CaVEMan). Of note, I've seen some other methods tackle this problem by imputing reads from the geometric mean of reads (although this is less desirable).

Sincerely, Ardy