harvardinformatics / snpArcher

Snakemake workflow for highly parallel variant calling designed for ease-of-use in non-model organisms.
MIT License

Adding additional samples to pipeline #190

Open ruthrivkin opened 1 month ago

ruthrivkin commented 1 month ago

Hi Tim and Cade,

I have a question about adding additional samples to the pipeline. I previously ran ~150 polar bear samples through snpArcher, which worked great, and created a multi-sample VCF file for use in downstream analyses. Now I have another 150 samples that I would like to add to the original set. They were sequenced at a separate time and at slightly lower coverage (10x instead of 15x). Is it possible to set snpArcher up in such a way that I can filter, align, etc. the new samples and add them to the merged VCF of the old samples? Sorry if this has an obvious answer, but I have gone through all the docs and I'm still uncertain. I have attached my sample sheet from the initial run in case that is helpful.

Thanks!
Ruth

Attachment: PB1_samples.csv

tsackton commented 1 month ago

Hi Ruth,

There are a few ways you can do this. First, if you still have the g.vcf files (they should be in a directory like results/{refGenome}/gvcfs/{sample}.g.vcf.gz), then you can re-call your new and old samples together relatively straightforwardly. Just make a new sample sheet that includes both your new and old samples, and rename your current final VCF to something else. Snakemake should, hopefully, recognize that most of the work is already done for the old samples and only generate g.vcfs for the new ones. I would highly recommend doing a dry run first, to make sure that you are not regenerating and overwriting things you already have (Snakemake can sometimes be tricky here, e.g. file modification times can make it think it needs to regenerate files that already exist).
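For the combined sample sheet, something like the following might help. It is only a minimal sketch: the file names in the usage note are hypothetical, and it assumes the sample ID lives in a column named `BioSample` (check your own sheet and pass a different `id_column` if not). It concatenates the two sheets and refuses to continue if a sample ID from the new batch already exists in the old one, since a reused ID could be confused with an existing g.vcf.

```python
import csv


def combine_sample_sheets(old_path, new_path, out_path, id_column="BioSample"):
    """Concatenate two sample sheets that share the same columns.

    id_column is an assumption about the sheet layout, not something
    confirmed in this thread. Returns (old row count, new row count).
    """
    with open(old_path, newline="") as fh:
        reader = csv.DictReader(fh)
        header = reader.fieldnames
        old_rows = list(reader)

    with open(new_path, newline="") as fh:
        reader = csv.DictReader(fh)
        if reader.fieldnames != header:
            raise ValueError("sample sheets have different columns")
        new_rows = list(reader)

    # A sample may span several rows within one sheet (e.g. several runs),
    # so compare sets of IDs between sheets rather than individual rows.
    old_ids = {row[id_column] for row in old_rows}
    new_ids = {row[id_column] for row in new_rows}
    clashes = sorted(old_ids & new_ids)
    if clashes:
        raise ValueError("sample IDs present in both sheets: " + ", ".join(clashes))

    with open(out_path, "w", newline="") as fh:
        writer = csv.DictWriter(fh, fieldnames=header)
        writer.writeheader()
        writer.writerows(old_rows + new_rows)

    return len(old_rows), len(new_rows)
```

You could call it as `combine_sample_sheets("PB1_samples.csv", "PB2_samples.csv", "PB_all_samples.csv")`, where the second and third names are made up. Then point the config at the combined sheet and repeat your original snakemake command with `--dry-run` (or `-n`) added, to see which jobs would actually run before committing to anything.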

Alternatively, if you call the new sample set in an independent snpArcher run, you can use bcftools merge (https://samtools.github.io/bcftools/bcftools.html#merge) to merge the two VCFs.
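A rough sketch of the merge route, wrapped in Python so the steps are explicit. The file names in the usage note are placeholders, and it assumes both VCFs are bgzip-compressed, were called against the same reference, and have no sample names in common. It only uses `bcftools index` and `bcftools merge` with their standard output options.

```python
import subprocess


def bcftools_merge_commands(old_vcf, new_vcf, merged_vcf):
    """Build the bcftools calls needed to merge two multi-sample VCFs."""
    return [
        # bcftools merge needs indexed, bgzip-compressed inputs;
        # --force overwrites an index that already exists.
        ["bcftools", "index", "--tbi", "--force", old_vcf],
        ["bcftools", "index", "--tbi", "--force", new_vcf],
        # Write the merged calls as compressed VCF (--output-type z).
        ["bcftools", "merge", "--output-type", "z",
         "--output", merged_vcf, old_vcf, new_vcf],
        ["bcftools", "index", "--tbi", merged_vcf],
    ]


def run_merge(old_vcf, new_vcf, merged_vcf):
    """Run each command in order, stopping at the first failure."""
    for cmd in bcftools_merge_commands(old_vcf, new_vcf, merged_vcf):
        subprocess.run(cmd, check=True)
```

For example, `run_merge("old_run.vcf.gz", "new_run.vcf.gz", "merged.vcf.gz")`. One caveat with this route: a site that was called in only one of the two runs will show up with missing genotypes for the samples from the other run, so downstream filters on missingness may need a second look.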

Tim

ruthrivkin commented 1 month ago

Thanks! Appreciate the quick answer!