Open alexis-sedg opened 1 week ago
Hi Alexis @alexis-sedg,
interesting approach! Do you have a reference or link to that sub-pooling procedure? Why is that the recommendation for that variant caller?
Yes, grenedalf can do that, using the --sample-group-merge-table option that is provided for most of the commands. You could also merge the BAM files into one BAM per sample, e.g., with samtools merge, if you want that instead. As you say, working on downstream VCFs or mpileups is not the best approach: VCFs are not well suited for pooled data in the first place, and pileup is just a waste of disk space as far as grenedalf is concerned.
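To make the two routes concrete, here is a minimal Python sketch that prepares both from one list of sub-pools. The sample names, file names, and population labels are made up for illustration. The table layout (one row per sample: sample name, then group name, tab-separated) is an assumption about what the merge table expects, so please check it against the grenedalf documentation. The samtools commands are only built as argument lists and are not executed.

```python
# Hypothetical sub-pools: (sample name, BAM path, population they belong to).
SUBPOOLS = [
    ("popA_sub1", "popA_sub1.bam", "popA"),
    ("popA_sub2", "popA_sub2.bam", "popA"),
    ("popB_sub1", "popB_sub1.bam", "popB"),
    ("popB_sub2", "popB_sub2.bam", "popB"),
]


def merge_table_text(subpools, sep="\t"):
    """Route 1: text of a two-column table mapping each sample to its group.

    Assumed layout: sample name, separator, group name; one sample per line.
    """
    lines = [f"{sample}{sep}{population}" for sample, _bam, population in subpools]
    return "\n".join(lines) + "\n"


def samtools_merge_commands(subpools):
    """Route 2: one `samtools merge <out.bam> <in1.bam> <in2.bam> ...` per population."""
    bams_by_population = {}
    for _sample, bam, population in subpools:
        bams_by_population.setdefault(population, []).append(bam)
    return [
        ["samtools", "merge", f"{population}.merged.bam", *bams]
        for population, bams in bams_by_population.items()
    ]


if __name__ == "__main__":
    print(merge_table_text(SUBPOOLS), end="")
    for command in samtools_merge_commands(SUBPOOLS):
        print(" ".join(command))
```

The table route keeps the original BAM files untouched and leaves the grouping visible in a single file, while the samtools route produces one BAM per population that any other tool can read as well.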
Hope that helps, so long Lucas
Hi Lucas,
Yeah, happily! The paper it's from is "A statistical method for the detection of variants from next-generation resequencing of DNA pools": https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2881398/ or, alternatively, the GitHub repository: https://github.com/vibansal/crisp/tree/master
My understanding is that they use the comparison between multiple replicate pools of the same population to distinguish sequence errors from rare alleles. The explanation they provided was: "In the absence of a variant, the frequency of the reads with a nucleotide different from the reference base at a particular position should be similar across multiple pools. The intuition being that sequencing errors, especially those that depend upon the local sequence context, are likely to be shared across reads in multiple pools. In contrast, presence of a rare variant in a pool is expected to result in an excess of reads with the alternate allele as compared with the other pools. We use a contingency table approach to compute a P-value for the null hypothesis in the absence of a SNP (see [Fig. 1] for an illustration of this idea)."
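As a rough illustration of that intuition (not CRISP's actual computation), the sketch below builds a 2 x k table of alternate and reference read counts across k pools and computes a plain Pearson chi-square statistic, which I have swapped in here for simplicity. The read counts are invented. A statistic near zero means the alternate-read fraction is similar in every pool, as expected for shared sequencing errors; a large value means one pool carries an excess of alternate reads.

```python
def chi_square_statistic(alt_counts, ref_counts):
    """Pearson chi-square statistic for a 2 x k table of read counts.

    alt_counts[i] and ref_counts[i] are the numbers of reads supporting the
    alternate and the reference base in pool i at one position.
    """
    total_alt = sum(alt_counts)
    total_ref = sum(ref_counts)
    total = total_alt + total_ref
    # With no alternate (or no reference) reads at all, the pools cannot differ.
    if total_alt == 0 or total_ref == 0:
        return 0.0

    statistic = 0.0
    for alt, ref in zip(alt_counts, ref_counts):
        depth = alt + ref
        for observed, row_total in ((alt, total_alt), (ref, total_ref)):
            # Expected count if every pool had the same alternate-read fraction.
            expected = row_total * depth / total
            if expected > 0:
                statistic += (observed - expected) ** 2 / expected
    return statistic


if __name__ == "__main__":
    # Same alternate-read fraction in all four pools: looks like shared error.
    print(chi_square_statistic([2, 2, 2, 2], [98, 98, 98, 98]))
    # One pool with an excess of alternate reads: looks like a rare variant.
    print(chi_square_statistic([2, 2, 2, 20], [98, 98, 98, 80]))
```

With four pools the table has 3 degrees of freedom, so the statistic would be compared against a chi-square distribution with 3 degrees of freedom to get a P-value.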
Excellent, I'll give the merge function a go! Read quality and quantity are variable across my data, even between samples from the same groups. Do you have any additional considerations or recommendations for dealing with that variability, or is it alright to run the merge function as is?
Thanks for your time, Alexis
Hello,
I'm interested in using this pipeline for pooled data. However, when we designed the study, we used the sub-pooling method recommended by CRISP (a variant caller). So instead of having a single BAM file for a given population, I have multiple. I assume I can use the downstream VCF or mpileup as part of your pipeline, but I'd prefer to use the BAMs as inputs. Is there a way to use multiple BAMs for a given sample population in the grenedalf pipeline?
Thank you, Alexis