vibansal / crisp

Code for multi-sample variant calling from sequence data of pooled or unpooled DNA samples
MIT License

What parameters should be used for unknown number of samples in a pool? #12

Closed: Buuntu closed this issue 4 years ago

Buuntu commented 5 years ago

What parameters would be best for a population with an unknown (likely very large) sample size? I did not set up the experiment, but it involves bacteria that were not streaked out to select a single colony, so the sample is essentially a whole population of the bacteria.

vibansal commented 5 years ago

That depends on the lowest allele frequency that one would like to detect. With Illumina sequencing, variants with 2% allele frequency (pool size = 50) or even lower should be detectable with high accuracy.
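To get a rough feel for why depth matters at a 2% allele frequency, here is a minimal sketch. It assumes a simple binomial sampling model with no sequencing errors, which is not how CRISP itself calls variants; the function name and the example depths are made up for illustration.

```python
from math import comb


def prob_at_least_k_alt_reads(depth: int, allele_freq: float, min_alt_reads: int) -> float:
    """Probability of seeing at least `min_alt_reads` variant reads at one site.

    Assumption: each read independently carries the variant with probability
    `allele_freq` (binomial sampling, sequencing errors ignored).
    """
    # Sum the binomial probabilities of seeing fewer than min_alt_reads variant reads.
    p_fewer = sum(
        comb(depth, i) * allele_freq**i * (1 - allele_freq) ** (depth - i)
        for i in range(min_alt_reads)
    )
    return 1.0 - p_fewer


if __name__ == "__main__":
    # Hypothetical depths; compare how the chance of >= 5 variant reads changes.
    for depth in (100, 500, 1000):
        p = prob_at_least_k_alt_reads(depth, allele_freq=0.02, min_alt_reads=5)
        print(f"depth={depth}: P(>=5 alt reads) = {p:.3f}")
```

Under these assumptions, the probability of observing enough variant reads rises with depth, so a low-frequency variant needs correspondingly deep coverage to be called reliably.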

Buuntu commented 5 years ago

Okay, so the pool size can be thought of as the allele frequency you want to detect? So even though the pool size might be in the thousands, if we want an allele frequency of, say, 25%, we would just set the pool size to 4?

vibansal commented 5 years ago

CRISP is designed for pools with a discrete number of alleles or chromosomes. The pool size determines the minimum allele frequency that is expected at a variant position.
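The relationship discussed in this thread (pool size 50 for 2%, pool size 4 for 25%) can be sketched as a reciprocal. This assumes the pool size counts haploid chromosomes, so the minimum expected frequency is one copy out of the pool; the helper names are hypothetical and are not part of CRISP.

```python
def min_allele_frequency(pool_size: int) -> float:
    """Lowest allele frequency expected at a variant site: one copy in the pool.

    Assumption: pool_size is the number of haploid chromosomes in the pool.
    """
    if pool_size < 1:
        raise ValueError("pool_size must be a positive integer")
    return 1.0 / pool_size


def pool_size_for_frequency(min_af: float) -> int:
    """Pool size whose minimum expected allele frequency is roughly min_af."""
    if not 0.0 < min_af <= 1.0:
        raise ValueError("min_af must be in the interval (0, 1]")
    # Round to the nearest whole number of chromosomes.
    return round(1.0 / min_af)


if __name__ == "__main__":
    print(min_allele_frequency(50))       # the 2% case from the thread
    print(pool_size_for_frequency(0.25))  # the 25% case from the thread
```

For a pool of unknown size, this only gives the effective pool size implied by the lowest frequency you care about, not the true number of organisms.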

Buuntu commented 5 years ago

So for an unknown number of alleles (monoploid organism), since we don't know how many organisms are in the pool, it may not be the right tool?

vibansal commented 5 years ago

It can work for the non-discrete model as well. There are also other tools, such as freebayes or lofreq, that allow for continuous allele frequencies.