Closed by sr320 2 years ago
Nice reference on what some of these filters are:
https://gatk.broadinstitute.org/hc/en-us/articles/360035890471-Hard-filtering-germline-short-variants
Ideally, you would plot the distribution of read depth across sites before filtering and use it to choose the read depth cutoffs, but the values here should be fine unless you end up with very few SNPs passing the filters.
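For anyone who wants to check the depth distribution before settling on cutoffs, here is a rough sketch in plain Python. It assumes an uncompressed VCF whose INFO column carries a site-level `DP=` entry. The function names and the bin width are made up for illustration and are not part of the repo or of GATK.

```python
import statistics
from collections import Counter


def info_depths(vcf_lines):
    """Yield the site-level INFO/DP value from each VCF record line."""
    for line in vcf_lines:
        if line.startswith("#"):
            continue  # skip meta-information and header lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 8:
            continue  # not a complete record
        for entry in fields[7].split(";"):
            if entry.startswith("DP="):
                yield int(entry[3:])
                break


def depth_histogram(depths, bin_width=10):
    """Count sites per depth bin; keys are the lower edge of each bin."""
    counts = Counter((d // bin_width) * bin_width for d in depths)
    return dict(sorted(counts.items()))


def depth_summary(depths, low=10, high=200):
    """Summarize depths and count sites outside the proposed DP cutoffs."""
    depths = list(depths)
    if not depths:
        raise ValueError("no DP values found")
    return {
        "sites": len(depths),
        "median": statistics.median(depths),
        "below_low": sum(d < low for d in depths),
        "above_high": sum(d > high for d in depths),
    }
```

To use it, open the unfiltered VCF, pass the file handle to `info_depths`, and hand the resulting list to `depth_histogram` and `depth_summary`. If `below_low` or `above_high` covers a large share of the sites, the DP cutoffs probably need another look.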
Changed the DP (read depth) filter from < 15 and > 150 to < 10 and > 200.
Changed the minimum allele frequency (AF) cutoff from < 0.3 to < 0.05 (this is the most important change).
--filter-name "FS" \
--filter "FS > 60.0" \
--filter-name "QD" \
--filter "QD < 2.0" \
--filter-name "QUAL30" \
--filter "QUAL < 30.0" \
--filter-name "SOR3" \
--filter "SOR > 3.0" \
--filter-name "DP10" \
--filter "DP < 10" \
--filter-name "DP200" \
--filter "DP > 200" \
--filter-name "AF05" \
--filter "AF < 0.05"
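To make it easier to see what these name/expression pairs do to a given site, here is a small Python sketch that mimics the thresholds. It is only an illustration, not how GATK evaluates them (GATK uses JEXL expressions). It assumes one numeric value per annotation, so a multi-allelic AF would need extra handling, and it skips any annotation that is missing from the site. `HARD_FILTERS` and `failed_filters` are hypothetical names.

```python
import operator

# (filter name, annotation, comparison, threshold), copied from the flags above.
# A site is tagged with the filter name when the comparison is true.
HARD_FILTERS = [
    ("FS", "FS", operator.gt, 60.0),
    ("QD", "QD", operator.lt, 2.0),
    ("QUAL30", "QUAL", operator.lt, 30.0),
    ("SOR3", "SOR", operator.gt, 3.0),
    ("DP10", "DP", operator.lt, 10),
    ("DP200", "DP", operator.gt, 200),
    ("AF05", "AF", operator.lt, 0.05),
]


def failed_filters(site):
    """Return the names of the hard filters that a site trips, in order."""
    failed = []
    for name, key, compare, threshold in HARD_FILTERS:
        value = site.get(key)
        if value is None:
            continue  # assumption: a missing annotation does not trip the filter
        if compare(value, threshold):
            failed.append(name)
    return failed


# Made-up example values, not taken from the actual data set.
example_site = {"FS": 1.2, "QD": 25.0, "QUAL": 500.0, "SOR": 0.7, "DP": 85, "AF": 0.5}
print(failed_filters(example_site))
```

A site that returns an empty list would pass all seven filters. A site with, say, `DP` of 250 would come back with `DP200` only.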
I have updated the repo by redoing the analysis up to the hard-filter step; standing by for a recommendation on filtering. https://github.com/sr320/ceabigr/blob/main/code/11-RNAseq-snps.Rmd