Closed fe4960 closed 1 year ago
EDIT: in the cellsnp command line, MAF is the fraction read(UMI)_count_of_minor_allele / read(UMI)_count_of_all_alleles, where the minor allele is the allele with the second-highest read (UMI) count inferred from the data. In cellsnp mode 1, where the REF and ALT alleles are specified by the user, the minor allele may, for a small subset of SNPs, be neither REF nor ALT but one of the OTH alleles. (20230525)
original answer:
Hi, the parameter minMAF is the minimum minor allele frequency, where the minor allele frequency is the smaller of the REF and ALT allele frequencies for a given SNP site. Here, both allele frequencies are derived from read counts aggregated over all cells (i.e., total_REF_read / total_reads, or total_ALT_read / total_reads). This parameter can be used for SNP filtering.
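To make the difference between the two definitions concrete, here is a minimal Python sketch. It is not cellsnp's actual code: the function names and the example counts are hypothetical, and it only assumes that per-allele read (UMI) counts have already been aggregated over all cells for one SNP.

```python
def maf_second_highest(counts):
    """MAF as described in the EDIT: count of the allele with the
    second-highest aggregated count, divided by the count of all alleles.

    counts: dict mapping allele -> aggregated read (UMI) count.
    """
    total = sum(counts.values())
    if total == 0:
        return 0.0
    ordered = sorted(counts.values(), reverse=True)
    second = ordered[1] if len(ordered) > 1 else 0
    return second / total


def maf_ref_alt(counts, ref, alt):
    """MAF as described in the original answer: the smaller of the
    REF and ALT allele frequencies."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return min(counts.get(ref, 0), counts.get(alt, 0)) / total


# Hypothetical SNP where the user-specified alleles are REF=A, ALT=C,
# but most non-REF reads actually support G (an OTH allele).
counts = {"A": 60, "C": 5, "G": 35, "T": 0}

maf_edit = maf_second_highest(counts)         # 35 / 100
maf_original = maf_ref_alt(counts, "A", "C")  # 5 / 100
```

For most SNPs the two values agree; they differ only when an OTH allele has more reads than REF or ALT, as in this made-up example.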
Hello,
I thought MAF is the frequency of the second most common SNP in a population and minMAF just filters the reference SNP list (mode 1). Is the QA based on running mode 2?
Many thanks for your help.
Best, Florian
Hi Florian,
MAF is indeed frequently used at the population level. However, here MAF, an option in cellsnp for SNP filtering, is calculated as the frequency of the second most common allele (the minor allele) for each SNP, at the sample level instead of the population level. This calculation is used in both mode 1 and mode 2.
Best, Xianjie
Hello Xianjie,
Many thanks for your quick response, that helps a lot! So basically, setting minMAF to very small values might overestimate the number of SNPs, while setting it too high costs sensitivity? I will just move forward with the default value.
Best wishes, Florian
Hi, I agree with you about how different settings of minMAF would affect SNP calling/filtering. If your downstream task is donor deconvolution, we recommend filtering out SNPs with <20 UMIs or <10% minor alleles by setting --minMAF 0.1 --minCOUNT 20.
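As a rough illustration of what these two thresholds do, here is a small Python sketch. It is not the cellsnp implementation: the function name and the example counts are hypothetical, and whether cellsnp compares with `>=` or `>` at the exact threshold is an assumption here.

```python
def passes_filter(counts, min_maf=0.1, min_count=20):
    """Return True if a SNP would be kept under the given thresholds.

    counts: dict mapping allele -> read (UMI) count aggregated over all cells.
    min_maf / min_count mirror --minMAF 0.1 --minCOUNT 20.
    """
    total = sum(counts.values())
    if total < min_count:
        return False  # too few UMIs covering this site
    ordered = sorted(counts.values(), reverse=True)
    second = ordered[1] if len(ordered) > 1 else 0
    # Inclusive comparison is an assumption, not confirmed from the source.
    return second / total >= min_maf


# Hypothetical aggregated counts for three SNPs.
low_coverage = {"A": 15, "G": 3}    # 18 UMIs in total
low_maf = {"A": 95, "G": 5}         # minor allele at 5%
informative = {"A": 70, "G": 30}    # minor allele at 30%

kept = [passes_filter(c) for c in (low_coverage, low_maf, informative)]
```

Only the third SNP is kept: the first has too few UMIs and the second has too small a minor allele fraction.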
Hello,
I am wondering about the definition of the parameter "minMAF". Does it mean 1) the proportion of reads mapping to an allele for a given SNP site, or 2) the number of occurrences of the alternative (ALT) allele divided by the total number of alleles among the individuals with genotype information for a given SNP site?
Thanks a lot!