single-cell-genetics / cellsnp-lite

Efficient genotyping bi-allelic SNPs on single cells
https://cellsnp-lite.readthedocs.io
Apache License 2.0

Question for minMAF parameter #77

Closed: fe4960 closed this issue 1 year ago

fe4960 commented 1 year ago

Hello,

I am wondering about the definition of the parameter "minMAF". Does it mean 1) the proportion of reads mapping to an allele for a given SNP site, or 2) the number of occurrences of the alternative allele divided by the total number of alleles among the individuals with genotype information for a given SNP site?

Thanks a lot!

hxj5 commented 1 year ago

EDIT: on the cellsnp command line, MAF is the fraction read(UMI)_count_of_minor_allele / read(UMI)_count_of_all_alleles, where the minor allele is the allele with the second-highest read (UMI) count, as inferred from the data. In cellsnp mode 1, where the REF and ALT alleles are specified by the user, the minor allele may be neither the REF nor the ALT allele but one of the OTH alleles for a small subset of SNPs. (20230525)


original answer:

Hi, the parameter minMAF is minimum minor allele frequency, which is the minimum between the allele frequencies of REF and ALT for a given SNP site. Here, both allele frequencies are derived from aggregated read counts from all cells (i.e., total_REF_read / total_reads, or total_ALT_read / total_reads). This parameter can be used for SNP filtering.
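
For readers who want the updated definition from the EDIT spelled out, here is a minimal Python sketch. It is not cellsnp code: the function name `minor_allele_frequency` and the dictionary input are hypothetical, and it assumes the counts have already been aggregated over all cells for one SNP site.

```python
def minor_allele_frequency(allele_counts):
    """Return count(second most frequent allele) / count(all alleles).

    allele_counts is a hypothetical input: a dict mapping each allele
    (e.g. "A", "C", "G", "T") to its read or UMI count, assumed to be
    aggregated over all cells for a single SNP site.
    """
    total = sum(allele_counts.values())
    if total == 0:
        return 0.0  # no coverage, so no minor allele can be inferred
    ranked = sorted(allele_counts.values(), reverse=True)
    # The minor allele is the one with the second-highest count.
    minor = ranked[1] if len(ranked) > 1 else 0
    return minor / total


# Illustrative counts only: the second-highest count is 25 out of 100.
example = {"A": 70, "G": 25, "C": 5, "T": 0}
maf = minor_allele_frequency(example)  # 25 / 100
```

Because the ranking uses the observed counts only, the minor allele in this sketch can be an allele other than the user-specified REF or ALT, which is the mode 1 situation the EDIT mentions.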

flde commented 1 year ago

Hello,

I thought MAF is the frequency of the second most common SNP in a population and minMAF just filters the reference SNP list (mode 1). Is the QA based on running mode 2?

Many thanks for your help.

Best, Florian

hxj5 commented 1 year ago

Hi Florian,

MAF is indeed frequently used at the population level. Here, however, MAF is an option in cellsnp for SNP filtering, and it is calculated as the frequency of the second most common allele (the minor allele) for each SNP at the sample level rather than the population level. This calculation is used in both mode 1 and mode 2.

Best, Xianjie

flde commented 1 year ago

Hello Xianjie,

Many thanks for your quick response, that helps a lot! So basically, setting minMAF to a very small value might call too many SNPs, while setting it too high costs sensitivity? I will just move forward with the default value.

Best wishes, Florian

hxj5 commented 1 year ago

Hi, I agree with you about how different settings of minMAF would affect SNP calling/filtering. If your downstream task is donor deconvolution, we recommend filtering out SNPs with <20 UMIs or a minor allele fraction <10%, by setting --minMAF 0.1 --minCOUNT 20.
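
As a rough illustration of what the recommended thresholds do, the following Python sketch applies them to made-up aggregated counts. The helper `passes_filter`, the SNP keys, and the counts are all hypothetical, and the inclusive comparisons (a SNP with exactly 20 UMIs or exactly 10% minor allele is kept) are an assumption based on the "<20" and "<10%" wording above, not a statement about the cellsnp implementation.

```python
def passes_filter(allele_counts, min_maf=0.1, min_count=20):
    """Keep a SNP only if it has enough UMIs and a frequent enough minor allele.

    allele_counts is a hypothetical dict of allele -> UMI count,
    assumed to be aggregated over all cells for one SNP site.
    """
    total = sum(allele_counts.values())
    if total == 0 or total < min_count:
        return False  # too few UMIs (the --minCOUNT style check)
    ranked = sorted(allele_counts.values(), reverse=True)
    minor = ranked[1] if len(ranked) > 1 else 0
    return minor / total >= min_maf  # the --minMAF style check


# Made-up example sites and counts.
snps = {
    "chr1:1000": {"A": 30, "G": 10},  # 40 UMIs, minor fraction 0.25
    "chr1:2000": {"C": 15, "T": 3},   # 18 UMIs, below the count threshold
    "chr2:500": {"G": 95, "A": 5},    # 100 UMIs, minor fraction 0.05
}
kept = [site for site, counts in snps.items() if passes_filter(counts)]
```

Only the first site survives both thresholds here: the second fails on total count and the third on minor allele fraction.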