single-cell-genetics / cellsnp-lite

Efficient genotyping bi-allelic SNPs on single cells
https://cellsnp-lite.readthedocs.io
Apache License 2.0

minimal minor allele count filter? #108

Closed chilampoon closed 11 months ago

chilampoon commented 11 months ago

Hello there -

Thanks for developing this great, fast tool. Is there a filter for the minimal minor allele count? I only found the --minMAF argument. In some of my samples, many pileup positions have only 1 minor allele read, which looks like a sequencing/alignment error or some other noise. I also don't want to set minMAF too high, as that may throw away many low-frequency signals. An option to set the minimal minor allele count would be very helpful. Thanks in advance.

hxj5 commented 11 months ago

Hi, cellsnp-lite does not have this option. For now, it has to be done by post-hoc analysis. We have added a notebook, scripts/post_hoc/subset_with_minAD_issue108.ipynb, which could be a starting point for this task. The notebook also serves as an example of how to load cellsnp-lite output, subset SNPs, and save the subset data (commit 83c102f).
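For readers who only need the general idea, here is a minimal pure-Python sketch of such a post-hoc filter. It is not the notebook's code. It assumes that each record in cellSNP.base.vcf carries aggregated `AD` (ALT count) and `DP` (REF + ALT count) tags in its INFO column, and it takes the minor allele count to be `min(AD, DP - AD)`. The function names and the toy records are hypothetical.

```python
def minor_allele_count(info):
    """Return min(ALT, REF) count from an INFO string such as 'AD=1;DP=10;OTH=0'.

    Assumption: AD is the ALT count and DP is REF + ALT, so REF = DP - AD.
    """
    tags = dict(item.split("=", 1) for item in info.split(";") if "=" in item)
    ad = int(tags["AD"])
    dp = int(tags["DP"])
    return min(ad, dp - ad)


def filter_vcf_lines(lines, min_minor_count=2):
    """Keep header lines and records whose minor allele count passes the cutoff.

    Returns the kept lines and the 1-based indices of the kept records, which
    can be used to subset the matching rows of the sparse count matrices.
    """
    kept_lines = []
    kept_rows = []
    row = 0
    for line in lines:
        if line.startswith("#"):
            kept_lines.append(line)  # header lines are always kept
            continue
        row += 1
        fields = line.rstrip("\n").split("\t")
        if minor_allele_count(fields[7]) >= min_minor_count:  # INFO is column 8
            kept_lines.append(line)
            kept_rows.append(row)
    return kept_lines, kept_rows


# Toy records (hypothetical data, not real cellsnp-lite output).
example = [
    "##fileformat=VCFv4.2\n",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n",
    "1\t100\t.\tA\tG\t.\tPASS\tAD=1;DP=50;OTH=0\n",
    "1\t200\t.\tC\tT\t.\tPASS\tAD=12;DP=40;OTH=1\n",
    "2\t300\t.\tG\tA\t.\tPASS\tAD=30;DP=31;OTH=0\n",
]
kept_lines, kept_rows = filter_vcf_lines(example, min_minor_count=2)
```

Filtering the VCF alone leaves the per-cell count matrices out of sync with it, so the same `kept_rows` indices would also need to be applied to the matrix rows, which is what the notebook covers.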

chilampoon commented 11 months ago

Thank you so much, @hxj5! Yes, the filtering can be done post hoc, but it would also be nice to have the minor allele count as an argument to reduce processing time. For instance, one of my cellsnp.base.vcf files is ~3 GB because it contains many mismatches with a minor allele count of only 1; after filtering, the file size goes down to ~300 MB.