single-cell-genetics / cellsnp-lite

Efficient genotyping bi-allelic SNPs on single cells
https://cellsnp-lite.readthedocs.io
Apache License 2.0

More read filter options? #85

Closed: zhentaoyoung closed this issue 1 year ago

zhentaoyoung commented 1 year ago

Hi guys, thank you so much for sharing this very useful package. I am wondering whether you plan to add more read filter options in the future. I noticed that many of the SNP calls in a cell are based on a single read. Would you consider adding a filter such as a minimum number of reads per cell covering the SNP site, to make more confident calls? Thank you.

hxj5 commented 1 year ago

Hi, there is a command-line option, --minCOUNT, for SNP filtering, which sets the minimum aggregated UMI or read count across all cells.

zhentaoyoung commented 1 year ago

Hi Xianjie, thank you for your reply. Yes, I am aware of that filter, but I think it applies to the combined count of all cells. Would it be possible to have a filter on the read depth of each SNP per cell? This paper reports that the true positive calling rate increases with read depth at the SNP site: https://www.nature.com/articles/s41467-018-07170-5. However, their pipeline is very difficult to use. I think a per-cell read-depth filter may help increase calling accuracy.
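To illustrate the distinction being drawn here: an aggregated threshold like --minCOUNT only looks at the total over all cells, so a SNP can pass it even when every individual cell covers the site with a single read. The Python sketch below is a toy model of that behaviour, assuming per-cell depths for one SNP are available as a dict; both function names are hypothetical and this is not cellsnp-lite's own code.

```python
from typing import Dict


def passes_min_count(depth_by_cell: Dict[str, int], min_count: int) -> bool:
    """Toy aggregated filter: keep the SNP if its total depth over all cells
    reaches min_count (the semantics described for --minCOUNT in this thread)."""
    return sum(depth_by_cell.values()) >= min_count


def cells_passing_min_depth(depth_by_cell: Dict[str, int], min_depth: int) -> Dict[str, int]:
    """Hypothetical per-cell filter: keep only cells whose own depth at the
    SNP reaches min_depth."""
    return {cell: dp for cell, dp in depth_by_cell.items() if dp >= min_depth}


# 25 cells, each covering the SNP with exactly one read.
snp = {f"cell{i}": 1 for i in range(25)}
print(passes_min_count(snp, min_count=20))              # True: total is 25
print(len(cells_passing_min_depth(snp, min_depth=2)))   # 0: no cell has 2+ reads
```

The aggregated check passes while the per-cell check keeps no cells, which is the situation described in the original question.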

hxj5 commented 1 year ago

Hi, filtering by read depth per cell should be useful, as you pointed out. We may add a command-line option in a future release. For now, you can perform post-hoc filtering on either the VCF file (cellSNP.cells.vcf.gz) or the DP matrix file.
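A minimal sketch of post-hoc filtering on the matrix files, assuming the sparse Matrix Market outputs (for example cellSNP.tag.AD.mtx and cellSNP.tag.DP.mtx) store SNPs as rows and cells as columns; please check the file names in your own output directory. Every (SNP, cell) entry whose depth is below a chosen threshold is dropped from both DP and AD, so it is treated as uncovered. The helper names and the threshold are hypothetical, and plain Python (no SciPy) is used so the sketch is self-contained.

```python
from typing import Dict, List, Tuple

Entry = Tuple[int, int]  # (snp_index, cell_index), 1-based as in Matrix Market
Shape = Tuple[int, int]


def read_mtx_coordinate(lines: List[str]) -> Tuple[Shape, Dict[Entry, int]]:
    """Parse an integer Matrix Market coordinate file into (shape, {entry: value})."""
    body = []
    for ln in lines:
        s = ln.strip()
        if s and not s.startswith("%"):  # skip the banner and comment lines
            body.append(s)
    n_rows, n_cols, _nnz = (int(x) for x in body[0].split())
    values: Dict[Entry, int] = {}
    for s in body[1:]:
        r, c, v = s.split()
        values[(int(r), int(c))] = int(v)
    return (n_rows, n_cols), values


def mask_low_depth(ad: Dict[Entry, int], dp: Dict[Entry, int],
                   min_depth: int) -> Tuple[Dict[Entry, int], Dict[Entry, int]]:
    """Drop every (SNP, cell) entry whose DP is below min_depth from AD and DP."""
    kept_dp = {k: v for k, v in dp.items() if v >= min_depth}
    # AD is sparse as well; keep an AD entry only if its DP entry survived.
    kept_ad = {k: v for k, v in ad.items() if k in kept_dp}
    return kept_ad, kept_dp


def format_mtx_coordinate(shape: Shape, values: Dict[Entry, int]) -> str:
    """Serialize back to Matrix Market, keeping the original dimensions."""
    out = ["%%MatrixMarket matrix coordinate integer general", "%",
           f"{shape[0]} {shape[1]} {len(values)}"]
    out += [f"{r} {c} {v}" for (r, c), v in sorted(values.items())]
    return "\n".join(out) + "\n"


# Toy data: 3 SNPs x 2 cells. With real output you would read the files instead,
# e.g. open("cellSNP.tag.DP.mtx").readlines() (file name assumed).
DP_TEXT = """%%MatrixMarket matrix coordinate integer general
%
3 2 4
1 1 1
2 1 5
2 2 3
3 2 2
"""
AD_TEXT = """%%MatrixMarket matrix coordinate integer general
%
3 2 3
1 1 1
2 1 2
3 2 1
"""

shape, dp = read_mtx_coordinate(DP_TEXT.splitlines())
_, ad = read_mtx_coordinate(AD_TEXT.splitlines())
ad_f, dp_f = mask_low_depth(ad, dp, min_depth=2)
print(format_mtx_coordinate(shape, dp_f), end="")
```

Keeping the original matrix dimensions means row and column indices should still line up with the SNP list in the VCF and the cell barcode list, so downstream steps that read these files would not need other changes.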

zhentaoyoung commented 1 year ago

Hi Xianjie, thank you! That would be very helpful!