Closed. flde closed this issue 1 year ago.
Hi Florian,
Thanks for the question. For a typical donor deconvolution task, such as scRNA-seq data of mixed cells from multiple patients, you only need to perform SNP calling & genotyping once before passing the genotypes to vireo. Details of genotyping can be found in the vireo manual. Please correct me if I misunderstood your question.
Best, Xianjie
Hi @hxj5,
My apologies. What I want to test is different minMAF thresholds, e.g. 10%, 5%, 1%, but I want to avoid running cellsnp for every threshold. So, could I run cellsnp once with minMAF 1% and then filter the resulting cellSNP.base.vcf.gz at 10% and 5% to achieve the same result?
For filtering the result file, I would use bcftools view -q 0.05:minor cellSNP.base.vcf.gz and pass the output to vireo.
Best wishes, Florian
EDIT: post-filtering SNPs based on the minimum allele frequency of the REF and ALT alleles in the VCF file can give different results from filtering SNPs with --minMAF on the cellsnp command line. This affects a small subset of SNPs whose major allele (highest read/UMI count) or minor allele (second highest) is neither the REF nor the ALT allele but one of the OTH alleles (in mode 1). See the detailed discussion in #93 (20230525).
Original answer:
Yes, you could post-filter SNPs the way you mentioned (i.e., run cellsnp with minMAF 1% and then post-filter with minMAF 5% or 10%). Thanks for the clarification.
I have little experience with bcftools view -q. Basically, to filter the SNPs output by cellsnp based on a minMAF threshold, you could test min(AD/DP, (DP-AD)/DP) < minMAF_threshold on the cellSNP.base.vcf file, drop the SNPs for which the test is true, and update the three matrices accordingly.
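The test above can be sketched in Python as follows. This is only an illustrative sketch, not part of cellsnp: the function names are hypothetical, and it assumes each record in cellSNP.base.vcf carries integer AD and DP values in its INFO column (e.g. AD=3;DP=10;OTH=0). It returns the positions of the SNPs that pass, which are the rows one would then keep in the three matrices. As noted in the EDIT above, the result can differ from --minMAF for a small subset of SNPs.

```python
def parse_info(info):
    """Turn an INFO string such as 'AD=3;DP=10;OTH=0' into a dict of strings."""
    fields = {}
    for item in info.split(";"):
        if "=" in item:
            key, value = item.split("=", 1)
            fields[key] = value
    return fields


def snps_passing_minmaf(vcf_lines, min_maf):
    """Return 0-based record indices of SNPs with min(AD/DP, (DP-AD)/DP) >= min_maf.

    Assumption: column 8 (INFO) of every record holds integer AD and DP tags.
    Header lines (starting with '#') and blank lines are skipped.
    """
    keep = []
    idx = 0
    for line in vcf_lines:
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        info = parse_info(cols[7])
        ad = int(info["AD"])
        dp = int(info["DP"])
        if dp > 0:
            maf = min(ad / dp, (dp - ad) / dp)
            if maf >= min_maf:
                keep.append(idx)
        idx += 1
    return keep


# Toy records (made up for illustration, not real cellsnp output).
example_vcf = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "1\t100\t.\tA\tG\t.\tPASS\tAD=1;DP=100;OTH=0",
    "1\t200\t.\tC\tT\t.\tPASS\tAD=50;DP=100;OTH=0",
    "1\t300\t.\tG\tA\t.\tPASS\tAD=96;DP=100;OTH=0",
]
kept_at_5_percent = snps_passing_minmaf(example_vcf, 0.05)
```

Because the indices follow the record order of the VCF, the same list can be used to select rows from the AD, DP and OTH matrices, provided their rows are in the same order as the VCF records (an assumption worth checking on your own output).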
@hxj5,
Many thanks! One last follow-up: is there a tool you would recommend for updating the matrices? I found https://rdrr.io/github/davismcc/cardelino/man/load_cellSNP_vcf.html but maybe you have a hint.
Thank you so much for your time!
Best wishes, Florian
Hi Florian,
I have uploaded a demo R script csp_utils.R to the scripts/utils dir (6f487d3). You may download the script and then call the update_cellsnp_matrices function to update the three sparse matrices. The usage of the function should be straightforward, although it has not been thoroughly tested.
Best, Xianjie
Hi Xianjie,
That is so very kind of you! I think having such an option is very helpful to optimize cellsnp downstream.
In my case I have host-only samples and samples of mixed host/donor cells. So, I run cellsnp+vireo on the pooled data and then split it again. That way I can use the host-only samples to estimate the sensitivity/specificity.
The challenge is that the ratio of host/donor cells varies, and some host/donor pairs are more closely related genetically than others. I think in such cases one could optimize minMAF a bit. Having a tool to split the SNPs by minMAF is a great enhancement from my point of view.
Again, many thanks and all the best, Florian
Hello Xianjie,
I figured out how to load and manipulate the cellsnp matrix. Currently I am running cellsnp on the filtered CellRanger output. However, in the end I will only use cells that pass my QC pipeline, which includes doublet removal etc.
I could now filter the cellsnp matrices for the cell IDs that pass QC and re-compute them before running vireo. Would that be a good idea, or is there something flawed?
Many thanks for your help, Florian
Hi Florian,
It should be fine, provided that the number of filtered cells is limited (so that there would be little impact on the SNP calling & genotyping).
Best, Xianjie
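For illustration, subsetting the matrices to QC-passing cells could look like the Python sketch below. It is only a sketch under stated assumptions: the function name is hypothetical, the matrix is represented as (row, column, value) triplets with 1-based indices as in a Matrix Market file, rows are SNPs and columns are cells, and the column order matches the barcode list (e.g. as read from cellSNP.samples.tsv).

```python
def subset_cells(entries, barcodes, keep_barcodes):
    """Keep only the columns (cells) whose barcode is in keep_barcodes.

    entries: list of (row, col, value) triplets with 1-based indices.
    barcodes: list of cell barcodes; barcodes[j-1] labels column j.
    Returns the filtered triplets with columns renumbered from 1,
    plus the barcodes of the retained columns in their original order.
    """
    keep = set(keep_barcodes)
    new_col = {}        # old 1-based column index -> new 1-based column index
    new_barcodes = []
    for old_col, barcode in enumerate(barcodes, start=1):
        if barcode in keep:
            new_barcodes.append(barcode)
            new_col[old_col] = len(new_barcodes)
    new_entries = [
        (row, new_col[col], value)
        for (row, col, value) in entries
        if col in new_col
    ]
    return new_entries, new_barcodes


# Toy data (made up for illustration): 3 SNPs x 3 cells.
ad_entries = [(1, 1, 2), (1, 3, 1), (2, 2, 5), (3, 3, 4)]
all_barcodes = ["AAA-1", "CCC-1", "GGG-1"]
qc_pass = ["AAA-1", "GGG-1"]
filtered_entries, filtered_barcodes = subset_cells(ad_entries, all_barcodes, qc_pass)
```

The same call would be applied to each of the three matrices with the same barcode list, so that their columns stay aligned. In the toy data, SNP 2 is left with no counts after filtering; whether to drop such SNPs afterwards is a separate choice.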
Hello all,
I have clinical samples of mixed host/donor cells. I found that minMAF might be optimized for each patient. Thanks to your help, I now understand that minMAF filters the SNP list based on the MAF derived from the single-cell reads (#77).
To generate SNP lists with different minMAF values I would need to run cellsnp-lite iteratively, which takes a lot of time/resources. Do I understand correctly that I could instead filter the cellsnp-lite result vcf.gz file with bcftools at different minMAF thresholds before passing it to vireo?
Many thanks again for your help!
Best wishes, Florian