single-cell-genetics / cellsnp-lite

Efficient genotyping bi-allelic SNPs on single cells
https://cellsnp-lite.readthedocs.io
Apache License 2.0

postprocessing cellsnp outputs #96

Open maxozo opened 1 year ago

maxozo commented 1 year ago

Hi, I would like to exclude some of the sites in the cellSNP output files. I noticed this script: https://github.com/single-cell-genetics/cellsnp-lite/blob/master/scripts/utils/csp_utils.R which seems to do exactly what I need by updating the matrices. I just wanted to double-check: does it actually do the job, and are the matrix coordinates updated correctly when a subsetted cellSNP.base.vcf.gz is provided?

hxj5 commented 1 year ago

Hi, csp_utils.R can be used to update the three sparse matrices when a subset of SNPs is provided. The coordinates (SNP indexes, 1st column) in the updated matrices refer to the subsetted VCF, not the raw cellSNP VCF.

A commit (146658c) was submitted so that the updated matrices are in the same format as the raw cellSNP matrices. You can run the demo in the script (the @examples section of the function update_cellsnp_matrices). Using the function should be straightforward, although it has not been thoroughly tested.
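
To make the re-indexing described above concrete, here is a minimal Python sketch of the idea. It is not the code in csp_utils.R. It assumes each matrix is a list of Matrix Market style triplets (SNP index, cell index, count) with 1-based indexes; the function name reindex_snp_rows and the toy data are hypothetical and only for illustration.

```python
def reindex_snp_rows(entries, kept_snps):
    """Drop entries for removed SNPs and renumber the remaining SNP indexes.

    entries   : iterable of (snp_idx, cell_idx, count) triplets, 1-based
    kept_snps : original 1-based SNP indexes that remain, listed in the
                order they appear in the subsetted VCF
    """
    # Map each original SNP index to its new 1-based position in the subset.
    new_index = {old: new for new, old in enumerate(kept_snps, start=1)}
    # Keep only entries whose SNP survives; cell indexes and counts are unchanged.
    return [(new_index[s], c, v) for s, c, v in entries if s in new_index]


# Toy example: 5 SNPs in the raw VCF, SNPs 1 and 3 are excluded.
raw_entries = [(1, 1, 3), (2, 1, 5), (4, 2, 1), (5, 3, 2)]
kept = [2, 4, 5]
updated = reindex_snp_rows(raw_entries, kept)
print(updated)
```

In this sketch, raw SNP 2 becomes row 1, SNP 4 becomes row 2 and SNP 5 becomes row 3, so the first column only makes sense when read against the subsetted VCF. The same mapping would have to be applied to all three matrices, and the SNP count in each Matrix Market header would need to change to the number of kept SNPs.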