single-cell-genetics / cellsnp-lite

Efficient genotyping bi-allelic SNPs on single cells
https://cellsnp-lite.readthedocs.io
Apache License 2.0

10X 5' single-cell data #127

Open HenriettaHolze opened 3 weeks ago

HenriettaHolze commented 3 weeks ago

Hi @hxj5 , I have a question regarding 10X 5' scRNA-seq data.
For 5' sequencing, the read containing the cell barcode and UMI also contains part of the transcript (see https://kb.10xgenomics.com/hc/en-us/articles/360000939852-What-is-the-difference-between-Single-Cell-3-and-5-Gene-Expression-libraries).
The 10X CellRanger pipeline therefore includes both the forward and reverse read in the BAM file.
You mention in #121 that only one read per cell barcode + UMI combination is used to extract the allele. In that case, only the forward or the reverse read is considered, and almost half of the data is discarded. Is this correct?

Cheers, Henrietta

hxj5 commented 3 weeks ago

Hi, (1) cellsnp-lite does not check the strand of a read (whether it is forward or reverse); (2) SNPs are processed in parallel, i.e., the reads in a CB+UMI group may be iterated over multiple times for different SNPs. For each SNP, cellsnp-lite uses the first fetched/pileup read covering that SNP within a CB+UMI group, and different SNPs may use different first reads. A read "discarded" for one SNP may still be used (as the first read) by another SNP. Therefore, little information is lost for allele counting, even though many reads are "discarded" for each specific SNP.
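
To make the per-SNP selection concrete, here is a minimal toy sketch in Python of the rule described above. It is not cellsnp-lite's actual code: the function name `count_alleles`, the read dictionaries, and the example barcode, UMI, positions, and sequences are all hypothetical, and reads are assumed to be ungapped with 0-based start positions. It only illustrates that each SNP keeps its own "first read per CB+UMI" bookkeeping, so two reads sharing a CB+UMI can each contribute to a different SNP.

```python
from collections import defaultdict


def count_alleles(reads, snps):
    """Toy allele counter: one read per (CB, UMI) group *per SNP*.

    reads: list of dicts with keys "cb", "umi", "start" (0-based), "seq"
           (assumed ungapped; strand is deliberately ignored).
    snps:  list of (pos, ref, alt) tuples, pos 0-based.
    Returns {pos: {cb: {"ref": n, "alt": n, "other": n}}}.
    """
    counts = {}
    for pos, ref, alt in snps:
        seen = set()  # CB+UMI groups already counted for THIS SNP only
        per_cell = defaultdict(lambda: {"ref": 0, "alt": 0, "other": 0})
        for read in reads:
            offset = pos - read["start"]
            if not 0 <= offset < len(read["seq"]):
                continue  # read does not cover this SNP
            group = (read["cb"], read["umi"])
            if group in seen:
                continue  # a first read for this group was already used
            seen.add(group)
            base = read["seq"][offset]
            label = "ref" if base == ref else "alt" if base == alt else "other"
            per_cell[read["cb"]][label] += 1
        counts[pos] = {cb: dict(c) for cb, c in per_cell.items()}
    return counts


# Two reads from the same CB+UMI (e.g. the two mates of a 5' read pair).
reads = [
    {"cb": "AAAC", "umi": "UMI1", "start": 100, "seq": "ACGTA"},    # covers 100-104
    {"cb": "AAAC", "umi": "UMI1", "start": 103, "seq": "TAGGCTA"},  # covers 103-109
]
snps = [
    (101, "C", "T"),  # covered only by the first read
    (103, "T", "G"),  # covered by both reads; only the first one is counted
    (107, "A", "C"),  # covered only by the second read
]
result = count_alleles(reads, snps)
```

In this toy input the second read is skipped for the SNP at position 103 but still supplies the allele for the SNP at position 107, which is the sense in which a read "discarded" for one SNP is not lost overall.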

HenriettaHolze commented 3 weeks ago

Thank you, that makes sense.