10X 5' single-cell data #127

single-cell-genetics / cellsnp-lite

Efficient genotyping bi-allelic SNPs on single cells

https://cellsnp-lite.readthedocs.io

HenriettaHolze commented 5 months ago

Hi @hxj5, I have a question about 10X 5' scRNA-seq data.
For 5' sequencing, the read containing the cell barcode and UMI also contains part of the transcript (https://kb.10xgenomics.com/hc/en-us/articles/360000939852-What-is-the-difference-between-Single-Cell-3-and-5-Gene-Expression-libraries).
The 10X Cell Ranger pipeline therefore includes both the forward and the reverse read in the BAM file.
You mention in #121 that only one read per cell barcode + UMI combination is used to extract the allele. In that case, only the forward or the reverse read is considered, and almost half of the data is discarded. Is this correct?

Cheers, Henrietta

hxj5 commented 5 months ago

Hi, (1) cellsnp-lite does not check the strand (whether a read is forward or reverse); (2) SNPs are processed in parallel, i.e., the reads in a CB+UMI group may be iterated multiple times, once for each SNP. For each SNP, cellsnp-lite uses the first fetched/pileup read within a CB+UMI group that covers it, and different SNPs may use different first reads. A read "discarded" for one SNP may be used as the first read for another SNP. Therefore, little information is lost for allele counting overall, even though many reads are "discarded" for each individual SNP.
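The per-SNP selection described above can be sketched in Python. This is a simplified illustration, not cellsnp-lite's actual implementation: reads are assumed to be ungapped alignments (no CIGAR handling, no quality or MAPQ filters), the names `Read`, `base_at` and `count_alleles` are hypothetical, and it assumes that a CB+UMI group whose first covering read shows neither the REF nor the ALT base is simply not counted.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Read:
    """Hypothetical simplified read; assumes an ungapped alignment (no CIGAR)."""
    cb: str                    # cell barcode
    umi: str                   # UMI
    ref_start: int             # 0-based leftmost reference position
    seq: str                   # aligned bases, one per reference position
    is_reverse: bool = False   # stored but never checked, mirroring point (1)


def base_at(read, pos):
    """Return the read's base at reference position pos, or None if not covered."""
    offset = pos - read.ref_start
    if 0 <= offset < len(read.seq):
        return read.seq[offset]
    return None


def count_alleles(reads, snps):
    """Count REF/ALT UMIs per cell, using only the first covering read per CB+UMI.

    reads: reads in fetch/pileup order; snps: list of (pos, ref, alt).
    Returns {pos: {cb: [ref_count, alt_count]}}.
    Each SNP gets its own `seen` set, so the same CB+UMI group can be
    revisited (and counted) for every SNP its reads cover.
    """
    result = {}
    for pos, ref, alt in snps:
        seen = set()  # CB+UMI groups already used for this SNP
        counts = defaultdict(lambda: [0, 0])
        for read in reads:
            base = base_at(read, pos)
            key = (read.cb, read.umi)
            if base is None or key in seen:
                continue  # read misses this SNP, or its CB+UMI is already used
            seen.add(key)
            # Assumption: a first read showing neither REF nor ALT is not counted.
            if base == ref:
                counts[read.cb][0] += 1
            elif base == alt:
                counts[read.cb][1] += 1
        result[pos] = dict(counts)
    return result


# One cell, one UMI: the forward mate covers the SNP at 10 and the reverse
# mate covers the SNP at 30; a third read repeats the UMI at position 10.
reads = [
    Read("AAAC", "U1", 5, "ACGTAGGGGG"),                    # covers 5..14, G at 10
    Read("AAAC", "U1", 25, "TTTTTCAAAA", is_reverse=True),  # covers 25..34, C at 30
    Read("AAAC", "U1", 8, "GTAGGG"),                        # covers 8..13, A at 10
]
snps = [(10, "G", "A"), (30, "C", "T")]
print(count_alleles(reads, snps))
# {10: {'AAAC': [1, 0]}, 30: {'AAAC': [1, 0]}}
```

In this toy input, the third read (carrying the ALT base) is skipped at position 10 because UMI U1 was already used there, while the same UMI still contributes at position 30 through its reverse mate. So in this sketch, keeping one read per CB+UMI per SNP does not amount to dropping one mate of every pair.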

HenriettaHolze commented 5 months ago

Thank you, that makes sense.

wJDKnight commented 3 months ago

Does that mean one UMI can be counted for multiple SNPs?

hxj5 commented 3 months ago

Yes, if the reads of that UMI cover multiple SNPs.