Open HenriettaHolze opened 5 months ago
Hi, (1) cellsnp-lite does not check the strand of a read (whether it is forward or reverse); (2) SNPs are processed in parallel (i.e., the reads in a CB+UMI group may be iterated over multiple times, once for each SNP). For each SNP, it uses the first fetched/pileup read that covers the SNP within a CB+UMI group, and different SNPs may use different first reads. A read "discarded" for one SNP may still be used (as the first read) by another SNP. Therefore, little information is lost for allele counting, even though many reads are "discarded" for each specific SNP.
Thank you, that makes sense.
Does that mean one UMI can be counted in multiple SNPs?
Yes, if the UMI covers multiple SNPs.
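The behaviour described above can be illustrated with a minimal Python sketch. This is not cellsnp-lite's actual code: the function `count_alleles`, the read fields (`cb`, `umi`, `start`, `seq`, `strand`), and the toy reads are all hypothetical, and alignments are assumed to be ungapped so that a read covers positions `start` to `start + len(seq) - 1`.

```python
from collections import defaultdict

def count_alleles(reads, snp_positions):
    """Toy per-SNP allele counting: one read per CB+UMI group per SNP.

    Hypothetical illustration only, not the cellsnp-lite implementation.
    Each read is a dict with keys cb, umi, start (0-based), seq, strand.
    Alignments are assumed ungapped; strand is deliberately ignored.
    """
    counts = defaultdict(lambda: defaultdict(int))  # (snp, cb) -> allele -> UMI count
    for pos in snp_positions:
        seen = set()  # CB+UMI groups already counted for THIS SNP only
        for read in reads:
            offset = pos - read["start"]
            if not 0 <= offset < len(read["seq"]):
                continue  # read does not cover this SNP
            group = (read["cb"], read["umi"])
            if group in seen:
                continue  # a previous read of this group was already used here
            seen.add(group)
            counts[(pos, read["cb"])][read["seq"][offset]] += 1
    return {key: dict(alleles) for key, alleles in counts.items()}

# Forward and reverse read of the same molecule (same CB+UMI), overlapping at 103-104.
example_reads = [
    {"cb": "AAAC", "umi": "U1", "start": 100, "seq": "ACGTA", "strand": "+"},
    {"cb": "AAAC", "umi": "U1", "start": 103, "seq": "TAGGC", "strand": "-"},
]
example_counts = count_alleles(example_reads, [101, 104, 106])
```

In this toy input, the second read is skipped at position 104 because the first read of the same CB+UMI group already covers it, but it is the read used at position 106, which only it covers. The same UMI therefore contributes one count at each of the three SNPs.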
Hi @hxj5, I have a question regarding 10X 5' scRNA-seq data.
For 5' sequencing, the read containing the cell barcode and UMI also contains part of the transcript (see https://kb.10xgenomics.com/hc/en-us/articles/360000939852-What-is-the-difference-between-Single-Cell-3-and-5-Gene-Expression-libraries).
The 10X CellRanger pipeline therefore includes both the forward and reverse reads in the BAM file.
You mention in #121 that only one read per cell barcode + UMI combination is used to extract the allele. In that case, only one of the forward and reverse reads would be considered, and almost half of the data would be discarded. Is this correct?
Cheers, Henrietta