Open BingjieZhang opened 12 months ago
For the "number of mapped reads", I believe it is only from those barcode-valid (barcode in the whitelist or corrected barcode) reads.
Q1: If you need the number of unfiltered mapped reads, "Number of mapped reads" is the place to look. What number do you have in mind? Q2: That is the number of reads with whitelisted barcodes.
Thanks for your responses! Sorry, but I'm not sure if I fully understand what you mean. What do you mean by 'unfiltered' mapped reads? I prefer to know the number of mapped reads regardless of whether the reads have a valid cell barcode or not. I am trying to figure out why I started with 153,220,788 reads, but ended up with only 30,174,748, lol. The reason I feel confused is that for the same sample, I also did a bulk mapping with Bowtie2. As you can see below, the mapping rate is okay, with an 86.75% overall alignment rate and a 56% unique mapping rate (Bowtie2 counts paired-end fragments once, so it's half the number compared to Chromap, but they are mapped with the same FASTQ files).
However, for Chromap, even before deduplication, the ratio is 37,926,511/153,220,788 = 24.7%, so I want to know at which step I am losing reads. If "Number of mapped reads: 71,076,182" already includes the valid-barcode filtering step (filtered by the whitelist), which reads are filtered out between "Number of uni-mappings: 66,746,464" and "Number of barcodes in whitelist: 37,926,511"? I initially thought "Number of mapped reads" represented the overall mapping rate, but then it is way lower than the results from Bowtie2.
Hopefully, I have explained my questions clearly, and thank you very much for your help in advance.
bulk mapping summary using bowtie2
bowtie2 -x /hg38/ -1 $name\_R1_val_1.fq* -2 $name\_R2_val_2.fq* --local --very-sensitive-local --no-unal --no-mixed --no-discordant --phred33 -I 10 -X 700 -p 5 -q
76500578 reads; of these:
  76500578 (100.00%) were paired; of these:
    10140036 (13.25%) aligned concordantly 0 times
    43298939 (56.60%) aligned concordantly exactly 1 time
    23061603 (30.15%) aligned concordantly >1 times
86.75% overall alignment rate
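As a quick sanity check, the percentages in the Bowtie2 summary above can be recomputed from its raw counts. This is only a sketch: the variable names are made up for illustration, and the counts are copied from the summary, where each count is one paired-end fragment.

```python
# Counts copied from the Bowtie2 summary above (one count per paired-end fragment).
total_pairs = 76_500_578
unaligned = 10_140_036          # aligned concordantly 0 times
concordant_unique = 43_298_939  # aligned concordantly exactly 1 time
concordant_multi = 23_061_603   # aligned concordantly >1 times

# The three categories should account for every input pair.
aligned = concordant_unique + concordant_multi
overall_rate = aligned / total_pairs
unique_rate = concordant_unique / total_pairs

print(f"overall alignment rate: {overall_rate:.2%}")
print(f"unique mapping rate:    {unique_rate:.2%}")
```

The denominator here is fragments, not read ends, which matters when comparing against the Chromap log.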
Reads with invalid barcodes are not mapped, so the mapped read count does not include them. The number 37926511 counts read fragments (a mate pair counts once), while 153220788 counts read ends (2 times the number of fragments). Still, too few reads have a barcode found in the whitelist, which causes the overall low alignment rate. You can run Chromap without the whitelist and check the alignment rate, which may confirm that the barcode matching step is the culprit.
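To make the unit mismatch concrete, here is a minimal sketch of the arithmetic using only the two numbers named in this reply. It assumes every read end has a mate, so the fragment count is exactly half the read-end count. The variable names are illustrative and do not correspond to Chromap fields.

```python
# Numbers taken from the thread; the names are hypothetical.
read_ends = 153_220_788             # input reads, counting both mates separately
whitelisted_fragments = 37_926_511  # "Number of barcodes in whitelist", per fragment

# Assumption: the input is cleanly paired, so fragments = read ends / 2.
fragments = read_ends // 2

# Dividing fragments by read ends mixes units and understates the rate.
mixed_unit_ratio = whitelisted_fragments / read_ends
per_fragment_ratio = whitelisted_fragments / fragments

print(f"fragments:          {fragments:,}")
print(f"mixed-unit ratio:   {mixed_unit_ratio:.2%}")
print(f"per-fragment ratio: {per_fragment_ratio:.2%}")
```

Under that assumption, roughly half of the fragments carry a whitelisted barcode, not a quarter. That is still well below the Bowtie2 alignment rate, which is consistent with the barcode matching step being where most reads are lost.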
Hello Chromap Team,
Thank you very much for actively maintaining Chromap!
I recently used Chromap for mapping scATAC-seq data with a barcode whitelist. I found that the log file is a bit confusing. As stated in the documentation, when barcodes and a whitelist are given as input, Chromap will, by default, estimate barcode abundance and perform barcode correction.
I am looking to understand the following QC numbers from the log file:
In relation to these questions:
For Q1, should I refer to the "Number of mapped reads" in the log file? For Q2, what does "Number of barcodes in whitelist" represent? Does it indicate the number of barcodes, or the number of reads with the whitelisted barcodes?
These metrics are very useful for my experimental debugging, and I would greatly appreciate your clarification.