Open me37uday opened 1 year ago
@me37uday Interesting observation from 10X scRNA. These two warnings are highly related. When you have the second one, you will also have the first one for sure. As IRFinder is designed for bulk RNASeq, there are some obvious compatibility issues on scRNA. I will go through the potential cause of the warnings you saw and try to provide a potential workaround. But no guarantee that they work out-of-the-box, as I have never tried IRFinder on scRNA so far.
Cause:
possorted_genome_bam.bam
from CellRanger is actually single-end. This is because only one end from FASTQ is used for mapping while the other end, containing cell-barcode and UMI, is ignored. According to your library preparation, all the mapped reads are mapped to the opposite strand of the actual transcripts, which is totally normal and expected.Workaround
Try to manually curate a new BAM from possorted_genome_bam.bam
, where you want to minus 16 from every SAM flag to pretend the mapped reads are from the forward stand. And use this new BAM to re-run IRFinder.
Finally, something thoughts about the sparsity of the output matrix. I think this could be due to:
Please post back the irfinder.stdout
again if things don't work out.
Hi @dg520 ,
Thanks for your valuable insight!
I have a follow up question, should I minus 16 from every read or simply replace 16 with 0 to pretend the mapped reads are from the forward strand?
The following are the flags and the number of reads having those flags :
flag occurrences
0 22776239
4 31880363
16 22305740
1024 70268918
1040 49991195
Let me know what you think of the approach. Thanks!
Uday
@me37uday Totally. You are right and I was wrong. For the first trial, let's replate 16 with 0 and 1040 with 1024. You might also want to compare this result to the second trial, where you would flip the original 0 and 16, and flip the original 1024 with 1040.
Hi @dg520 ,
So, I replaced 16 with 0 and 1040 with 1024 and created a new BAM file. I get the following numbers :
flag occurrences
0 31763100
4 5428585
1024 99550691
I get the following when I run infer_experiment.py
:
This is SingleEnd Data
Fraction of reads failed to determine: 0.0000
Fraction of reads explained by "++,--": 0.0304
Fraction of reads explained by "+-,-+": 0.9696
Indicating that the change of 16 to 0 seemed to have worked. However, on running IRFinder on this new BAM file, I still get the same error. Here's the content of irfinder.stdout :
IRFinder version: 1.3.1
IRFinder start: Wed Oct 26 17:23:09 CEST 2022
IRFinder runmode: BAM
IRFinder user@host: urangasw @ valis-02
IRFinder working dir: /home/urangasw/data/sc_RNA_seq_microglia/cr_output/Microglia_MO_AD3_cr/outs
IRFinder reference: /home/urangasw/data/bash_scripts/IR/REF/Human-GRCh38-release106
IRFinder file 1: possorted_genome_bam_1.bam
IRFinder file 2:
---
IRFinder run with options:
Output Dir ir_finder_1/
Main intron reference: /home/urangasw/data/bash_scripts/IR/REF/Human-GRCh38-release106/IRFinder/ref-cover.bed
Read spans reference: /home/urangasw/data/bash_scripts/IR/REF/Human-GRCh38-release106/IRFinder/ref-read-continues.ref
Optional ROI reference: /home/urangasw/data/bash_scripts/IR/REF/Human-GRCh38-release106/IRFinder/ref-ROI.bed
Optional BAM output: NULL
ERR-Total reads processed: 136742376
ERR-Total nucleotides: 12682686954
ERR-Total singles processed: 131313791
ERR-Total pairs processed: 0
ERR-Short pairs: 0
ERR-Intersect pairs: 0
ERR-Long pairs: 0
ERR-Skipped reads: 5428585
ERR-Error reads: 0
Directionality: Dir evidence: 36969
Directionality: Nondir evidence: 0
Directionality: Dir evidence known junctions: 8
Directionality: Nondir evidence known junctions: 0
Directionality: Dir matches ref: 2
Directionality: Dir opposed to ref: 6
Directionality: Dir score all (0-10000): 9999
Directionality: Dir score known junctions (0-10000): 8888
RNA-Seq directionality -1/0/+1: 0
I was not expecting those warnings having done this modification of the flags. What could have gone wrong? Do you think I should have set the flag to 16 instead of 0? Please let me know.
Cheers, Uday
@me37uday I am out of town for a conference. Will be back next week and take a closer look into this. Please just hold on.
Have a good one!
@me37uday After switching the SAM flags, we did have much more reads to support the directionality, indicated by the number at the end of the line "Directionality: Dir evidence:" substantially increased. However, the number at the end of the line "Directionality: Dir evidence known junctions:" has only had a neglectable change. This obviously suggests the junction annotation provided to IRFinder (i.e. GTF used to prepare IRFinder reference) does NOT match with what STAR alignment sees.
Please double check the GTF used for CellRanger and IRFinder reference building are the same, including: 1) they must refer to the same species (e.g. both human, or both mouse) 2) they must have the same chromosome nomenclatures. (e.g. Both have chromosome names starting with "chr", or neither should start with "chr"); 3) they should also be under the same version/release (e.g. both according to Ensemble GRCh38.100).
Hi @dg520 ,
Thanks for your suggestion! Turns out, that was the problem. I was using a different genome version for cellranger and IRFinder. And the nomenclatures were different as well.
Now when I run IRFinder without altering the flags, I don't get any of those warnings. However, the output remains the same even when I modify the flags. Any clarity on this please?
Uday
Was there ever a solution to using IRFinder on scRNA-seq data? I would also like to do this.
@ChristyGault Sorry but no, we currently have no plan to implement IRFinder on scRNA. Partially because we mainly work on 10X scRNA, which is unsuitable for IR measurement.
Just a note to future users- IRFinder works on scRNA-seq data generated with SMART-seq kits. I had to run it in FASTQ mode because there was a mismatch between the STAR reference that was used to generate the BAM files and the STAR reference that IRFinder generated.
Hi @dg520 ,
Hope you are doing well :)
I have been trying to use IRFinder on BAM files generated by cellranger to quantify IR. On doing so, I get the following error :
The BAM files contain ~135 million reads with ~35% of the reads confidently mapping to the intronic region. Yet the output of the analysis is very sparse and I get the above warnings. Please let me know what could have gone wrong in this context and if there is anything that could be done to quantify IR appropriately.
The irfinder.stdout is as follows :
It would be great if you could shed some light on this context. Thanks in advance!
Best, Uday