Closed paulinarosales closed 5 months ago
Hello,
I will spend some time today looking into this and get back to you.
What I can say right now is that the first person I alpha-tested bam2bakR with 2 years ago was using hisat-3n aligned bams with the same directional mapping setting you mentioned, and ran into the same problem of their library seemingly being converted from reverse to forward stranded. In their case there might have been some other things they were doing that were the actual root cause of this quirk, but I can't remember as it was so long ago. That's just to say, there is evidence that the hisat-3n settings you are using do in fact flip the reads or reverse complement them so as to make it effectively a forward stranded library.
Best, Isaac
Thanks for the quick answer! Also I forgot to add that I also use the reverse strandness parameter for featureCounts (for read attribution ) as -s 2
and the percentage of attributed counts is much higher than when using the option for forward stranded, so my guess us that the HISAT3N option for reverse mapping is not "flipping" the reads, but would be nice to now where the difference is coming from.
I hope this is helful :)
I was going to ask you how featureCounts read assignment went, that is good to know but thickens the plot and definitely confuses me more.
I ran some tests but unfortunately was not able to reproduce the mutation counting anomaly. I aligned the same reverse stranded, paired-end fastq file using hisat3n with 5 different settings:
# Directional mapping reverse
hisat-3n <basic args> --directional-mapping-reverse
# Directional mapping default
hisat-3n <basic args> --directional-mapping
# Directional mapping reverse and rna-strandness set
hisat-3n <basic args> --directional-mapping-reverse --rna-strandness RF
# Only set rna-strandness
hisat-3n <basic args> --rna-strandness RF
# Default settings for everything
hisat-3n <basic args>
Where <basic args>
for me was:
-p 4 -x indices/chr21_test -q -1 WT1_test_R1.fastq.gz -2 WT1_test_R2.fastq.gz --base-change T,C
I built indices with the following code (nothing fancy like splice-site and exon specification for the sake of rapid testing):
hisat-3n-build -p 4 genome.fasta chr21_test
In all cases, using bam2bakR, I had the expected high T-to-C mutation rate. Default directional mapping had a lower T-to-C rate, slightly lower alignment rate, and slightly higher A-to-G mutation rate than the other settings, but the T-to-C rate was an order of magnitude higher than A-to-G mutation rates in all cases.
For these tests, I used version 2.2.1-3n-0.0.3 of HISAT-3n and the current version of bam2bakR on the main branch of this repo.
Some thoughts:
--strandedness
parameter, which is 'F'. So maybe its something small like spelling of the parameter or how you are setting it if/when you are calling the script directly.Best, Isaac
Thank you very much for going through the task of testing the different settings.
Regarding the --strandedness
I did check that the parameter is switched to R
, so that also shouldn't be the problem. My first guess was the switch on strandness but after our discussion and also testing it on my own I can eliminate the possibility of the issue being related to the alignment and mutation calling.
I hope the issue can help as reference for users working with reverse stranded data.
Thanks again for the quick and helpful answer.
Best, Paulina
Sorry to bother you with a follow-up, but can you clarify whether or not you were able to solve the problem, and if so, what the solution was?
Best, Isaac
Hello,
I’m using part of the pipeline for the mutation calling of mRNA PE data, since my data is reverse stranded I implemented my own mapping step to add some options to HISAT3N like the
--directional-mapping-reverse
argument instead of the regular--directional-mapping
nevertheless when using themut_call.py
script and specifying the strandness asR
I get unexpected results like no T>C conversions and high A>G conversions on my ERCCs spike-ins.Due to this I’m wondering if using the
strandedness = R
argument for the mutation calling on top of the--directional-mapping-reverse
in the alignment is adding an unnecessary complement-making step or if there is any recommendation to use this script/tool with the options for reverse stranded data.Thanks!