g0656116 closed 5 months ago
My guess is that you aligned the raw reads with bowtie, whereas miRDeep2 collapses identical reads to speed up the mapping. In sRNA-seq, the exact same read can occur thousands of times (e.g. a highly abundant miRNA + adapters). If you run bowtie directly, that exact sequence is mapped over and over again, and each time it counts as a mapped read. In the miRDeep2 flow, this sequence is mapped (and counted as mapping) only once. Note that for quantification purposes, miRDeep2 considers the read multiplicity in downstream steps.
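To make the effect concrete, here is a minimal Python sketch of the two ways of counting. It assumes collapsed read IDs of the form `<prefix>_<index>_x<count>` (for example `seq_0_x34`), where the `_x` suffix carries the read multiplicity. The helper names and the toy IDs are hypothetical and only for illustration; this is not miRDeep2's own code.

```python
import re

# Assumed ID convention for collapsed reads: <prefix>_<index>_x<count>
ID_PATTERN = re.compile(r"_x(\d+)$")


def multiplicity(read_id):
    """Return how many raw reads a collapsed ID stands for (1 if no suffix)."""
    match = ID_PATTERN.search(read_id)
    return int(match.group(1)) if match else 1


def mapping_rates(all_ids, mapped_ids):
    """Return (rate per unique sequence, rate per raw read)."""
    mapped = set(mapped_ids)
    # Each collapsed sequence counts once, however abundant it is.
    unique_rate = sum(1 for i in all_ids if i in mapped) / len(all_ids)
    # Each collapsed sequence is weighted by its multiplicity.
    total_reads = sum(multiplicity(i) for i in all_ids)
    mapped_reads = sum(multiplicity(i) for i in all_ids if i in mapped)
    return unique_rate, mapped_reads / total_reads


# Hypothetical toy data: one abundant mapped sequence, four rare unmapped ones.
all_ids = ["seq_0_x34", "seq_1_x2", "seq_2_x2", "seq_3_x1", "seq_4_x1"]
mapped_ids = ["seq_0_x34"]

unique_rate, read_rate = mapping_rates(all_ids, mapped_ids)
print(f"per unique sequence: {unique_rate:.1%}")
print(f"per raw read:        {read_rate:.1%}")
```

With these toy numbers, one of five unique sequences maps, but that one sequence accounts for 34 of the 40 raw reads, so the same data gives a low rate when counted per collapsed sequence and a high rate when counted per raw read.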
Thank you very much for the quick reply. Can I ask you a few more questions?
I am analyzing miRNAs_expressed_all_samples.csv, which is the output of quantifier.pl. Can I use this file for downstream analysis? Does this file take the collapsed identical reads (read multiplicity) into account?
And when I run quantifier.pl,
├── expression_analyses
│   └── expression_analyses_1717759024
│       ├── bowtie_mature.out
│       ├── bowtie_reads.out
│       ├── collapsed_2023.fa.converted
│       ├── collapsed_2023.fa_mapped.arf
│       ├── collapsed_2023.fa_mapped.bwt
│       ├── expression_1717759024.html
│       ├── mature.converted
│       ├── mature.fa_mapped.arf
│       ├── mature.fa_mapped.bwt
│       ├── mature2hairpin
│       ├── miRBase.mrd
│       ├── miRNA_expressed.csv
│       ├── miRNA_not_expressed.csv
│       ├── miRNA_precursor.1.ebwt
│       ├── miRNA_precursor.2.ebwt
│       ├── miRNA_precursor.3.ebwt
│       ├── miRNA_precursor.4.ebwt
│       ├── miRNA_precursor.rev.1.ebwt
│       ├── miRNA_precursor.rev.2.ebwt
│       ├── precursor.converted
│       ├── precursor_not_expressed.csv
│       ├── read_occ
│       └── rna.ps
miRNAs_expressed_all_samples_1717759024.csv
expression_1717759024.html
This output is generated. Why can't I generate output like "expression_analyses/expressionanalyses.csv miRNAs_expressed_all_samplesnormalized.csv"?
Thank you for your reply. It's really helpful for my research.
-y: quantifier.pl [...] -y foo will result in file names like expression_foo.html & miRNAs_expressed_all_samples_foo.csv.
Thank you so much. It was a great help. I'll come back if I have any other questions!
I aligned the same small RNA-seq data against the same reference genome using miRDeep2 and bowtie1, respectively. It is known that miRDeep2 uses bowtie1 for mapping, but the two show very different mapping rates: approximately 20% of reads are mapped with miRDeep2, while 85% are mapped with bowtie1.
The options used are as follows.
miRDeep2: mapper.pl 2023.fastq -e -h -i -j -m -p /home/song/miRNAseq/bowtie_index/STAR/index -s collapsed_2023.fa -t genome_after_2023.arf -v -o 1
bowtie1: bowtie -v 1 -p 4 -S /home/song/miRNAseq/bowtie_index/STAR/index 2023.fastq > ./hg38/38_aligned_data.sam
What's the difference? I am curious about how to increase the mapping rate when using miRDeep2 and how to analyze the level of miRNA expression using the sam file, which is the result file of bowtie.