Closed banedeus closed 3 years ago
What OS is that on? How did you install miRDeep2? Can you successfully run the tutorial with your installation?
I'm using Ubuntu 20. I installed it through conda, and yes, I completed the tutorial successfully.
First, be warned that the conda package is not officially supported or maintained by anyone related to miRDeep2. However, if the tutorial runs, the problem is likely due to your input data. I'd recommend creating a minimal input dataset that can reproduce the error. Oftentimes, just doing so will give you an idea of what might be going wrong. Otherwise, it will at least enable others to reproduce your issue and help you fix it. Try using just ten (random) reads or so and see if you get the error. If not, you can start with a larger chunk and narrow it down. Worst case, you'll end up with a binary search starting with half the data until you get it down to a handful of reads. Once we can look at your data and the exact command you are using, troubleshooting this should be much easier. :wink:
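For illustration, drawing a small random subset of reads could be sketched as follows. This is only a sketch under assumptions: it is not part of miRDeep2, the names `read_fastq` and `sample_reads` are made up for this example, and it assumes a plain FASTQ file with exactly four lines per record (no wrapped sequences).

```python
import random
from typing import Iterator, List, TextIO, Tuple

# One FASTQ record: (header, sequence, separator, quality).
FastqRecord = Tuple[str, str, str, str]


def read_fastq(handle: TextIO) -> Iterator[FastqRecord]:
    """Yield records from a FASTQ file, assuming four lines per record."""
    while True:
        lines = [handle.readline() for _ in range(4)]
        if not lines[0].strip():
            return  # end of file (or a trailing blank line)
        if not lines[3]:
            raise ValueError("truncated FASTQ record near " + lines[0].strip())
        header, seq, sep, qual = (line.rstrip("\r\n") for line in lines)
        yield (header, seq, sep, qual)


def sample_reads(handle: TextIO, n: int = 10, seed: int = 0) -> List[FastqRecord]:
    """Pick n records uniformly at random in one pass (reservoir sampling)."""
    rng = random.Random(seed)
    reservoir: List[FastqRecord] = []
    for i, record in enumerate(read_fastq(handle)):
        if i < n:
            reservoir.append(record)
        else:
            # Keep the new record with probability n / (i + 1).
            j = rng.randint(0, i)
            if j < n:
                reservoir[j] = record
    return reservoir
```

Writing the four fields of each sampled record back out, one per line, gives a small FASTQ file that can be passed to mapper.pl in place of the full one. A fixed seed keeps the subset reproducible, so others can regenerate the same reads.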
Okay, thank you so far. I will try it as soon as possible and I will post my workflow here.
Just out of curiosity, what might be the problem? What would be wrong with my data? Do you have any idea? And please explain it simply. I am a newbie in bioinformatics.
Okay, so I tried with 15,000 lines of my FASTQ data and the result is the same. Here is what I did. First:
mapper.pl SRA.fastq -e -i -h -j -k AGATCGGAAGAG -l 18 -m -p index_2 -s reads_collapsed.fa -t reads_collapsed_vs_genome.arf -v -u
Output:
parsing fastq to fasta format
converting rna to dna alphabet
discarding sequences with non-canonical letters
clipping 3' adapters
discarding short reads
collapsing reads
mapping reads to genome index
trimming unmapped nts in the 3' ends
Log file for this run is in mapper_logs and called mapper.log_64510
Mapping statistics
#desc total mapped unmapped %mapped %unmapped
total: 4214 1826 2388 43.332 56.668
seq: 4214 1826 2388 43.332 56.668
Second:
quantifier.pl -p hairpin.fa -m mature.fa -r reads_collapsed.fa index_2 -y 16_19
Output:
getting samples and corresponding read numbers
Converting input files
building bowtie index
mapping mature sequences against index
mapping read sequences against index
Mapping statistics
#desc total mapped unmapped %mapped %unmapped
total: 4183 12 4171 0.287 99.713
seq: 4183 12 4171 0.287 99.713
analyzing data
0 mature mappings to precursors
Expressed miRNAs are written to expression_analyses/expression_analyses_1626760028/miRNA_expressed.csv
not expressed miRNAs are written to expression_analyses/expression_analyses_1626760028/miRNA_not_expressed.csv
Creating miRBase.mrd file
Mapped READS readin - DONE
make_html2.pl -q expression_analyses/expression_analyses_1626760028/miRBase.mrd -k mature.fa -y 1626760028 -o -i expression_analyses/expression_analyses_1626760028/mature.fa_mapped.arf -l -M miRNAs_expressed_all_samples_1626760028.csv
miRNAs_expressed_all_samples_1626760028.csv file with miRNA expression values
parsing miRBase.mrd file finished
creating PDF files
Warning: 0 mature sequences mapped to any of your given precursor sequences
Note: I assume that the tool could not find any mature sequence because my data is really small in this case.
Third:
miRDeep2.pl reads_collapsed.fa GCA_018258275.1_ASM1825827v1_genomic.fa reads_collapsed_vs_genome.arf mature.fa annotations_16266764565788.fasta hairpin.fa 2> report.log
Output:
#####################################
# #
# miRDeep2.0.1.3 #
# #
# last change: 08/11/2019 #
# #
#####################################
miRDeep2 started at 8:48:27
#Starting miRDeep2
#testing input files
#Quantitation of known miRNAs in data
#parsing genome mappings
#excising precursors
#preparing signature
#folding precursors
#computing randfold p-values
#running miRDeep core algorithm
Here is my report.txt
If you need any further information just let me know
edit (@mschilli87): formatting
I suggested
... I'd recommend creating a minimal input dataset that can reproduce the error. [...] Try using just ten (random) reads or so and see if you get the error. ...
Now you say
Okay, so I tried with 15,000 lines of my FASTQ data and the result is the same.
:confused:
The point is that the computer program fails on your data, so I want to be able to follow the entire algorithm in my brain to see where it fails. For this I need to actually see the input data. But please don't share 1,500 times more data than I asked for and have me narrow it down.
Just out of curiosity, what might be the problem? What would be wrong with my data? Do you have any idea? And please explain it simply. I am a newbie in bioinformatics.
Hard to tell without seeing the data. Many things could be wrong. Check past issues for examples. If even a small (much smaller than thousands of lines) input triggers the issue, chances are you are specifying a wrong flag or there is an issue with the format of your data (e.g. you are telling miRDeep2 to expect a FASTA file but giving it a FASTQ file). Regardless, guessing is a waste of time. Please just narrow it down step by step until we can actually understand what is happening.
Okay, I am deeply sorry for what I did. Here are my FASTQ file and my report.log.
I used the same commands as above with 10 random reads of my data, but this time nothing went wrong. So I will also include the 15,000 lines of data that gave an error last time.
Also, I changed the file extensions from .fastq to .txt in order to upload them here.
Just quoting myself again:
... Try using just ten (random) reads or so and see if you get the error. If not, you can start with a larger chunk and narrow it down. Worst case, you'll end up with a binary search starting with half the data until you get it down to a handful of reads. ...
:wink:
I understand your point of view, but I have 45 million reads. I started with ten and now I am going to try a thousand reads. I saw that I get an error at around 4 thousand reads. What I do not understand is whether the problem is related to the number of reads or to some "bad" reads in my data. Simply put, how am I going to detect the "problematic" reads in my data? Also, I have 4 datasets that I have to process just like this one. It will take forever.
What do you expect? Somebody else doing this work for you? How long does running your script take with the 4000 reads that reproduce the error? Split this file in half and try both halves. Do you get the error for none, both or only one of the halves? If none, it appears to be an issue with the number of reads. I'd continue trying 3000 next, then 2500 or 3500 depending on whether you get the error with 3000 or not, and so on. log2(4000) < 12, so by binary search you should be able to get the exact number of reads you need to cause the issue by running miRDeep2 12 times. That's hardly going to take 'forever'. If you get the error with your 2000 reads set in step one, you can apply the same strategy to find 'the read' causing the issue. Once you do, check whether using just this one read causes the error. Then you can share the data here and maybe someone can help you.
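The halving strategy described above can be sketched as a generic search. This is a sketch under assumptions, not a miRDeep2 feature: `fails` is a hypothetical callback that would write the given reads to a file, run the same mapper.pl and miRDeep2.pl commands, and return True if the error shows up. The search also assumes that once a prefix of the data triggers the error, every longer prefix does too.

```python
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


def smallest_failing_prefix(
    reads: Sequence[T], fails: Callable[[Sequence[T]], bool]
) -> int:
    """Return the smallest k such that fails(reads[:k]) is True.

    Assumes the empty input passes and that failure is monotone in the
    prefix length (a failing prefix stays failing when reads are added).
    """
    if not fails(reads):
        raise ValueError("the full input does not reproduce the error")
    lo, hi = 0, len(reads)  # reads[:lo] passes, reads[:hi] fails
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if fails(reads[:mid]):
            hi = mid  # error already present in the first mid reads
        else:
            lo = mid  # error needs more than the first mid reads
    return hi


# Toy stand-in for "run the pipeline and check for the error":
# the input "fails" as soon as it contains the made-up read "BAD".
reads = ["r1", "r2", "BAD", "r4", "r5"]
k = smallest_failing_prefix(reads, lambda subset: "BAD" in subset)
culprit = reads[k - 1]  # last read of the smallest failing prefix
```

If `culprit` also triggers the error on its own, the problem is probably that read. If it does not, the error more likely depends on the number or combination of reads.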
@banedeus: After formatting your earlier post I noticed this comment of yours that was previously hiding in the log output:
Note: I assume that the tool could not find any mature sequence because my data is really small in this case.
This is also something you could verify. Instead of assuming (i.e. guessing), why not simply add a positive control to your input? Simply use your data plus five or so reads mapping to miRNAs (e.g. from the tutorial data, which already worked for you).
1.) search in the issues section for 'Negative repeat'
This happened already a couple of times and got solved. The problem was caused by inconsistent files. A way to get proper input files is described here
https://drmirdeep.github.io/mirdeep2_tutorial.html
2.) Furthermore, it is unclear what index_2 is doing in this command
quantifier.pl -p hairpin.fa -m mature.fa -r reads_collapsed.fa index_2 -y 16_19
3.) If the tutorial runs fine, then it definitely has to do with your input files. Compare the format of the tutorial files with yours and you will find the issues.
4.) The files you uploaded are not useful. Further, your file names are too uninformative for us to guess what is inside them.
Please run head -n20 on each of these files and post the output here:
reads_collapsed.fa
GCA_018258275.1_ASM1825827v1_genomic.fa
reads_collapsed_vs_genome.arf
mature.fa # mature_ref_miRNAs.fa ?
annotations_16266764565788.fasta # mature_other_miRNAs.fa ?
hairpin.fa # hairpin_ref_miRNAs ?
Warning: 0 mature sequences mapped to any of your given precursor sequences
0 => zero => None of your mature sequences could be mapped to sequences in your precursor files
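To compare your files with the tutorial files at a glance, a small per-file summary can help alongside head -n20. The sketch below is an illustration only: `summarize_fasta` is a made-up helper, and which properties actually matter to miRDeep2 (for example whitespace in identifiers, or an RNA versus DNA alphabet) is an assumption to check against the tutorial files.

```python
from typing import Dict, Iterable, List


def summarize_fasta(lines: Iterable[str], max_headers: int = 3) -> Dict[str, object]:
    """Collect a few format facts about a FASTA file given as text lines."""
    first_headers: List[str] = []
    records = 0
    headers_with_whitespace = 0
    letters = set()
    for raw in lines:
        line = raw.rstrip("\r\n")
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            records += 1
            header = line[1:]
            if len(first_headers) < max_headers:
                first_headers.append(header)
            if any(ch.isspace() for ch in header):
                headers_with_whitespace += 1
        else:
            # Track which letters occur, ignoring case.
            letters.update(line.upper())
    return {
        "records": records,
        "first_headers": first_headers,
        "headers_with_whitespace": headers_with_whitespace,
        "letters": "".join(sorted(letters)),
    }


# Made-up toy input, not real miRNA data.
toy = [">seq1 extra text\n", "ACGU\n", ">seq2\n", "acgn\n"]
summary = summarize_fasta(toy)
```

Running this on a tutorial file and on the corresponding file of your own (for example by passing an open file handle) and comparing the two summaries should make differences in header style or alphabet easy to spot.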
I ran the following command:
miRDeep2.pl reads_collapsed.fa GCA_018258275.1_ASM1825827v1_genomic.fa reads_collapsed_vs_genome.arf mature.fa annotations_16266764565788.fasta hairpin.fa
Everything was going fine until I saw an error. First it said:
Then
Then it goes like this
"Use of uninitialized value in numeric le (<=) at /home/bane/anaconda3/bin/miRDeep2_core_algorithm.pl line 1148, line 68767.
Use of uninitialized value in numeric le (<=) at /home/bane/anaconda3/bin/miRDeep2_core_algorithm.pl line 1174, line 68767.
Use of uninitialized value in numeric le (<=) at /home/bane/anaconda3/bin/miRDeep2_core_algorithm.pl line 1148, line 68767." for almost 400,000 lines.
I could not upload the report.log here (I guess because my log file is 57 MB), but here is a link to the log file if you want to check it; it will be stored there for 30 days from now.
https://easyupload.io/hxygao
What could the issue be, and how can I solve it?