rajewsky-lab / mirdeep2

Discovering known and novel miRNAs from small RNA sequencing data
GNU General Public License v3.0

Use of uninitialised value message repeated in output many times #102

Closed elayton13 closed 1 year ago

elayton13 commented 2 years ago

Hello

I am using fastQ files that have already had adaptors removed and have been size and quality filtered using cutadapt. The parameters I run are:

mapper.pl /samples_mirdeep.txt -d -e -h -m \
-p genome.fa.ind -s reads_collapsed.fa -t reads_collapsed_vs_genome.arf -v

miRDeep2.pl /reads_collapsed.fa /genome.fa /reads_collapsed_vs_genome.arf \
/mature_nws.fa none /hairpin_nws.fa \
2>report.log

In the output file I get:

Use of uninitialized value in numeric gt (>) at /opt/gridware/depots/8e896c5a/el7/pkg/apps/mirdeep2/0.1.1/gcc-4.8.5/bin/quantifier.pl line 987, line 1949222

The "Use of uninitialized value" line is repeated many times; it seems to be the same message 17 times before line 987 becomes 988, and so forth.

I think this is similar to #24 and #26, but I'm still unclear about whether this message indicates a problem and what the fix was. I also saw #27 and #28 with similar messages, but in my case I don't have samples with no miRNA hits. However, it may be relevant to note that my samples typically have a low proportion of reads mapping to miRNAs (due to tissue type, based on previous analysis using bowtie and featurecounts for the same data).

I get this same message when I use these same parameters but a different genome index and different reference .fa files for the miRDeep2.pl command (specifying -t mmu and providing mouse and rat miRNA fastas). Just for context, I expect to find mouse miRNAs, and miRNAs of another species for which miRNAs aren't currently annotated / on mirbase, hence why I am trying both.

I have run the sanity check scripts to check for issues in the supplied fasta files. All seems ok (no whitespace or disallowed characters). Nothing is output in the error log apart from that the controls were performed. Other than this, it seems to run ok and I think I am getting all the output files I should be.

Please let me know if you need any more info or a look at any of my files.

Appreciate your time

Drmirdeep commented 2 years ago

Since the warning comes from the PrintExpressionSamples routine, I can only guess that a particular miRNA in one of your samples has no reads mapped to it. Maybe you can check whether there are missing values in the output count matrix.
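A minimal sketch of that check, in Python. It assumes the count matrix is a tab-separated text file with a header row and the miRNA identifier in the first column; the thread does not confirm this layout, and the function name `find_missing_counts` and the example data are made up for illustration, not part of miRDeep2.

```python
import csv
import io


def find_missing_counts(tsv_text):
    """Return {miRNA id: [column names whose cell is empty]}.

    Assumption (not confirmed by the thread): a tab-separated table with
    a header row and the miRNA identifier in the first column.
    """
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t")
    header = next(reader)
    missing = {}
    for row in reader:
        if not row:
            continue  # skip blank lines
        # Pad short rows so that absent trailing cells count as missing.
        cells = row + [""] * (len(header) - len(row))
        empty = [
            name
            for name, cell in zip(header[1:], cells[1:])
            if cell.strip() == ""
        ]
        if empty:
            missing.setdefault(row[0], []).extend(empty)
    return missing


if __name__ == "__main__":
    # Made-up example data, only to illustrate the check.
    example = "#miRNA\tsample1\tsample2\nmiR-a\t10\t3\nmiR-b\t\t0\nmiR-c\t5\n"
    print(find_missing_counts(example))
```

To run it on a real file, read the file's contents into a string and pass that to `find_missing_counts`. An empty result would suggest the table itself has no gaps, in which case the undefined values would have to arise before the table is written.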

elayton13 commented 2 years ago

Thanks for the quick reply. There are lots of miRNAs in the reference fasta files supplied to miRDeep2.pl that don't have any counts in any of my samples. The number of lines of the message does seem to be proportional to that. From previous analysis I only detect approximately 150 mouse miRNAs in my samples, out of the 2000-odd annotated on the mouse genome. In that case, shall I just ignore the message?
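A rough way to check that proportionality is to tally the warnings by the quantifier.pl line number they cite, then compare the tallies with the number of miRNAs that have no counts. The sketch below is in Python and matches only the warning text quoted earlier in this thread; the function name `tally_warnings` and the sample lines are illustrative assumptions.

```python
import re
from collections import Counter

# Matches the warning quoted in this thread and captures the script line
# number that follows "quantifier.pl line".
WARNING_RE = re.compile(r"Use of uninitialized value.*quantifier\.pl line (\d+)")


def tally_warnings(log_lines):
    """Count 'uninitialized value' warnings per quantifier.pl line number."""
    counts = Counter()
    for line in log_lines:
        match = WARNING_RE.search(line)
        if match:
            counts[int(match.group(1))] += 1
    return counts


if __name__ == "__main__":
    # Made-up sample lines in the same shape as the reported warning.
    sample = [
        "Use of uninitialized value in numeric gt (>) at /x/quantifier.pl line 987, line 10",
        "Use of uninitialized value in numeric gt (>) at /x/quantifier.pl line 987, line 10",
        "Use of uninitialized value in numeric gt (>) at /x/quantifier.pl line 988, line 10",
        "some unrelated log line",
    ]
    for script_line, n in sorted(tally_warnings(sample).items()):
        print(f"quantifier.pl line {script_line}: {n} warnings")
```

`tally_warnings` accepts any iterable of lines, so an open file handle on report.log can be passed in directly.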

I do also get some disconcerting mirdeep scores for false positives and true positives (being very high and very low, with respect to the number of novel miRs predicted at each score). Do you expect this also to be due to the fact that my samples are a bit atypical in that they have low numbers of miRNAs detected? I plan to proceed by filtering the novel predictions based on number of reads assigned to mature and star arms, and a significant randfold value. I'll attach a picture of my scores

[Image: Screenshot 2022-07-25 at 4 37 18 PM]

Drmirdeep commented 1 year ago

Given the survey table, the data does not look very good for miRNA prediction. However, if you check the PDFs and how the reads are aligned in them, you may get an idea of what is going on.

The error messages do not seem to be a problem and do not affect the prediction routine. It is something in the quantifier which runs independently.

Drmirdeep commented 1 year ago

Since there has been no further reaction here, I am closing this.