rajewsky-lab / mirdeep2

Discovering known and novel miRNAs from small RNA sequencing data

No file mirdeep_runs/run_28_08_2019_t_15_49_37/tmp/precursors.fa_stack found #43

Closed by Yashrajsinh-Jadeja 5 years ago

Yashrajsinh-Jadeja commented 5 years ago

Hello all! I am trying to run miRDeep2 on transcriptomic (RNA-seq) data, but this error appears every time I run the miRDeep2.pl script:

No file mirdeep_runs/run_28_08_2019_t_15_49_37/tmp/precursors.fa_stack found

My parameters are as follows:

date && time miRDeep2.pl ~/Project/reads_collapsed.fa ~/Project/newgenome.fa ~/Project/reads_vs_refdb.arf ~/Project/new_human_mature.fasta none ~/Project/new_human_stem.fasta -t Human -g -1 hsa -v

I have elaborated more about this on my Biostars post. Any and all help would be much appreciated. Thanks.

Drmirdeep commented 5 years ago

What happens if you run it without -g -1? Please post the console output here so I can see where it goes wrong. I would bet that some of the files are either not being created or are empty.

Yashrajsinh-Jadeja commented 5 years ago

The console log with the parameter -g -1:

date && time miRDeep2.pl ~/Project/reads_collapsed.fa ~/Project/newgenome.fa ~/Project/reads_vs_refdb.arf ~/Project/new_human_mature.fasta none ~/Project/new_human_stem.fasta -t Human -g -1 hsa -v
Wed Aug 28 15:49:37 IST 2019

#####################################
#                                   #
# miRDeep2.0.1.2                    #
#                                   #
# last change: 22/01/2019           #
#                                   #
#####################################

miRDeep2 started at 15:49:37

#Starting miRDeep2
#Starting miRDeep2
/home/lab7/Project/Softwares/mirdeep2-master/bin/miRDeep2.pl /home/lab7/Project/reads_collapsed.fa /home/lab7/Project/newgenome.fa /home/lab7/Project/reads_vs_refdb.arf /home/lab7/Project/new_human_mature.fasta none /home/lab7/Project/new_human_stem.fasta -t Human -g -1 hsa -v

miRDeep2 started at 15:49:37

mkdir mirdeep_runs/run_28_08_2019_t_15_49_37

Use of uninitialized value $tmps in split at /home/lab7/Project/Softwares/mirdeep2-master/bin/miRDeep2.pl line 350, <IN> line 1.
Use of uninitialized value in pattern match (m//) at /home/lab7/Project/Softwares/mirdeep2-master/bin/miRDeep2.pl line 354, <IN> line 1.
Use of uninitialized value $tmps in split at /home/lab7/Project/Softwares/mirdeep2-master/bin/miRDeep2.pl line 377.
#testing input files
#testing input files
started: 15:50:33
sanity_check_mature_ref.pl /home/lab7/Project/new_human_mature.fasta

ended: 15:50:33
total:0h:0m:0s

sanity_check_reads_ready_file.pl /home/lab7/Project/reads_collapsed.fa

started: 15:50:33

ended: 15:50:33
total:0h:0m:0s

started: 15:50:33
sanity_check_genome.pl /home/lab7/Project/newgenome.fa

ended: 15:50:33
total:0h:0m:0s

started: 15:50:33
sanity_check_mapping_file.pl /home/lab7/Project/reads_vs_refdb.arf

ended: 15:50:33
total:0h:0m:0s

started: 15:50:33
sanity_check_mature_ref.pl /home/lab7/Project/new_human_stem.fasta

ended: 15:50:33
total:0h:0m:0s

started: 15:50:33
Quantitation of expressed miRNAs in data

#Quantitation of known miRNAs in data
quantifier.pl -p /home/lab7/Project/new_human_stem.fasta -m /home/lab7/Project/new_human_mature.fasta  -r /home/lab7/Project/reads_collapsed.fa  -t Human -y 28_08_2019_t_15_49_37 -k  

ended: 15:50:33
total:0h:0m:0s

started: 15:50:33
rna2dna.pl /home/lab7/Project/new_human_mature.fasta > mirdeep_runs/run_28_08_2019_t_15_49_37/tmp/new_human_mature.fasta

rna2dna.pl /home/lab7/Project/new_human_stem.fasta > mirdeep_runs/run_28_08_2019_t_15_49_37/tmp/new_human_stem.fasta

ended: 15:50:33
total:0h:0m:0s

#parsing genome mappings
#parsing genome mappings
parse_mappings.pl /home/lab7/Project/reads_vs_refdb.arf -a 0 -b 18 -c 25 -i 5 > mirdeep_runs/run_28_08_2019_t_15_49_37/tmp/reads_vs_refdb.arf_parsed.arf

started: 15:50:33

ended: 15:50:33
total:0h:0m:0s

#excising precursors
#excising precursors
started: 15:50:33
excise_precursors_iterative_final.pl /home/lab7/Project/newgenome.fa mirdeep_runs/run_28_08_2019_t_15_49_37/tmp/reads_vs_refdb.arf_parsed.arf mirdeep_runs/run_28_08_2019_t_15_49_37/tmp/precursors.fa mirdeep_runs/run_28_08_2019_t_15_49_37/tmp/precursors.coords -1
No file mirdeep_runs/run_28_08_2019_t_15_49_37/tmp/precursors.fa_stack found

real    0m57.660s
user    0m13.274s
sys 0m3.129s

The console log without the -g parameter:

date && time miRDeep2.pl ~/Project/reads_collapsed.fa ~/Project/newgenome.fa ~/Project/reads_vs_refdb.arf ~/Project/new_human_mature.fasta none ~/Project/new_human_stem.fasta -t Human hsa
Thu Sep  5 11:56:37 IST 2019

#####################################
#                                   #
# miRDeep2.0.1.2                    #
#                                   #
# last change: 22/01/2019           #
#                                   #
#####################################

miRDeep2 started at 11:56:37

#Starting miRDeep2
#Starting miRDeep2
/home/lab7/Project/Softwares/mirdeep2-master/bin/miRDeep2.pl /home/lab7/Project/reads_collapsed.fa /home/lab7/Project/newgenome.fa /home/lab7/Project/reads_vs_refdb.arf /home/lab7/Project/new_human_mature.fasta none /home/lab7/Project/new_human_stem.fasta -t Human hsa

miRDeep2 started at 11:56:37

mkdir mirdeep_runs/run_05_09_2019_t_11_56_37

Use of uninitialized value $tmps in split at /home/lab7/Project/Softwares/mirdeep2-master/bin/miRDeep2.pl line 350, <IN> line 1.
Use of uninitialized value in pattern match (m//) at /home/lab7/Project/Softwares/mirdeep2-master/bin/miRDeep2.pl line 354, <IN> line 1.
Use of uninitialized value $tmps in split at /home/lab7/Project/Softwares/mirdeep2-master/bin/miRDeep2.pl line 377.
#testing input files
#testing input files
started: 12:0:30
sanity_check_mature_ref.pl /home/lab7/Project/new_human_mature.fasta

ended: 12:0:30
total:0h:0m:0s

sanity_check_reads_ready_file.pl /home/lab7/Project/reads_collapsed.fa

started: 12:0:30

ended: 12:0:30
total:0h:0m:0s

started: 12:0:30
sanity_check_genome.pl /home/lab7/Project/newgenome.fa

ended: 12:0:30
total:0h:0m:0s

started: 12:0:30
sanity_check_mapping_file.pl /home/lab7/Project/reads_vs_refdb.arf

ended: 12:0:30
total:0h:0m:0s

started: 12:0:30
sanity_check_mature_ref.pl /home/lab7/Project/new_human_stem.fasta

ended: 12:0:30
total:0h:0m:0s

started: 12:0:30
Quantitation of expressed miRNAs in data

#Quantitation of known miRNAs in data
quantifier.pl -p /home/lab7/Project/new_human_stem.fasta -m /home/lab7/Project/new_human_mature.fasta  -r /home/lab7/Project/reads_collapsed.fa  -t Human -y 05_09_2019_t_11_56_37 -k  

ended: 12:0:30
total:0h:0m:0s

started: 12:0:30
rna2dna.pl /home/lab7/Project/new_human_mature.fasta > mirdeep_runs/run_05_09_2019_t_11_56_37/tmp/new_human_mature.fasta

rna2dna.pl /home/lab7/Project/new_human_stem.fasta > mirdeep_runs/run_05_09_2019_t_11_56_37/tmp/new_human_stem.fasta

ended: 12:0:30
total:0h:0m:0s

#parsing genome mappings
#parsing genome mappings
parse_mappings.pl /home/lab7/Project/reads_vs_refdb.arf -a 0 -b 18 -c 25 -i 5 > mirdeep_runs/run_05_09_2019_t_11_56_37/tmp/reads_vs_refdb.arf_parsed.arf

started: 12:0:30

ended: 12:0:30
total:0h:0m:0s

#excising precursors
#excising precursors
started: 12:0:30
excise_precursors_iterative_final.pl /home/lab7/Project/newgenome.fa mirdeep_runs/run_05_09_2019_t_11_56_37/tmp/reads_vs_refdb.arf_parsed.arf mirdeep_runs/run_05_09_2019_t_11_56_37/tmp/precursors.fa mirdeep_runs/run_05_09_2019_t_11_56_37/tmp/precursors.coords 50000
No file mirdeep_runs/run_05_09_2019_t_11_56_37/tmp/precursors.fa_stack found

real    3m53.148s
user    0m13.525s
sys 0m3.202s

Drmirdeep commented 5 years ago

Do the FASTA headers in your genome file start with '>' and end with a newline? The error seems to be related to the genome file you are using.

Can you post the output of head -n2 newgenome.fa?
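The two properties asked about here can also be checked programmatically. Below is a minimal Python sketch, not part of miRDeep2: the function name `check_fasta_headers` and the wording of its messages are made up for illustration. It reports whether the file starts with '>', whether any header contains whitespace, and whether the file ends with a newline.

```python
import io

def check_fasta_headers(handle):
    """Collect basic formatting problems from a FASTA stream (hypothetical helper)."""
    problems = []
    first_line = True
    last_line = ""
    for number, line in enumerate(handle, start=1):
        last_line = line
        stripped = line.rstrip("\r\n")
        if first_line:
            first_line = False
            if not stripped.startswith(">"):
                problems.append(f"line {number}: file does not start with '>'")
        if stripped.startswith(">"):
            header = stripped[1:]
            if not header:
                problems.append(f"line {number}: empty identifier")
            elif any(ch.isspace() for ch in header):
                problems.append(f"line {number}: whitespace in identifier")
    if last_line and not last_line.endswith("\n"):
        problems.append("file does not end with a newline")
    return problems

# Tiny in-memory example; the header text is illustrative only.
example = io.StringIO(">chr1 some description\nACGT\n")
for problem in check_fasta_headers(example):
    print(problem)
```

For a real file, the same function can be called on an open handle, for example `with open("newgenome.fa") as handle: print(check_fasta_headers(handle))`.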

Yashrajsinh-Jadeja commented 5 years ago

I am retracing the steps I performed so that they are easier to follow.

_(A little extra information: I downloaded a pre-built index from the GRCh38 iGenomes collection as specified by Bowtie and have been using the reference genome FASTA provided along with it as my reference.)_

mapper.pl

date && time mapper.pl ~/Project/Sample/RNAseq_Reads/SRR040572_3.fastq -e -h -j -l 18 -m -p ~/Project/Softwares/mirdeep2-master/essentials/bowtie-1.1.1/indexes/genome -s ~/Project/collapsed.fa -t ~/Project/collapsed.arf -v -n
Sat Sep  7 13:15:25 IST 2019

parsing fastq to fasta format
discarding sequences with non-canonical letters
discarding short reads
collapsing reads
mapping reads to genome index
trimming unmapped nts in the 3' ends
Log file for this run is in mapper_logs and called mapper.log_3201
Mapping statistics

#desc   total   mapped  unmapped    %mapped %unmapped
total: 3793410  841 3792569 0.022   99.978
seq: 3793410    841 3792569 0.022   99.978

real    1m26.755s
user    0m23.540s
sys 0m2.841s

miRDeep2.pl

date && time miRDeep2.pl ~/Project/collapsed.fa ~/Project/genome.fa ~/Project/collapsed.arf ~/Project/new_human_mature.fasta none ~/Project/new_human_stem.fasta -t Human hsa
Sat Sep  7 13:30:06 IST 2019

#####################################
#                                   #
# miRDeep2.0.1.2                    #
#                                   #
# last change: 22/01/2019           #
#                                   #
#####################################

miRDeep2 started at 13:30:07

#Starting miRDeep2
#Starting miRDeep2
/home/lab7/Project/Softwares/mirdeep2-master/bin/miRDeep2.pl /home/lab7/Project/collapsed.fa /home/lab7/Project/genome.fa /home/lab7/Project/collapsed.arf /home/lab7/Project/new_human_mature.fasta none /home/lab7/Project/new_human_stem.fasta -t Human hsa

miRDeep2 started at 13:30:07

mkdir mirdeep_runs/run_07_09_2019_t_13_30_07

Error: Genome file /home/lab7/Project/genome.fa has not allowed whitespaces in its first identifier

real    0m2.042s
user    0m0.237s
sys 0m0.062s

However, it throws an error about whitespace in the genome file, so I replaced the whitespace in the genome file with underscores ("_").

date && time miRDeep2.pl ~/Project/collapsed.fa ~/Project/why.fa ~/Project/collapsed.arf ~/Project/new_human_mature.fasta none ~/Project/new_human_stem.fasta -t Human hsa
Sat Sep  7 13:30:20 IST 2019

#####################################
#                                   #
# miRDeep2.0.1.2                    #
#                                   #
# last change: 22/01/2019           #
#                                   #
#####################################

miRDeep2 started at 13:30:20

#Starting miRDeep2
#Starting miRDeep2
/home/lab7/Project/Softwares/mirdeep2-master/bin/miRDeep2.pl /home/lab7/Project/collapsed.fa /home/lab7/Project/why.fa /home/lab7/Project/collapsed.arf /home/lab7/Project/new_human_mature.fasta none /home/lab7/Project/new_human_stem.fasta -t Human hsa

miRDeep2 started at 13:30:20

mkdir mirdeep_runs/run_07_09_2019_t_13_30_20

The mapped reference id chr1 from file /home/lab7/Project/collapsed.arf is not an id of the genome file /home/lab7/Project/why.fa

real    0m40.613s
user    0m0.698s
sys 0m2.634s

But now it throws an error about incompatible IDs: the reference ID chr1 in collapsed.arf is not an ID in why.fa.
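The mismatch reported above can be reproduced outside miRDeep2 by comparing the two sets of IDs directly. The following is a minimal Python sketch under stated assumptions: it assumes the .arf file is tab-separated with the reference ID in the sixth column, and all helper names are hypothetical. It is not how miRDeep2 itself performs the check.

```python
import io

def fasta_ids(handle):
    """Return the set of full header strings (without '>') in a FASTA stream."""
    return {line[1:].rstrip("\r\n") for line in handle if line.startswith(">")}

def arf_reference_ids(handle, column=5):
    """Return reference IDs from a tab-separated mapping stream.

    Assumption: the reference ID sits in the sixth column (index 5).
    """
    ids = set()
    for line in handle:
        fields = line.rstrip("\r\n").split("\t")
        if len(fields) > column:
            ids.add(fields[column])
    return ids

def missing_ids(arf_handle, genome_handle):
    """IDs used by the mappings that the genome file does not define."""
    return sorted(arf_reference_ids(arf_handle) - fasta_ids(genome_handle))

# Made-up one-line mapping and a shortened underscore-joined header.
arf = "seq_0_x10\t22\t1\t22\tacgt\tchr1\t22\t100\t121\tacgt\t+\t0\tmmmm\n"
genome = ">chr1_AC:CM000663.2_AS:GRCh38\nNNNN\n"
print(missing_ids(io.StringIO(arf), io.StringIO(genome)))  # ['chr1']
```

An empty list would mean every reference ID in the mapping file has an exactly matching header in the genome file.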

Here is a look at the reference genome file before and after replacing the whitespace.

This is the original genome file.

head -n10 genome.fa
>chr1  AC:CM000663.2  gi:568336023  LN:248956422  rl:Chromosome  M5:6aef897c3d6ff0c78aff06ac189178dd  AS:GRCh38
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN 

This is the file after I squeezed each run of whitespace into a single space and then replaced that space with an underscore.

head -n10 why.fa
>chr1_AC:CM000663.2_gi:568336023_LN:248956422_rl:Chromosome_M5:6aef897c3d6ff0c78aff06ac189178dd_AS:GRCh38
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN

The file "newgenome.fa" is one I made after neither of the two files above worked. I removed all newline characters from the file, so it becomes a single long line (I can't show it here because it is obviously very long). miRDeep2 accepted that file and did not throw the whitespace and ID errors mentioned above, only the precursors.fa_stack error.
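To see why removing every newline is risky, consider how a line-based FASTA reader would see such a file. The Python sketch below uses a generic reader written for illustration (`read_fasta` is a hypothetical helper, not miRDeep2's parser). Once the newlines are gone, the header and the sequence collapse into one header line and no sequence remains.

```python
import io

def read_fasta(handle):
    """Generic line-based FASTA reader: returns {identifier: sequence}."""
    records = {}
    current = None
    for line in handle:
        line = line.rstrip("\r\n")
        if line.startswith(">"):
            current = line[1:]
            records[current] = []
        elif current is not None:
            records[current].append(line)
    return {name: "".join(parts) for name, parts in records.items()}

# Toy record standing in for a chromosome.
original = ">chr1\nNNNNACGT\nACGTNNNN\n"
flattened = original.replace("\n", "")

print(read_fasta(io.StringIO(original)))   # {'chr1': 'NNNNACGTACGTNNNN'}
print(read_fasta(io.StringIO(flattened)))  # {'chr1NNNNACGTACGTNNNN': ''}
```

Whether this is exactly what happens inside miRDeep2 is not confirmed in this thread. The sketch only shows that, in a file without newlines, the sequence can no longer be separated from the header.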

Drmirdeep commented 5 years ago

This is problematic since the IDs don't match anymore between the bowtie index and the genome fasta file. The best option would be to take the newgenome.fa file with proper IDs and rebuild the index yourself.
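One way to obtain a genome file whose IDs match what appears in the mapping file (chr1 in the error above) is to keep only the first whitespace-delimited token of each header in the original genome.fa. The Python sketch below assumes that this is what "proper IDs" means here; the function name `shorten_headers` is illustrative. The Bowtie index would still need to be rebuilt from the resulting file, as suggested.

```python
import io

def shorten_headers(src, dst):
    """Copy a FASTA stream, truncating each header at the first whitespace."""
    for line in src:
        if line.startswith(">"):
            tokens = line[1:].split()
            # Keep only the first token; an empty header stays empty.
            line = ">" + (tokens[0] if tokens else "") + "\n"
        dst.write(line)

# In-memory demonstration using a shortened version of the header shown earlier.
src = io.StringIO(">chr1  AC:CM000663.2  gi:568336023\nNNNN\n")
dst = io.StringIO()
shorten_headers(src, dst)
print(dst.getvalue(), end="")
```

On disk, the same function can be used with two open files, for example `with open("genome.fa") as src, open("genome_short_ids.fa", "w") as dst: shorten_headers(src, dst)`, where the output file name is a placeholder. Sequence lines are copied unchanged, so the line structure of the original file is preserved.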