oschakoory / RiboTaxa

RiboTaxa: combined approaches for rRNA genes taxonomic resolution down to the species level from metagenomics data revealing novelties.
https://academic.oup.com/nargab/article/4/3/lqac070/6708509
GNU Affero General Public License v3.0

Issue #3: [E::idx_find_and_load] Could not retrieve index file for

Closed: palomo11 closed this issue 1 year ago

palomo11 commented 1 year ago

Hi,

While running RiboTaxa with:

bash -i /software/RiboTaxa/Pipeline_RiboTaxa_PE.sh /software/RiboTaxa/RiboTaxa_arguments.conf

I have encountered this error:

        bowtie-build command:
        bowtie-build -o 3 /path/SSU_sequences/output_emirge/Sample1_amplicon_16S18S_recons/iter.00/iter.00.cons.fasta /path/SSU_sequences/output_emirge/Sample1_amplicon_16S18S_recons/iter.00/bowtie.index.iter.00 > /path/SSU_sequences/output_emirge/Sample1_amplicon_16S18S_recons/iter.00/bowtie.iter.00.log
        bowtie command:
        cat  /path/SSU_sequences/output_emirge/Sample1_amplicon_16S18S_recons/emirge_tmp_reads_1.fastq |  bowtie --phred33-quals -t -p 40  -n 3 -l 20 -e 300  --best --strata --all --sam --chunkmbs 128 --minins 300 --maxins 600 /path/SSU_sequences/output_emirge/Sample1_amplicon_16S18S_recons/iter.00/bowtie.index.iter.00 -1 - -2 /path/SSU_sequences/output_emirge/Sample1_amplicon_16S18S_recons/emirge_tmp_reads_2.fastq | samtools view -S -h -u -b -F 0x0004 - > /path/SSU_sequences/output_emirge/Sample1_amplicon_16S18S_recons/iter.00/bowtie.iter.00.PE.u.bam
        Finished Bowtie for iteration 00 at Wed Nov 30 18:25:58 2022:
DONE with read mapping for iteration 0 at Wed Nov 30 18:25:58 2022 [0:00:10.112919]...
Finished iteration 0 at Wed Nov 30 18:25:58 2022...
Total time for iteration 0: 0:00:19.636305
Starting iteration 1 at Wed Nov 30 18:25:58 2022...
Reading bam file /path/SSU_sequences/output_emirge/Sample1_amplicon_16S18S_recons/iter.00/bowtie.iter.00.PE.u.bam at Wed Nov 30 18:25:58 2022...
[E::idx_find_and_load] Could not retrieve index file for '/path/SSU_sequences/output_emirge/Sample1_amplicon_16S18S_recons/iter.00/bowtie.iter.00.PE.u.bam'
[E::idx_find_and_load] Could not retrieve index file for '/path/SSU_sequences/output_emirge/Sample1_amplicon_16S18S_recons/iter.00/bowtie.iter.00.PE.u.bam'
[E::idx_find_and_load] Could not retrieve index file for '/path/SSU_sequences/output_emirge/Sample1_amplicon_16S18S_recons/iter.00/bowtie.iter.00.PE.u.bam'

        bowtie-build command:
        bowtie-build -o 3 /path/SSU_sequences/output_emirge/Sample1_amplicon_16S18S_recons/iter.01/iter.01.cons.fasta /path/SSU_sequences/output_emirge/Sample1_amplicon_16S18S_recons/iter.01/bowtie.index.iter.01 > /path/SSU_sequences/output_emirge/Sample1_amplicon_16S18S_recons/iter.01/bowtie.iter.01.log
        bowtie command:
        cat  /path/SSU_sequences/output_emirge/Sample1_amplicon_16S18S_recons/emirge_tmp_reads_1.fastq |  bowtie --phred33-quals -t -p 40  -n 3 -l 20 -e 300  --best --strata --all --sam --chunkmbs 128 --minins 300 --maxins 600 /path/SSU_sequences/output_emirge/Sample1_amplicon_16S18S_recons/iter.01/bowtie.index.iter.01 -1 - -2 /path/SSU_sequences/output_emirge/Sample1_amplicon_16S18S_recons/emirge_tmp_reads_2.fastq | samtools view -S -h -u -b -F 0x0004 - > /path/SSU_sequences/output_emirge/Sample1_amplicon_16S18S_recons/iter.01/bowtie.iter.01.PE.u.bam
        Finished Bowtie for iteration 01 at Wed Nov 30 18:26:28 2022:
DONE with read mapping for iteration 1 at Wed Nov 30 18:26:28 2022 [0:00:10.717669]...
Finished iteration 1 at Wed Nov 30 18:26:28 2022...
Total time for iteration 1: 0:00:29.853800
Starting iteration 2 at Wed Nov 30 18:26:28 2022...
Reading bam file /path/SSU_sequences/output_emirge/Sample1_amplicon_16S18S_recons/iter.01/bowtie.iter.01.PE.u.bam at Wed Nov 30 18:26:28 2022...
[E::idx_find_and_load] Could not retrieve index file for '/path/SSU_sequences/output_emirge/Sample1_amplicon_16S18S_recons/iter.01/bowtie.iter.01.PE.u.bam'
[E::idx_find_and_load] Could not retrieve index file for '/path/SSU_sequences/output_emirge/Sample1_amplicon_16S18S_recons/iter.01/bowtie.iter.01.PE.u.bam'
[E::idx_find_and_load] Could not retrieve index file for '/path/SSU_sequences/output_emirge/Sample1_amplicon_16S18S_recons/iter.01/bowtie.iter.01.PE.u.bam'

...

Finished iteration 40 at Wed Nov 30 18:46:19 2022...
Total time for iteration 40: 0:00:30.781029
Converting last mapping file (bowtie.iter.40.PE.u.bam) to compressed bam at Wed Nov 30 18:46:19 2022...
DONE Converting last mapping file (bowtie.iter.40.PE.u.bam) to compressed bam at Wed Nov 30 18:46:24 2022.
Traceback (most recent call last):
  File "/software/RiboTaxa/scripts/run_MetaRib_PE.py", line 423, in <module>
    main()
  File "/software/RiboTaxa/scripts/run_MetaRib_PE.py", line 374, in main
    config.read(config_file)
  File "/xxx/python/anaconda3/2021.11/envs/RiboTaxa_py27/lib/python2.7/ConfigParser.py", line 305, in read
    self._read(fp, filename)
  File "/xxx/python/anaconda3/2021.11/envs/RiboTaxa_py27/lib/python2.7/ConfigParser.py", line 512, in _read
    raise MissingSectionHeaderError(fpname, lineno, line)
ConfigParser.MissingSectionHeaderError: File contains no section headers.
file: /software/RiboTaxa/RiboTaxa_arguments.conf, line: 3
'Other tool parameters have been optimized and can be left at default. \n'

These three directories have been produced:

quality_control output_sortmerna SSU_sequences

Any idea what's wrong?

oschakoory commented 1 year ago

Hi, can you show me your /software/RiboTaxa/RiboTaxa_arguments.conf?

There might be something wrong at line 3 of this file. Can you check?

Thanks.
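To see why a single uncommented line near the top of the file is enough to stop the run, here is a minimal sketch using Python 3's `configparser` (the traceback above comes from Python 2.7's `ConfigParser`, which raises the same `MissingSectionHeaderError` in this situation). The config text is a shortened, hypothetical stand-in for RiboTaxa_arguments.conf, not its real contents.

```python
import configparser

# Hypothetical excerpt: lines 1-2 are comments, but line 3 lost its leading
# "##", so the parser treats it as content that appears before any [section].
broken_conf = """\
##The configuration file is very important and each parameter needs to be filled to avoid errors.
##Mandatory parameters are denoted by **.
Other tool parameters have been optimized and can be left at default.
"""


def try_parse(text):
    """Return None if the text parses, otherwise the configparser exception."""
    parser = configparser.ConfigParser()
    try:
        parser.read_string(text, source="RiboTaxa_arguments.conf")
    except configparser.Error as exc:
        return exc
    return None


err = try_parse(broken_conf)
print(type(err).__name__, "at line", err.lineno)  # MissingSectionHeaderError at line 3

# Restoring the "##" prefix turns the line back into a comment.
fixed_conf = broken_conf.replace("Other tool", "##Other tool")
print(try_parse(fixed_conf))  # None
```

Lines starting with `#` or `;` are skipped as comments by default, which is why only the third line is reported.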

palomo11 commented 1 year ago

Hi, yes, it seems a # was missing on line 3:

##The configuration file is very important and each parameter needs to be filled to avoid errors.
##Mandatory parameters are denoted by **. Most of them depend on the configuration of your computer capacities/configuration and of your sequencing data.
Other tool parameters have been optimized and can be left at default. 

I have now fixed it:

##The configuration file is very important and each parameter needs to be filled to avoid errors.
##Mandatory parameters are denoted by **. Most of them depend on the configuration of your computer capacities/configuration and of your sequencing data.
##Other tool parameters have been optimized and can be left at default.

However, I'm still getting the "Could not retrieve index file" error:

[E::idx_find_and_load] Could not retrieve index file for '/path/SSU_sequences/output_emirge/Sample1_amplicon_16S18S_recons/iter.00/bowtie.iter.00.PE.u.bam'

DONE Rewriting reads with indexes in headers at Thu Dec  1 15:25:37 2022 [0:00:00.917237]...
Number of reads (or read pairs) in input file(s): 219835
Preallocating reads and quals in memory at Thu Dec  1 15:25:37 2022...
DONE Preallocating reads and quals in memory at Thu Dec  1 15:25:39 2022 [0:00:01.712992]...
Performing initial mapping with command:
cat  /scratch/2022-11-23/ese-alexp/Dapeng/RiboTaxa_Sample1/Sample1/SSU_sequences/output_emirge/Sample1_amplicon_16S18S_recons/emirge_tmp_reads_1.fastq |  bowtie --phred33-quals -t -p 40 -n 3 -l 20 -e 300 --best --sam --chunkmbs 128 --minins 300 --maxins 600 /data/ese-alexp/databases/Silva/bowtie_indexed_DB/SILVA_138.1_SSURef_NR99_tax_silva_bowtie_indexed -1 - -2 /scratch/2022-11-23/ese-alexp/Dapeng/RiboTaxa_Sample1/Sample1/SSU_sequences/output_emirge/Sample1_amplicon_16S18S_recons/emirge_tmp_reads_2.fastq | samtools view -S -h -u -b -F 0x0004 - > /scratch/2022-11-23/ese-alexp/Dapeng/RiboTaxa_Sample1/Sample1/SSU_sequences/output_emirge/Sample1_amplicon_16S18S_recons/initial_mapping/initial_bowtie_mapping.PE.u.bam 
Beginning initialization at Thu Dec  1 15:25:46 2022...
Reading bam file /scratch/2022-11-23/ese-alexp/Dapeng/RiboTaxa_Sample1/Sample1/SSU_sequences/output_emirge/Sample1_amplicon_16S18S_recons/initial_mapping/initial_bowtie_mapping.PE.u.bam at Thu Dec  1 15:25:46 2022...
[E::idx_find_and_load] Could not retrieve index file for '/scratch/2022-11-23/ese-alexp/Dapeng/RiboTaxa_Sample1/Sample1/SSU_sequences/output_emirge/Sample1_amplicon_16S18S_recons/initial_mapping/initial_bowtie_mapping.PE.u.bam'
[E::idx_find_and_load] Could not retrieve index file for '/scratch/2022-11-23/ese-alexp/Dapeng/RiboTaxa_Sample1/Sample1/SSU_sequences/output_emirge/Sample1_amplicon_16S18S_recons/initial_mapping/initial_bowtie_mapping.PE.u.bam'
[E::idx_find_and_load] Could not retrieve index file for '/scratch/2022-11-23/ese-alexp/Dapeng/RiboTaxa_Sample1/Sample1/SSU_sequences/output_emirge/Sample1_amplicon_16S18S_recons/initial_mapping/initial_bowtie_mapping.PE.u.bam'
Culled 4514 sequences in iteration -1 due to low fraction of reference sequence bases covered by >= 1 reads
DONE Reading bam file /scratch/2022-11-23/ese-alexp/Dapeng/RiboTaxa_Sample1/Sample1/SSU_sequences/output_emirge/Sample1_amplicon_16S18S_recons/initial_mapping/initial_bowtie_mapping.PE.u.bam at Thu Dec  1 15:25:48 2022 [0:00:01.373412]...
DONE with initialization at Thu Dec  1 15:25:48 2022...
Starting iteration 0 at Thu Dec  1 15:25:48 2022...
Reading bam file /scratch/2022-11-23/ese-alexp/Dapeng/RiboTaxa_Sample1/Sample1/SSU_sequences/output_emirge/Sample1_amplicon_16S18S_recons/initial_mapping/initial_bowtie_mapping.PE.u.bam at Thu Dec  1 15:25:48 2022...
[E::idx_find_and_load] Could not retrieve index file for '/scratch/2022-11-23/ese-alexp/Dapeng/RiboTaxa_Sample1/Sample1/SSU_sequences/output_emirge/Sample1_amplicon_16S18S_recons/initial_mapping/initial_bowtie_mapping.PE.u.bam'
[E::idx_find_and_load] Could not retrieve index file for '/scratch/2022-11-23/ese-alexp/Dapeng/RiboTaxa_Sample1/Sample1/SSU_sequences/output_emirge/Sample1_amplicon_16S18S_recons/initial_mapping/initial_bowtie_mapping.PE.u.bam'
[E::idx_find_and_load] Could not retrieve index file for '/scratch/2022-11-23/ese-alexp/Dapeng/RiboTaxa_Sample1/Sample1/SSU_sequences/output_emirge/Sample1_amplicon_16S18S_recons/initial_mapping/initial_bowtie_mapping.PE.u.bam'
Culled 4514 sequences in iteration 00 due to low fraction of reference sequence bases covered by >= 1 reads
DONE Reading bam file /scratch/2022-11-23/ese-alexp/Dapeng/RiboTaxa_Sample1/Sample1/SSU_sequences/output_emirge/Sample1_amplicon_16S18S_recons/initial_mapping/initial_bowtie_mapping.PE.u.bam at Thu Dec  1 15:25:49 2022 [0:00:01.126556]...
Calculating likelihood (6307, 219835) for iteration 0 at Thu Dec  1 15:25:49 2022...
        Calculating Pr(N=n) for iteration 0 at Thu Dec  1 15:25:49 2022...
        DONE calculating Pr(N=n) for iteration 0 at Thu Dec  1 15:25:50 2022 [0:00:00.517614]...
DONE Calculating likelihood for iteration 0 at Thu Dec  1 15:25:50 2022 [0:00:00.737791]...

... (the next 39 iterations are omitted here to save space) ...

Finished iteration 40 at Thu Dec  1 15:46:30 2022...
Total time for iteration 40: 0:00:30.693839
Converting last mapping file (bowtie.iter.40.PE.u.bam) to compressed bam at Thu Dec  1 15:46:30 2022...
DONE Converting last mapping file (bowtie.iter.40.PE.u.bam) to compressed bam at Thu Dec  1 15:46:35 2022.
Traceback (most recent call last):
  File "/data/ese-alexp/software/RiboTaxa/scripts/run_MetaRib_PE.py", line 423, in <module>
    main()
  File "/data/ese-alexp/software/RiboTaxa/scripts/run_MetaRib_PE.py", line 374, in main
    config.read(config_file)
  File "/work/ese-alexp/software/python/anaconda3/2021.11/envs/RiboTaxa_py27/lib/python2.7/ConfigParser.py", line 305, in read
    self._read(fp, filename)
  File "/work/ese-alexp/software/python/anaconda3/2021.11/envs/RiboTaxa_py27/lib/python2.7/ConfigParser.py", line 546, in _read
    raise e
ConfigParser.ParsingError: File contains parsing errors: /data/ese-alexp/software/RiboTaxa/RiboTaxa_arguments.conf
        [line 102]: 'to increase this value for complex communities. \n'

oschakoory commented 1 year ago

Hi, I think you need to recheck your /data/ese-alexp/software/RiboTaxa/RiboTaxa_arguments.conf.

It might be the same problem: a missing # at line 102.
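Because the same kind of mistake turned up twice (line 3, then line 102), and the config error only surfaces after all 40 EMIRGE iterations (about 20 minutes in the logs above), it may be quicker to scan the whole file once before re-running. A minimal sketch, not part of RiboTaxa: it is a simplified check rather than a reimplementation of ConfigParser, and it flags every non-blank line that is not a comment (`#` or `;`), a `[section]` header, or a `key=value` / `key: value` pair. The sample text, including the `[general]` section and `THREADS` option, is hypothetical.

```python
import re

# Simplified patterns for the line types an INI-style parser accepts.
SECTION_RE = re.compile(r"^\[[^\]]+\]$")
OPTION_RE = re.compile(r"^[^=:\s][^=:]*[=:]")


def find_suspicious_lines(text):
    """Return (line_number, line) pairs that are not blank, not comments,
    not [section] headers and not key=value / key: value options.

    Indented continuation lines are checked like any other line, so this
    may over-report compared with ConfigParser itself.
    """
    suspicious = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        stripped = line.strip()
        if not stripped or stripped.startswith(("#", ";")):
            continue  # Blank line or comment.
        if SECTION_RE.match(stripped) or OPTION_RE.match(stripped):
            continue  # Section header or option assignment.
        suspicious.append((lineno, line))
    return suspicious


# Hypothetical excerpt; section and option names are made up for illustration.
sample = """\
##Mandatory parameters are denoted by **.
Other tool parameters have been optimized and can be left at default.
[general]
THREADS=40
##A comment that was accidentally split over two lines; you may want
to increase this value for complex communities.
"""

for lineno, line in find_suspicious_lines(sample):
    print(f"line {lineno}: {line}")
```

On the real file, passing the text of RiboTaxa_arguments.conf to `find_suspicious_lines` should list both offending lines in a single pass.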

palomo11 commented 1 year ago

Hi,

It seems there was a broken line there and a missing #, which I have now changed.

I have another question. I got 1694 sequences in the metagenome_SSU_taxonomy_abundance.tsv file, but all of them have the same number of assigned reads (1) and the same relative abundance (0.059). Is that normal?
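The numbers in the question fit a table in which every sequence received exactly one read: 1 read out of 1694 reads in total is 100/1694 ≈ 0.059%, assuming (the thread does not say) that the relative abundance column is a percentage. A minimal sketch to check that arithmetic and to detect an all-identical count column; `assigned_reads` is a hypothetical column name, so substitute the real header of metagenome_SSU_taxonomy_abundance.tsv.

```python
import csv
import io


def relative_abundance_percent(reads, total_reads):
    """Relative abundance of one sequence, as a percentage of all assigned reads."""
    return 100.0 * reads / total_reads


def all_counts_identical(tsv_text, count_column):
    """True if every row of a tab-separated table has the same value in count_column."""
    rows = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    counts = {row[count_column] for row in rows}
    return len(counts) == 1


# 1694 sequences with 1 read each: every sequence gets the same share.
print(round(relative_abundance_percent(1, 1694), 3))  # 0.059

# Hypothetical three-row table using the column name "assigned_reads".
example = "sequence\tassigned_reads\nseq1\t1\nseq2\t1\nseq3\t1\n"
print(all_counts_identical(example, "assigned_reads"))  # True
```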

oschakoory commented 1 year ago

Hi, normally the relative abundances should not be identical for all sequences. This issue arises when conda is out of date, which causes some dependencies in the RiboTaxa yml file to fail to install.

Please follow these steps:

  1. Update conda

    conda update --all --yes
  2. Make sure you have the latest yml file for the RiboTaxa installation (recommended):

    git clone https://github.com/oschakoory/RiboTaxa.git
    cd RiboTaxa
  3. Remove the old environment (and with it the old dependencies)

    conda env remove -n RiboTaxa_py27
  4. Re-create the virtual environment

    conda env create -n RiboTaxa_py27 --file RiboTaxa_py27_requirements.yml
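Step 4 re-creates the environment under the same name, and depending on the conda version `conda env create` either stops or asks before overwriting an existing one, so it can help to confirm that step 3 really removed it. A minimal sketch, not part of RiboTaxa, assuming the JSON printed by `conda env list --json` has an `"envs"` list of prefix paths and that named environments live in a directory with the same name (the default `envs/<name>` layout).

```python
import json
import os
import subprocess


def list_conda_envs_json():
    """Run `conda env list --json` and return its raw JSON output (needs conda on PATH)."""
    result = subprocess.run(
        ["conda", "env", "list", "--json"],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


def env_exists(env_list_json, name):
    """Return True if an environment whose prefix directory is named `name`
    appears in the "envs" list of `conda env list --json` output."""
    prefixes = json.loads(env_list_json).get("envs", [])
    return any(os.path.basename(os.path.normpath(p)) == name for p in prefixes)
```

Calling `env_exists(list_conda_envs_json(), "RiboTaxa_py27")` would be expected to return False after step 3 and True after step 4.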

Once you have done this, you can re-run RiboTaxa on your samples. If you do not want to re-run all the steps, delete the Abundance_cal directory and the file_Abundance.tsv file in SSU_sequences, as well as the Taxonomy directory.

Then re-run RiboTaxa. This way, RiboTaxa will recalculate the relative abundances and the taxonomy.
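The partial re-run described above comes down to deleting three outputs before launching the pipeline again. A minimal sketch of that clean-up, assuming (the thread does not spell out the layout) that Abundance_cal and file_Abundance.tsv sit inside SSU_sequences, that Taxonomy sits next to SSU_sequences in the sample's output directory, and that file_Abundance.tsv is the literal file name (it may be sample-specific in practice). The function name and `sample_dir` argument are hypothetical; check the paths against your own output tree before deleting anything.

```python
import shutil
from pathlib import Path


def reset_abundance_and_taxonomy(sample_dir):
    """Delete the outputs that need to be regenerated and return the paths removed.

    Assumed (hypothetical) layout:
      <sample_dir>/SSU_sequences/Abundance_cal/
      <sample_dir>/SSU_sequences/file_Abundance.tsv
      <sample_dir>/Taxonomy/
    """
    root = Path(sample_dir)
    targets = [
        root / "SSU_sequences" / "Abundance_cal",
        root / "SSU_sequences" / "file_Abundance.tsv",
        root / "Taxonomy",
    ]
    removed = []
    for path in targets:
        if path.is_dir():
            shutil.rmtree(path)  # Remove the directory and everything in it.
        elif path.exists():
            path.unlink()  # Remove a single file.
        else:
            continue  # Nothing to delete for this target.
        removed.append(path)
    return removed
```

Everything else (quality_control, output_sortmerna and the rest of SSU_sequences) is left in place, so the earlier steps should not need to run again.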

palomo11 commented 1 year ago

It works now. Thanks!