jenniferlu717 / Bracken

Bracken (Bayesian Reestimation of Abundance with KrakEN) is a highly accurate statistical method that computes the abundance of species in DNA sequences from a metagenomics sample.
http://ccb.jhu.edu/software/bracken/index.shtml
GNU General Public License v3.0

`IndexError: string index out of range` for clade Bacteria with empty level #284

Open: fluhus opened this issue 1 month ago

fluhus commented 1 month ago

Hi,

Thank you for making this tool! :)

I am running Bracken 2.9 on a Kraken2 report (a small subsample is attached below), and I am getting an `IndexError: string index out of range`. It seems to come from an empty level (rank code) value for Bacteria.

Command:

$ ./bracken -d .../kraken2/standard -i example.txt -o a -r 100

Output:

 >> Checking for Valid Options...
 >> Running Bracken 
      >> python src/est_abundance.py -i example.txt -o a -k .../kraken2/standard/database100mers.kmer_distrib -l S -t 0
PROGRAM START TIME: 10-20-2024 04:04:30
>> Checking report file: example.txt
Traceback (most recent call last):
  File ".../Bracken-2.9/./src/est_abundance.py", line 557, in <module>
    main()
  File ".../Bracken-2.9/./src/est_abundance.py", line 326, in main
    elif main_lvls.index(level_id[0]) >= branch_lvl:
                         ~~~~~~~~^^^
IndexError: string index out of range

I am not sure whether this is a bug in Kraken2 or in Bracken. I noticed that Bacteria, Archaea and Viruses all have empty levels.

Regardless, perhaps `level_id` could be checked for validity before it is subscripted, so that Bracken can report a more informative error message.
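A minimal sketch of such a check, written as a standalone helper rather than a patch to `est_abundance.py`. The function name `main_level_index` and the contents of `MAIN_LEVELS` are hypothetical. The list only borrows the idea of the `main_lvls` lookup seen in the traceback. Unclassified (`U`) rows are assumed to be handled before this point.

```python
# Hypothetical helper: validate a Kraken report rank code before indexing into it.
# MAIN_LEVELS is an assumed ordering of main ranks, not copied from Bracken.
MAIN_LEVELS = ["R", "D", "K", "P", "C", "O", "F", "G", "S"]


def main_level_index(level_id: str, line_num: int, name: str) -> int:
    """Return the position of the rank code's leading letter in MAIN_LEVELS.

    Raises ValueError with a descriptive message, instead of a bare
    IndexError, when the rank code is empty or starts with an unknown letter.
    """
    code = level_id.strip()
    if not code:
        raise ValueError(
            f"line {line_num}: empty rank code for taxon {name.strip()!r}; "
            "the Kraken report may be malformed"
        )
    if code[0] not in MAIN_LEVELS:
        raise ValueError(
            f"line {line_num}: unrecognized rank code {code!r} for taxon {name.strip()!r}"
        )
    # Intermediate ranks such as "D1" or "S1" map to their main rank letter.
    return MAIN_LEVELS.index(code[0])
```

With this check, an empty code for Bacteria would stop the run with a message naming the offending line and taxon, rather than failing inside the list lookup.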

Thanks!

example.txt

fluhus commented 1 month ago

Thank you, @SMUZhanLi!

I downloaded the DB you linked to, and I am still getting empty levels for Bacteria, Archaea and Viruses. I tried both kraken2 v2.1.3 and the current master branch.

Here is my kraken command:

kraken2 --threads 4 --db standard --output - --report $fout --paired $fin1 $fin2

If the problem is in Kraken2, I guess the relevant fix for Bracken would be to validate its input and stop with an informative error when the report is invalid.
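One possible pre-flight check is a small script that scans a report and lists every row with a blank rank code before Bracken is run. This is a sketch under stated assumptions. It expects the standard six-column Kraken2 report produced without `--report-minimizer-data`: percentage, clade reads, taxon reads, rank code, taxid, and indented name. The function name `find_empty_ranks` and the script name in the usage comment are hypothetical.

```python
import sys
from typing import Iterable, List, Tuple


def find_empty_ranks(lines: Iterable[str]) -> List[Tuple[int, str]]:
    """Return (line number, taxon name) for every report row with a blank rank code.

    Assumes the standard six-column Kraken2 report (no --report-minimizer-data):
    percentage, clade reads, taxon reads, rank code, taxid, indented name.
    """
    problems = []
    for line_num, line in enumerate(lines, start=1):
        if not line.strip():
            continue  # tolerate blank lines, e.g. a trailing newline
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 6:
            raise ValueError(
                f"line {line_num}: expected 6 tab-separated columns, found {len(fields)}"
            )
        rank_code, name = fields[3].strip(), fields[5].strip()
        if not rank_code:
            problems.append((line_num, name))
    return problems


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (hypothetical script name): python check_report.py example.txt
    with open(sys.argv[1]) as report:
        bad = find_empty_ranks(report)
    for line_num, name in bad:
        print(f"line {line_num}: empty rank code for {name!r}", file=sys.stderr)
    sys.exit(1 if bad else 0)
```

A non-zero exit status makes it easy to chain this check before `bracken` in a shell pipeline.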

fluhus commented 1 month ago

I edited the report file manually and added a 'D' level to Bacteria, Archaea, Viruses and Eukaryotes.

Now I am getting this:

 >> Checking for Valid Options...
 >> Running Bracken 
      >> python src/est_abundance.py -i example.txt -o a -k /dfs7/whitesonlab/faracig/databases/kraken_db_standard/database100mers.kmer_distrib -l S -t 0
PROGRAM START TIME: 10-21-2024 22:43:39
>> Checking report file: example.txt
Traceback (most recent call last):
  File "/dfs7/whitesonlab/alavon/Tools/bracken/./src/est_abundance.py", line 563, in <module>
    main()
  File "/dfs7/whitesonlab/alavon/Tools/bracken/./src/est_abundance.py", line 299, in main
    while level_num != (prev_node.level_num + 1):
                        ^^^^^^^^^^^^^^^^^^^
AttributeError: 'NoneType' object has no attribute 'level_num'
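The loop condition in this second traceback suggests that Bracken walks back up from the previous node until it finds one exactly one level above the current row. The walk reaches `None` when no such ancestor exists. The sketch below is a hypothetical illustration of guarding that step and is not Bracken's code. It assumes the depth of each row comes from the indentation of the name column (two spaces per level, as in Kraken2 reports). `Node` and `build_tree` are made-up names.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    name: str
    depth: int  # indentation depth: leading spaces // 2
    # Excluded from repr/eq so parent <-> child cycles do not recurse forever.
    parent: Optional["Node"] = field(default=None, repr=False, compare=False)
    children: List["Node"] = field(default_factory=list)


def build_tree(names: List[str]) -> Node:
    """Link taxa into a tree using the indentation of their report names.

    Raises ValueError instead of dereferencing a missing parent when a row
    is indented more than one level deeper than the row above it.
    """
    root = Node("<virtual root>", depth=-1)
    stack = [root]  # invariant: stack[i] is the open ancestor at depth i - 1
    for row, raw in enumerate(names, start=1):
        stripped = raw.lstrip(" ")
        depth = (len(raw) - len(stripped)) // 2
        if depth > len(stack) - 1:
            raise ValueError(
                f"row {row}: {stripped!r} is at depth {depth}, "
                f"but no parent exists at depth {depth - 1}"
            )
        del stack[depth + 1:]  # close any deeper branches
        parent = stack[depth]
        node = Node(stripped, depth, parent)
        parent.children.append(node)
        stack.append(node)
    return root
```

Keeping an explicit stack of open ancestors turns a missing parent into a descriptive error at the offending row, rather than an `AttributeError` raised later.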