jenniferlu717 / Bracken

Bracken (Bayesian Reestimation of Abundance with KrakEN) is a highly accurate statistical method that computes the abundance of species in DNA sequences from a metagenomics sample.
http://ccb.jhu.edu/software/bracken/index.shtml
GNU General Public License v3.0
287 stars 50 forks source link

`Error: no reads found` but the report file contains read counts #153

Open MatteoSchiavinato opened 3 years ago

MatteoSchiavinato commented 3 years ago

I'm using bracken in a pipeline, and I'm encountering some issues. I have used it before in the same system without any issue, so I'm a bit confused about this.

I'm running it on a report file that contains read counts produced by kraken2. It is not an mpa report, so it's the correct version.

With the following command:

python3.8 \
est_abundance.py \
-i test.report \
-o test.bracken \
-k database100mers.kmer_distrib \
-l S \
-t 0  

I'm getting the following error:

>> Checking report file: test.report
Error: no reads found. Please check your Kraken report

However the report contains several taxa each of which has counts. So I don't understand what the issue is.

I tried to modify the est_abundance.py script to debug it and I found that there are a few commented lines that switch certain commands (e.g. new_all_reads = float(all_reads) + float(added_reads)) with newer versions (e.g. new_all_reads = float(added_reads)).

If I restore the commented commands, removing their new versions, the program works but the output contains 0 reads for each taxa (each read count for each taxa gets subtracted all of its reads, becoming 0).

Could this be a situation where bracken could not estimate the read distribution, and hence no read was added or subtracted? And if so, can it handle such scenario or does it end up in a ZeroDivisionError or something like that?

maxlcummins commented 3 years ago

I have also had this issue. I have several hundred kraken reports I am processing, one of which routinely fails to be processed by Bracken.

Does your kraken report have a similar structure as mine? Note the F, F1, F2 and S rows. I think it is happening because there is no genus designation and it hops from Family to species with no in between - I think genus is required for the species level for some reason.

If i change the level flag from "S" (species) to "G" (genus), it still doesnt work (Genus is missing and maybe Species requires checking the genus designation first?), however if i change the level flag to "F" (Family) then i get a report successfully generated.

The error message could definitely be a bit more descriptive but I think its because (at least in my case) there is no genus designation which for some reason breaks the species level (and also obviously the genus level).

 2.08   1   1   U   0   unclassified
 97.92  47  0   R   1   root
 97.92  47  0   R1  131567    cellular organisms
 97.92  47  0   D   2       Bacteria
 97.92  47  0   P   1224          Proteobacteria
 97.92  47  0   C   1236            Gammaproteobacteria
 97.92  47  0   O   91347             Enterobacterales
 97.92  47  3   F   543             Enterobacteriaceae
 79.17  38  0   F1  191675                unclassified Enterobacteriaceae
 79.17  38  0   F2  36866                   unclassified Enterobacteriaceae (miscellaneous)
 79.17  38  38  S   693444                    Enterobacteriaceae bacterium strain FGI 57
  4.17  2   0   G   561               Escherichia
  4.17  2   2   S   562                 Escherichia coli
  2.08  1   0   G   547               Enterobacter
  2.08  1   1   S   1692238                 Enterobacter sp. FY-07
  2.08  1   0   G   590               Salmonella
  2.08  1   1   S   28901                   Salmonella enterica
  2.08  1   0   G   570               Klebsiella
  2.08  1   0   S   573                 Klebsiella pneumoniae
  2.08  1   1   S1  72407                     Klebsiella pneumoniae subsp. pneumoniae
  2.08  1   0   G   544               Citrobacter
  2.08  1   0   G1  1344959                 Citrobacter freundii complex
  2.08  1   1   S   546                   Citrobacter freundii