jenniferlu717 / Bracken

Bracken (Bayesian Reestimation of Abundance with KrakEN) is a highly accurate statistical method that computes the abundance of species in DNA sequences from a metagenomics sample.
http://ccb.jhu.edu/software/bracken/index.shtml
GNU General Public License v3.0

The newest version should now work with krakenuniq databases. #203

Open phlatphish opened 2 years ago

phlatphish commented 2 years ago

There is still something funny going on. I have the latest Bracken build in ~/build/Bracken, with the executables symlinked into ~/bin, which is on my PATH. For krakenuniq I am using the conda package, version 0.6. When kraken and kraken2 were installed with conda alongside krakenuniq, I got this error:

>bracken-build -k 35 -l 75 -d /data/cjb/db_052422 -x /home/cjb/miniconda3/envs/biokraken/bin -y krakenuniq -t 32
 >> Selected Options:
       kmer length = 35
       read length = 75
       database    = /data/cjb/db_052422
       threads     = 32
       kraken type = krakenuniq
 >> Checking for Valid Options...
 ERROR: Kraken2 Database incomplete: /data/cjb/db_052422/hash.k2d does not exist

When I uninstalled kraken and kraken2 leaving only krakenuniq, bracken-build started working:

> bracken-build -k 35 -l 75 -d /data/cjb/db_052422 -x /home/cjb/miniconda3/envs/biokraken/bin -y krakenuniq -t 32
 >> Selected Options:
       kmer length = 35
       read length = 75
       database    = /data/cjb/db_052422
       threads     = 32
       kraken type = krakenuniq
 >> Checking for Valid Options...
 >> Creating database.kraken [if not found]
          database.kraken.tsv exists, skipping creation....
          Finished creating database.kraken [in DB folder]
 >> Creating database75mers.kmer_distrib
 >>STEP 0: PARSING COMMAND LINE ARGUMENTS
  Taxonomy nodes file: /data/cjb/db_052422/taxonomy/nodes.dmp
  Seqid file:          /data/cjb/db_052422/seqid2taxid.map
  Num Threads:         32
  Kmer Length:         35
  Read Length:         75
 >>STEP 1: READING SEQID2TAXID MAP
  64663 total sequences read
 >>STEP 2: READING NODES.DMP FILE
  2422769 total nodes read
 >>STEP 3: CONVERTING KMER MAPPINGS INTO READ CLASSIFICATIONS:
  75mers, with a database built using 35mers

etc.

That's as far as I have gotten so far.

phlatphish commented 2 years ago

Subsequently, bracken ran cleanly using that database on krakenuniq reports.

fengyuchengdu commented 2 years ago

I'm using Krakenuniq with the pre-built database downloaded from https://benlangmead.github.io/aws-indexes/k2, the 384G one labelled as EuPathDB48, to generate the report file.

Bracken was installed with the install shell script rather than via conda.

When I ran the program, it reported:

bracken -d ~/krakenuniq -i new_report.tsv -o new_bracken -w new_bracken_report -r 50 -l S -t 0

Checking for Valid Options...
Running Bracken
python src/est_abundance.py -i new_report.tsv -o new_bracken -k /hdd1/home/f22_yfeng/krakenuniq/database50mers.kmer_distrib -l S -t 0
PROGRAM START TIME: 10-13-2022 11:57:37
Checking report file: new_report.tsv
Traceback (most recent call last):
  File "/hdd1/home/f22_yfeng/Bracken/src/est_abundance.py", line 554, in <module>
    main()
  File "/hdd1/home/f22_yfeng/Bracken/src/est_abundance.py", line 339, in main
    [mapped_taxid, mapped_taxid_dict] = process_kmer_distribution(line,lvl_taxids,map2lvl_taxids)
  File "/hdd1/home/f22_yfeng/Bracken/src/est_abundance.py", line 100, in process_kmer_distribution
    [g_taxid,mkmers,tkmers] = genome_str.split(':')
ValueError: not enough values to unpack (expected 3, got 1)

How can I fix this?

Many thanks

jenniferlu717 commented 2 years ago

@phlatphish Can you try running without the -x flag?

Otherwise, I did fix the script. When -x was specified, I had accidentally left kraken2 as the default, so the -y flag was ignored. The newest version fixes this.
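
For readers following along, here is a minimal Python sketch of the kind of check involved: choose the required database files from the requested kraken type, and fall back to a default only when no type is given. This is not the actual bracken-build code. The function name `missing_db_files` and the `REQUIRED_FILES` table are hypothetical, and the file lists are assumptions limited to the names that appear in the logs in this thread.

```python
from pathlib import Path

# Assumption: only the file names visible in this thread's logs are listed;
# the real tools may require additional files.
REQUIRED_FILES = {
    "kraken2": ["hash.k2d"],
    "krakenuniq": ["seqid2taxid.map", "taxonomy/nodes.dmp"],
}


def missing_db_files(db_dir, kraken_type=None, default_type="kraken2"):
    """Return the required files that are absent from db_dir."""
    # Honour an explicit type; use the default only when none was requested.
    selected = kraken_type or default_type
    if selected not in REQUIRED_FILES:
        raise ValueError(f"unknown kraken type: {selected}")
    root = Path(db_dir)
    return [name for name in REQUIRED_FILES[selected] if not (root / name).exists()]
```

With a krakenuniq database directory, `missing_db_files(db_dir, "krakenuniq")` would report nothing missing, whereas the same directory checked as kraken2 would flag hash.k2d, which matches the error shown in the first comment.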

jenniferlu717 commented 2 years ago

@fengyuchengdu Can you open a new issue? I think your kmer_distribution file might be wrong. I need to see the kmer_distribution file you downloaded or are using.
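
A quick way to test that suspicion is to scan the kmer_distrib file for entries that would fail the `genome_str.split(':')` unpacking shown in the traceback. The sketch below is only an illustration and makes assumptions about the layout: a header on the first line, tab-separated columns, and a second column holding space-separated `taxid:mapped_kmers:total_kmers` entries. The function name `find_malformed_entries` is hypothetical.

```python
def find_malformed_entries(lines):
    """Yield (line_number, text) for entries that are not 'a:b:c' triples."""
    for line_number, line in enumerate(lines, start=1):
        line = line.rstrip("\n")
        # Assumption: the first line is a header; blank lines are ignored.
        if line_number == 1 or not line.strip():
            continue
        fields = line.split("\t")
        if len(fields) < 2:
            # No second column at all: report the whole line.
            yield line_number, line
            continue
        for token in fields[1].split():
            # est_abundance.py unpacks exactly three colon-separated values.
            if len(token.split(":")) != 3:
                yield line_number, token
```

Usage would look like `with open(path) as handle: bad = list(find_malformed_entries(handle))`, where `path` points at the database50mers.kmer_distrib file. Any reported entry would trigger the same ValueError.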