ValueError: too many values to unpack

jenniferlu717 / Bracken

Bracken (Bayesian Reestimation of Abundance with KrakEN) is a highly accurate statistical method that computes the abundance of species in DNA sequences from a metagenomics sample.

http://ccb.jhu.edu/software/bracken/index.shtml

GNU General Public License v3.0

ValueError: too many values to unpack #23

Closed machalita closed 6 years ago

machalita commented 6 years ago

Hi! Thanks for taking the time to develop this tool. I'm getting the following error while trying to generate the kmer distribution:

python Bracken-master/generate_kmer_distribution.py -i braken/75mers.kraken -o braken/kmer.distribution
PROGRAM START TIME: 12-03-2017 22:59:55
Traceback (most recent call last):
  File "Bracken-master/generate_kmer_distribution.py", line 161, in <module>
    main()
  File "Bracken-master/generate_kmer_distribution.py", line 112, in main
    [genome_taxid, total_kmers, mapped_taxids_kmers] = parse_single_genome(line)
  File "Bracken-master/generate_kmer_distribution.py", line 78, in parse_single_genome
    [curr_m_id, curr_kmers] = kmers.split(':')
ValueError: too many values to unpack

I've used Bracken several times, but this time it somehow isn't working. Any ideas?
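For reference, the exception is ordinary Python unpacking failing because a token contains more than one ':'. A minimal sketch follows; the function name `parse_token` and the token values are hypothetical, chosen only to mirror the `kmers.split(':')` line in the traceback, and are not taken from Bracken's code or from a real file.

```python
def parse_token(token):
    """Split an 'id:count' token into exactly two fields.

    Mirrors the unpacking style of the failing line in the traceback;
    the function itself is a hypothetical stand-in.
    """
    curr_m_id, curr_kmers = token.split(":")
    return curr_m_id, curr_kmers


if __name__ == "__main__":
    # A well-formed token unpacks into two values.
    print(parse_token("562:100"))

    # Two tokens fused together contain two colons, so split(':') returns
    # three fields and the two-name unpacking raises ValueError.
    try:
        parse_token("562:100561:20")
    except ValueError as err:
        print("ValueError:", err)
```

Any line where two records were written on top of each other can produce a token like the second one, which is enough to stop the whole run.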

machalita commented 6 years ago

I located the error. When running in parallel, each process is rushing to write to stdout/file, and the end result is a corrupt file (some lines are missing or incomplete).
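One way to check a file for this kind of damage is to scan it for tokens that do not split cleanly on ':'. The sketch below assumes whitespace-separated columns in which some tokens look like `id:count`; that layout is inferred only from the `split(':')` calls in the tracebacks, and the helper name `find_malformed_tokens` and the sample lines are hypothetical. Set `expected_fields=3` for files whose tokens carry three fields.

```python
def find_malformed_tokens(lines, expected_fields=2):
    """Return (line_number, token) pairs that look corrupted.

    Only tokens containing ':' are checked; other columns are ignored.
    A token is flagged if it has the wrong number of ':'-separated
    fields or if any field is empty (for example a truncated write).
    """
    bad = []
    for line_number, line in enumerate(lines, start=1):
        for token in line.split():
            if ":" not in token:
                continue
            parts = token.split(":")
            if len(parts) != expected_fields or "" in parts:
                bad.append((line_number, token))
    return bad


if __name__ == "__main__":
    # Hypothetical sample: line 2 has two tokens fused, line 3 is truncated.
    sample = [
        "562 562:100 561:20",
        "1280 1280:501279:7",
        "1313 1313:",
    ]
    for line_number, token in find_malformed_tokens(sample):
        print(f"line {line_number}: suspicious token {token!r}")
```

With a real file, passing the open file handle as `lines` works the same way, since the function only iterates over lines.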

xgnusr commented 6 years ago

The same error:

PROGRAM START TIME: 04-23-2018 09:10:04
Traceback (most recent call last):
  File "est_abundance.py", line 460, in <module>
    main()
  File "est_abundance.py", line 270, in main
    [mapped_taxid, mapped_taxid_dict] = process_kmer_distribution(line,lvl_taxids,map2lvl_taxids)
  File "est_abundance.py", line 99, in process_kmer_distribution
    [g_taxid,mkmers,tkmers] = genome_str.split(':')
ValueError: too many values to unpack

I do not understand the problem. Can you help me?

Many thanks in advance.

jenniferlu717 commented 6 years ago

What is the command line you are using to run est_abundance? Can you send the kmer_distribution to jlu26@jhmi.edu?

machalita commented 6 years ago

To avoid that error, you must run perl count-kmer-abundances.pl with threads=1. The issue is that when you use more than one thread, each thread rushes to write to the output file, which creates a corrupt file.

jenniferlu717 commented 6 years ago

The output file can be manually fixed. It doesn't create a completely corrupt file. In my experience, it tends to just print two kmer distributions to a single line. I can try to fix the file for you if you send it to jlu26@jhmi.edu

However, we are working on fixing the script to allow multithreading without creating a corrupt output file.

xgnusr commented 6 years ago

I was able to fix it in a very simple way: running the command from a bash script works very well. The code used is this:

#!/bin/bash
DB_NAME=kdb_032018
PATH=$PATH:/linneo/ubioma/kraken_032018
export PATH
export KRAKEN_DB_PATH="/linneo/ubioma/kraken_032018/$DB_NAME:"
cd /linneo/ubioma/kraken_032018
perl bracken/count-kmer-abundances.pl --db=$DB_NAME \
    --threads 15 \
    --read-length=160 bracken/database.kraken >bracken/database160mers.kraken_cnts
exit 0

Very simple, but it works.

fbreitwieser commented 6 years ago

As @machalita said, please use one thread with count-kmer-abundances.pl for now. I'm working on a fix to have each thread write to a separate file and merge the files at the end.
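The general pattern of "one output file per worker, merged at the end" can be sketched in Python as below. This is only an illustration of the idea, not the actual change made to count-kmer-abundances.pl (which is a Perl script); the function names `write_part` and `run_and_merge` and the sample records are hypothetical.

```python
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor


def write_part(part_path, records):
    """Worker: write its records to a file that no other worker touches."""
    with open(part_path, "w") as handle:
        for record in records:
            handle.write(record + "\n")
    return part_path


def run_and_merge(chunks, output_path, threads=4):
    """Write each chunk to its own part file in parallel, then concatenate."""
    with tempfile.TemporaryDirectory() as tmp_dir:
        part_paths = [
            os.path.join(tmp_dir, f"part_{i}.txt") for i in range(len(chunks))
        ]
        with ThreadPoolExecutor(max_workers=threads) as pool:
            # list() waits for all workers and re-raises any worker exception.
            list(pool.map(write_part, part_paths, chunks))
        # The merge is sequential, so every output line is written whole.
        with open(output_path, "w") as out:
            for part_path in part_paths:
                with open(part_path) as part:
                    out.write(part.read())


if __name__ == "__main__":
    out_dir = tempfile.mkdtemp()
    merged = os.path.join(out_dir, "merged.txt")
    # Hypothetical records standing in for per-genome output lines.
    run_and_merge([["562:100", "561:20"], ["1280:50"]], merged, threads=2)
    with open(merged) as handle:
        print(handle.read(), end="")
```

Because no two workers share a file handle, lines cannot be interleaved, and merging the parts in a fixed order also makes the output deterministic.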

fbreitwieser commented 6 years ago

Should be fixed now - please re-open if the problem persists!

jessmewald commented 2 years ago

Traceback (most recent call last):
  File "/home/ewaldj/miniconda3/envs/kraken2/bin/generate_kmer_distribution.py", line 161, in <module>
    main()
  File "/home/ewaldj/miniconda3/envs/kraken2/bin/generate_kmer_distribution.py", line 112, in main
    [genome_taxid, total_kmers, mapped_taxids_kmers] = parse_single_genome(line)
  File "/home/ewaldj/miniconda3/envs/kraken2/bin/generate_kmer_distribution.py", line 78, in parse_single_genome
    [curr_m_id, curr_kmers] = kmers.split(':')
ValueError: too many values to unpack (expected 2)

When I run bracken-build -d /tmp/db/krakendb -l 150, I get the above error. I tried to change the number of threads used by count-kmer-abundances.pl, but I could not find that script in the conda install of either kraken2 or bracken. Perhaps that script is not present in the current releases?

Any suggestions? Thank you!