I located the error. When running in parallel, each process is rushing to write to stdout/file, and the end result is a corrupt file (some lines are missing or incomplete).
The same error:
PROGRAM START TIME: 04-23-2018 09:10:04
Traceback (most recent call last):
File "est_abundance.py", line 460, in <module>
I do not understand the problem.
Can you help me?
Many thanks in advance.
What is the command line you are using to run est_abundance? Can you send the kmer_distribution to jlu26@jhmi.edu?
To avoid that error, you must run perl count-kmer-abundances.pl with threads=1. The issue is that when you use more than one thread, each thread rushes to write to the output file, which creates a corrupt file.
The output file can be manually fixed. It doesn't create a completely corrupt file. In my experience, it tends to just print two kmer distributions to a single line. I can try to fix the file for you if you send it to jlu26@jhmi.edu.
However, we are working on fixing the script to allow multithreading without creating a corrupt output file.
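For anyone who wants to locate the damaged lines before fixing the file by hand, here is a minimal sketch. It assumes the counts file is made of whitespace-separated `taxid:count` tokens, which is inferred from the `kmers.split(':')` call in the traceback rather than from a format specification. The function name `find_malformed_tokens` and the sample lines are hypothetical.

```python
def find_malformed_tokens(lines):
    """Return (line_number, token) pairs for tokens that are not 'digits:digits'."""
    problems = []
    for line_number, line in enumerate(lines, start=1):
        for token in line.split():
            if ":" not in token:
                continue  # not a pair; this check ignores it
            parts = token.split(":")
            # A healthy pair has exactly two numeric parts (assumed format).
            if len(parts) != 2 or not all(part.isdigit() for part in parts):
                problems.append((line_number, token))
    return problems


# Hypothetical sample: the second line contains a token made of two fused pairs.
sample = [
    "562\t562:80 561:20",
    "1280\t1280:40 1279:10573:70",
]
for line_number, token in find_malformed_tokens(sample):
    print(f"line {line_number}: suspicious token {token!r}")
```

To check a real file, pass an open file handle instead of the sample list, e.g. `find_malformed_tokens(open(path))`. This only flags tokens the parser would choke on; it does not detect records that were fused cleanly at a token boundary.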
I have been able to fix it in a very simple way: embedding the command in a bash script works very well. The code used is this:

    #!/bin/bash
    DB_NAME=kdb_032018
    PATH=$PATH:/linneo/ubioma/kraken_032018
    export PATH
    export KRAKEN_DB_PATH="/linneo/ubioma/kraken_032018/$DB_NAME:"
    cd /linneo/ubioma/kraken_032018
    perl bracken/count-kmer-abundances.pl --db=$DB_NAME \
        --threads 15 \
        --read-length=160 bracken/database.kraken >bracken/database160mers.kraken_cnts
    exit 0

Very simple, but it works.
(I need to learn html to edit correctly) :(
As @machalita said, please use one thread with count-kmer-abundances.pl for now. I'm working on a fix to have each thread write to a separate file and merge the files at the end.
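The general idea of "one file per worker, merge at the end" can be sketched as follows. This is an illustration of the pattern in Python, not the actual change made to count-kmer-abundances.pl (which is a Perl script); the function names `write_part` and `run_and_merge` and the part-file naming are hypothetical.

```python
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor


def write_part(part_path, records):
    """Each worker writes only to its own file, so writes cannot interleave."""
    with open(part_path, "w") as handle:
        for record in records:
            handle.write(record + "\n")
    return part_path


def run_and_merge(chunks, merged_path, workers=4):
    """Write each chunk in parallel, then concatenate the part files in order."""
    with tempfile.TemporaryDirectory() as tmp_dir:
        part_paths = [
            os.path.join(tmp_dir, f"part_{index}.txt")
            for index in range(len(chunks))
        ]
        with ThreadPoolExecutor(max_workers=workers) as pool:
            # list() waits for all workers and re-raises any worker exception.
            list(pool.map(write_part, part_paths, chunks))
        # Merging happens in a single thread, after all workers have finished.
        with open(merged_path, "w") as merged:
            for part_path in part_paths:
                with open(part_path) as part:
                    merged.write(part.read())
```

Because the merge step runs only after every worker is done and reads the parts in a fixed order, the final file has whole lines in a deterministic order regardless of how the threads were scheduled.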
Should be fixed now - please re-open if the problem persists!
Traceback (most recent call last):
File "/home/ewaldj/miniconda3/envs/kraken2/bin/generate_kmer_distribution.py", line 161, in <module>
File "/home/ewaldj/miniconda3/envs/kraken2/bin/generate_kmer_distribution.py", line 112, in main [genome_taxid, total_kmers, mapped_taxids_kmers] = parse_single_genome(line)
File "/home/ewaldj/miniconda3/envs/kraken2/bin/generate_kmer_distribution.py", line 78, in parse_single_genome [curr_m_id, curr_kmers] = kmers.split(':')
ValueError: too many values to unpack (expected 2)
When I run bracken-build -d /tmp/db/krakendb -l 150, I get the above error. I attempted to modify the number of threads used when running count-kmer-abundances.pl, but I could not find that script in the conda install for either kraken2 or bracken. Perhaps that script is not present in the current releases?
Any suggestions? Thank you!
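For context on what the ValueError means: the unpacking in `parse_single_genome` expects every token to split into exactly two parts on `:`. A small sketch reproducing the failure is below. `parse_pair` is a hypothetical stand-in that mirrors the pattern from the traceback, and the second token is an invented example of what two fused pairs might look like.

```python
def parse_pair(token):
    # Mirrors the unpacking pattern shown in the traceback:
    # exactly two parts are expected after splitting on ':'.
    taxid, count = token.split(":")
    return taxid, int(count)


print(parse_pair("562:80"))  # a well-formed pair parses fine

try:
    parse_pair("1279:10573:70")  # hypothetical token with an extra ':'
except ValueError as error:
    print("ValueError:", error)
```

So the traceback points at a token in the input file with more than one colon, which is consistent with the interleaved-write corruption discussed earlier in this thread, though other causes of a malformed line cannot be ruled out from the traceback alone.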
Hi! Thanks for taking the time to develop this tool. I'm getting the following error while trying to generate the kmer distribution:
python Bracken-master/generate_kmer_distribution.py -i braken/75mers.kraken -o braken/kmer.distribution
PROGRAM START TIME: 12-03-2017 22:59:55
Traceback (most recent call last):
File "Bracken-master/generate_kmer_distribution.py", line 161, in <module>
main()
File "Bracken-master/generate_kmer_distribution.py", line 112, in main
[genome_taxid, total_kmers, mapped_taxids_kmers] = parse_single_genome(line)
File "Bracken-master/generate_kmer_distribution.py", line 78, in parse_single_genome
[curr_m_id, curr_kmers] = kmers.split(':')
ValueError: too many values to unpack
I've used Bracken several times, but this time somehow it isn't working. Any ideas?