Open mutantjoo0 opened 4 years ago
What did you run for step0? None of the Bracken steps will work unless you can build the kraken2 database:
kraken2-build --build --db=kraken_standard_db --threads 10
From there, it should generate taxo.k2d, hash.k2d, and opts.k2d. Without all three, Bracken will not work.
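A quick way to confirm the build finished is to check the database directory for those three files. The sketch below uses a hypothetical helper, `check_k2_db`, which is not part of Kraken 2 or Bracken; it also flags leftover `.k2d.tmp` files, which suggest an interrupted build.

```shell
# Hypothetical helper (not part of Kraken 2 or Bracken): report whether a
# database directory holds the three files Bracken needs.
check_k2_db() {
    local db_dir="$1"
    local missing=0
    local f
    # Each required file must exist and be non-empty.
    for f in taxo.k2d hash.k2d opts.k2d; do
        if [ ! -s "$db_dir/$f" ]; then
            echo "missing or empty: $db_dir/$f"
            missing=1
        fi
    done
    # Leftover temporary files usually mean the build step did not finish.
    for f in "$db_dir"/*.k2d.tmp; do
        if [ -e "$f" ]; then
            echo "leftover temporary file: $f"
        fi
    done
    return "$missing"
}

# Usage (directory name taken from the command above):
#   check_k2_db kraken_standard_db && echo "database looks complete"
```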
Hi Jennifer,
Thank you for your prompt response and support. The command and output from building the standard database are shown below. I am not sure whether the process completed properly.
(bracken) -bash-4.2$ kraken2-build --standard --db kraken_standard_db/ --threads 30 --use-ftp
Step 1/2: Performing ftp file transfer of requested files
Step 2/2: Assigning taxonomic IDs to sequences
Processed 390 projects (604 sequences, 1.02 Gbp)... done.
All files processed, cleaning up extra sequence files... done, library complete.
Masking low-complexity regions of downloaded library... done.
Step 1/2: Performing ftp file transfer of requested files
Step 2/2: Assigning taxonomic IDs to sequences
Processed 20868 projects (45891 sequences, 84.98 Gbp)... done.
All files processed, cleaning up extra sequence files... done, library complete.
Masking low-complexity regions of downloaded library... done.
Step 1/2: Performing ftp file transfer of requested files
Step 2/2: Assigning taxonomic IDs to sequences
Processed 10379 projects (13002 sequences, 386.51 Mbp)... done.
All files processed, cleaning up extra sequence files... done, library complete.
Masking low-complexity regions of downloaded library... done.
Step 1/2: Performing ftp file transfer of requested files
Step 2/2: Assigning taxonomic IDs to sequences
Processed 1 project (639 sequences, 3.27 Gbp)... done.
All files processed, cleaning up extra sequence files... done, library complete.
Downloading UniVec_Core data from server... done.
Adding taxonomy ID of 28384 to all sequences... done.
Masking low-complexity regions of downloaded library... done.
Creating sequence ID to taxonomy ID map (step 1)...
Sequence ID to taxonomy ID map complete. [0.443s]
Estimating required capacity (step 2)...
Estimated hash table requirement: 48971338604 bytes
Capacity estimation complete. [9m48.607s]
Building database files (step 3)...
Taxonomy parsed and converted.
CHT created with 15 bits reserved for taxid.
Processed 2498 sequences (6284101310 bp)...xargs: cat: terminated by signal 13
/mnt/home/leejooy5/miniconda3/envs/bracken/libexec/build_kraken2_db.sh: line 133: 26972 Done list_sequence_files
26973 Exit 125 | xargs -0 cat
26974 Killed | build_db -k $KRAKEN2_KMER_LEN -l $KRAKEN2_MINIMIZER_LEN -S $KRAKEN2_SEED_TEMPLATE $KRAKEN2XFLAG -H hash.k2d.tmp -t taxo.k2d.tmp -o opts.k2d.tmp -n taxonomy/ -m $seqid2taxid_map_file -c $required_capacity -p $KRAKEN2_THREAD_CT $max_db_flag
Previously, I got a different error without the --use-ftp option:
(bracken) -bash-4.2$ kraken2-build --standard --db kraken_standard_db --threads 24
Downloading taxonomy tree data...rsync: error while loading shared libraries: libiconv.so.2: cannot open shared object file: No such file or directory
It looks like the database downloaded fine but didn't build. You don't need to run that command again, but you will need to run:
kraken2-build --build --db kraken_standard_db --threads 24
again (no worries about --use-ftp)
I'm not 100% sure why it broke but how much RAM is in your system?
I tried again; the build was again terminated by signal 13, and I found a taxo.k2d.tmp file, as shown below.
(bracken) -bash-4.2$ kraken2-build --standard --db standard --threads 24 --use-ftp
Downloading taxonomy tree data... done.
Untarring taxonomy tree data... done.
Step 1/2: Performing ftp file transfer of requested files
Step 2/2: Assigning taxonomic IDs to sequences
Processed 390 projects (604 sequences, 1.02 Gbp)... done.
All files processed, cleaning up extra sequence files... done, library complete.
Masking low-complexity regions of downloaded library... done.
Step 1/2: Performing ftp file transfer of requested files
Step 2/2: Assigning taxonomic IDs to sequences
Processed 20970 projects (46145 sequences, 85.39 Gbp)... done.
All files processed, cleaning up extra sequence files... done, library complete.
Masking low-complexity regions of downloaded library... done.
Step 1/2: Performing ftp file transfer of requested files
rsync_from_ncbi.pl: unable to download all/GCF/000/849/405/GCF_000849405.1_ViralProj14717/GCF_000849405.1_ViralProj14717_genomic.fna.gz: Idle timeout (60 seconds): closing control connection
rsync_from_ncbi.pl: unable to download all/GCF/000/915/475/GCF_000915475.1_ViralProj239432/GCF_000915475.1_ViralProj239432_genomic.fna.gz: [Net::FTP] Connection closed
rsync_from_ncbi.pl: unable to download all/GCF/004/128/475/GCF_004128475.1_ASM412847v1/GCF_004128475.1_ASM412847v1_genomic.fna.gz: [Net::FTP] Connection closed
.
.
.
Processed 10388 projects (7511 sequences, 220.28 Mbp)... done.
All files processed, cleaning up extra sequence files... done, library complete.
Masking low-complexity regions of downloaded library... done.
Step 1/2: Performing ftp file transfer of requested files
Step 2/2: Assigning taxonomic IDs to sequences
Processed 1 project (639 sequences, 3.27 Gbp)... done.
All files processed, cleaning up extra sequence files... done, library complete.
Downloading UniVec_Core data from server... done.
Adding taxonomy ID of 28384 to all sequences... done.
Masking low-complexity regions of downloaded library... done.
Creating sequence ID to taxonomy ID map (step 1)...
Sequence ID to taxonomy ID map complete. [0.245s]
Estimating required capacity (step 2)...
Estimated hash table requirement: 49068412340 bytes
Capacity estimation complete. [9m54.634s]
Building database files (step 3)...
Taxonomy parsed and converted.
CHT created with 15 bits reserved for taxid.
Processed 2858 sequences (7503328991 bp)...xargs: cat: terminated by signal 13
/mnt/home/leejooy5/miniconda3/envs/bracken/libexec/build_kraken2_db.sh: line 133: 25862 Done list_sequence_files
25863 Exit 125 | xargs -0 cat
25864 Killed | build_db -k $KRAKEN2_KMER_LEN -l $KRAKEN2_MINIMIZER_LEN -S $KRAKEN2_SEED_TEMPLATE $KRAKEN2XFLAG -H hash.k2d.tmp -t taxo.k2d.tmp -o opts.k2d.tmp -n taxonomy/ -m $seqid2taxid_map_file -c $required_capacity -p $KRAKEN2_THREAD_CT $max_db_flag
(bracken) -bash-4.2$ ls standard/
library seqid2taxid.map taxo.k2d.tmp taxonomy
I am using an HPC dev node, which has 377G of memory. After the run, I checked memory usage as follows:
(bracken) -bash-4.2$ free -mh
              total        used        free      shared  buff/cache   available
Mem:           377G        289G         78G        460M        9.8G         86G
Swap:            0B          0B          0B
Should I run kraken2-build --build --db kraken_standard_db --threads 24, or start over from kraken2-build --standard --db NAME?
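For a rough sense of scale, the "Estimated hash table requirement" from the build log can be compared with the memory the node reports as available. The sketch below assumes Linux with a `MemAvailable` field in `/proc/meminfo`; all function names are hypothetical. Note that `free` reports node-wide figures, so a per-user or scheduler limit (not confirmed in this thread) could still kill a process even when the numbers look sufficient.

```shell
# Hypothetical helpers for comparing the estimated hash table size with
# available memory. Assumes Linux with MemAvailable in /proc/meminfo.

# Convert a byte count to GiB with one decimal place.
bytes_to_gib() {
    LC_ALL=C awk -v b="$1" 'BEGIN { printf "%.1f\n", b / (1024 * 1024 * 1024) }'
}

# Print the memory currently available, in bytes (/proc/meminfo reports kB).
mem_available_bytes() {
    LC_ALL=C awk '/^MemAvailable:/ { printf "%.0f\n", $2 * 1024 }' /proc/meminfo
}

# Succeed if the required byte count is no larger than the available one.
fits_in_memory() {
    [ "$1" -le "$2" ]
}

# Usage, with the estimate from the build log:
#   bytes_to_gib 48971338604
#   fits_in_memory 48971338604 "$(mem_available_bytes)" && echo "fits"
```

Here `bytes_to_gib 48971338604` prints 45.6, so the estimated hash table is roughly 45.6 GiB.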
I think you only need to run kraken2-build --build, but I'm not sure your system has enough memory, which might be what is causing the error. Can you try building with --max-db-size 30000000?
Hi Jennifer,
kraken2-build --build --db NAME --threads 24 failed, as shown below. I am now running it again with --max-db-size 30000000. I will post here once I get results.
(bracken) -bash-4.2$ kraken2-build --build --db kraken_standard_db_failed/ --threads 24
Creating sequence ID to taxonomy ID map (step 1)...
Sequence ID to taxonomy ID map already present, skipping map creation.
Estimating required capacity (step 2)...
Estimated hash table requirement: 48971338604 bytes
Capacity estimation complete. [10m43.179s]
Building database files (step 3)...
Taxonomy parsed and converted.
CHT created with 15 bits reserved for taxid.
Processed 2978 sequences (7343292609 bp)...xargs: cat: terminated by signal 13
/mnt/home/leejooy5/miniconda3/envs/bracken/libexec/build_kraken2_db.sh: line 133: 40223 Done list_sequence_files
40224 Exit 125 | xargs -0 cat
40225 Killed | build_db -k $KRAKEN2_KMER_LEN -l $KRAKEN2_MINIMIZER_LEN -S $KRAKEN2_SEED_TEMPLATE $KRAKEN2XFLAG -H hash.k2d.tmp -t taxo.k2d.tmp -o opts.k2d.tmp -n taxonomy/ -m $seqid2taxid_map_file -c $required_capacity -p $KRAKEN2_THREAD_CT $max_db_flag
(bracken) -bash-4.2$ ls kraken_standard_db_failed/
library seqid2taxid.map taxo.k2d.tmp taxonomy
Hi Jennifer,
Thanks to your support, I was able to complete the Kraken database construction.
(bracken) -bash-4.2$ kraken2-build --build --db kraken_standard_db_failed/ --max-db-size 30000000
Creating sequence ID to taxonomy ID map (step 1)...
Sequence ID to taxonomy ID map already present, skipping map creation.
Estimating required capacity (step 2)...
Estimated hash table requirement: 48971338604 bytes
Specifying lower maximum hash table size of 30000000 bytes
Capacity estimation complete. [44m53.800s]
Building database files (step 3)...
Taxonomy parsed and converted.
CHT created with 15 bits reserved for taxid.
Completed processing of 63271 sequences, 89655529216 bp
Writing data to disk... complete.
Database files completed. [48m31.302s]
Database construction complete. [Total: 1h33m25.172s]
(bracken) -bash-4.2$ du -sh kraken_standard_db_failed/
(bracken) -bash-4.2$ ls -lht *standard*
kraken_standard_db:
total 36M
-rw-r----- 1 leejooy5 Reguera_Kashefi_Lab 56 Aug 13 16:58 opts.k2d
-rw-r----- 1 leejooy5 Reguera_Kashefi_Lab 29M Aug 13 16:58 hash.k2d
-rw-r----- 1 leejooy5 Reguera_Kashefi_Lab 2.4M Aug 13 16:09 taxo.k2d
-rw-r----- 1 leejooy5 Reguera_Kashefi_Lab 4.2M Aug 12 04:36 seqid2taxid.map
drwxr-s--- 2 leejooy5 Reguera_Kashefi_Lab 8.0K Aug 12 04:36 taxonomy
drwxr-s--- 7 leejooy5 Reguera_Kashefi_Lab 8.0K Aug 12 04:36 library
(bracken) -bash-4.2$ mv kraken_standard_db_failed/ kraken_standard_db/
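One detail in the log is worth noting: the line "Specifying lower maximum hash table size of 30000000 bytes" shows that --max-db-size is taken in bytes. A value of 30000000 therefore caps the hash table at about 30 MB, which matches the 29M hash.k2d in the listing and is far below the 48971338604-byte estimate. If a cap measured in gigabytes is wanted instead, the byte count has to be written out in full. A minimal sketch of that arithmetic, with `gb_to_bytes` as a hypothetical helper:

```shell
# Hypothetical helper: convert decimal gigabytes to the byte count that
# --max-db-size expects (the build log above reports the cap in bytes).
gb_to_bytes() {
    echo $(( $1 * 1000 * 1000 * 1000 ))
}

# Usage: a build capped at 30 GB rather than 30 MB would look like
#   kraken2-build --build --db kraken_standard_db --threads 24 \
#       --max-db-size "$(gb_to_bytes 30)"
```

Whether a larger cap fits still depends on the memory actually available to the process on the node.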
I found that kraken2-build --clean --db NAME removes the library directory, which bracken-build requires. Fortunately, I had made a backup of the database and have started running bracken-build. I will post a summary of the results and the time bracken-build takes.
Thanks again for your great support.
Stay safe and healthy, Joo-Young
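For anyone following along, one way to avoid losing the library directory is to copy it aside before cleaning, or simply to run bracken-build before kraken2-build --clean. The sketch below uses a hypothetical helper, `backup_library`; the backup location is whatever path you choose.

```shell
# Hypothetical helper: copy the library/ directory of a Kraken 2 database
# to a backup location before running kraken2-build --clean.
backup_library() {
    local db_dir="$1"
    local backup_dir="$2"
    if [ ! -d "$db_dir/library" ]; then
        echo "no library directory in $db_dir" >&2
        return 1
    fi
    # -a preserves timestamps, permissions, and symlinks.
    mkdir -p "$backup_dir" && cp -a "$db_dir/library" "$backup_dir/"
}

# Usage (the backup path is a placeholder):
#   backup_library kraken_standard_db /path/to/backup \
#       && kraken2-build --clean --db kraken_standard_db
```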
Running bracken-build took 10 min, and it was successful.
(bracken) -bash-4.2$ bracken-build -d standard_db -t 16
>> Selected Options:
kmer length = 35
read length = 100
database = standard_db
threads = 16
>> Checking for Valid Options...
>> Creating database.kraken [if not found]
>> kraken2 --db standard_db --threads 16 <( find -L standard_db/library \( -name *.fna -o -name *.fa -o -name *.fasta \) -exec cat {} + ) > standard_db/database.kraken
Loading database information... done.
63273 sequences (89655.55 Mbp) processed in 284.993s (13.3 Kseq/m, 18875.35 Mbp/m).
46160 sequences classified (72.95%)
17113 sequences unclassified (27.05%)
Finished creating database.kraken [in DB folder]
>> Creating database100mers.kmer_distrib
>>STEP 0: PARSING COMMAND LINE ARGUMENTS
Taxonomy nodes file: standard_db/taxonomy/nodes.dmp
Seqid file: standard_db/seqid2taxid.map
Num Threads: 16
Kmer Length: 35
Read Length: 100
>>STEP 1: READING SEQID2TAXID MAP
109763 total sequences read
>>STEP 2: READING NODES.DMP FILE
2266594 total nodes read
>>STEP 3: CONVERTING KMER MAPPINGS INTO READ CLASSIFICATIONS:
100mers, with a database built using 35mers
63290 sequences converted (finished: kraken:taxid|1269028|NC_020104.1)1)1.1)25)59)))49)))
Time Elaped: 2 minutes, 34 seconds, 0.00000 microseconds
=============================
PROGRAM START TIME: 08-13-2020 21:40:21
...19879 total genomes read from kraken output file
...creating kmer counts file -- lists the number of kmers of each classification per genome
...creating kmer distribution file -- lists genomes and kmer counts contributing to each genome
PROGRAM END TIME: 08-13-2020 21:40:22
Finished creating database100mers.kraken and database100mers.kmer_distrib [in DB folder]
*NOTE: to create read distribution files for multiple read lengths,
rerun this script specifying the same database but a different read length
Bracken build complete.
(bracken) -bash-4.2$ ls -lht standard_db/
total 738M
-rw-r----- 1 leejooy5 Reguera_Kashefi_Lab 2.2M Aug 13 17:40 database100mers.kmer_distrib
-rw-r----- 1 leejooy5 Reguera_Kashefi_Lab 5.3M Aug 13 17:40 database100mers.kraken
-rw-r----- 1 leejooy5 Reguera_Kashefi_Lab 695M Aug 13 17:37 database.kraken
drwxr-s--- 2 leejooy5 Reguera_Kashefi_Lab 8.0K Aug 13 17:15 taxonomy
drwxr-s--- 7 leejooy5 Reguera_Kashefi_Lab 8.0K Aug 13 17:15 library
-rw-r----- 1 leejooy5 Reguera_Kashefi_Lab 29M Aug 13 17:14 hash.k2d
-rw-r----- 1 leejooy5 Reguera_Kashefi_Lab 2.4M Aug 13 17:14 taxo.k2d
-rw-r----- 1 leejooy5 Reguera_Kashefi_Lab 4.2M Aug 13 17:14 seqid2taxid.map
-rw-r----- 1 leejooy5 Reguera_Kashefi_Lab 56 Aug 13 17:14 opts.k2d
Thanks!
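As the note in the output says, distribution files for other read lengths come from rerunning bracken-build on the same database with a different read length. The sketch below only prints the commands instead of running them; `print_bracken_builds` is a hypothetical helper, the k-mer length of 35 is taken from the log above, and the read lengths in the usage line are examples.

```shell
# Hypothetical helper: print (rather than run) one bracken-build command
# per requested read length, using the k-mer length shown in the log (35).
print_bracken_builds() {
    local db_dir="$1"
    local threads="$2"
    shift 2
    local len
    for len in "$@"; do
        echo "bracken-build -d $db_dir -t $threads -k 35 -l $len"
    done
}

# Usage: review the commands, then pipe them to a shell to run them.
#   print_bracken_builds standard_db 16 100 150
```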
Hello Jennifer (@jenniferlu717),
I am new to Bracken and Kraken. I installed them in a new conda environment as follows:
Following the manual, I did step 0: building Kraken databases (standard and bacteria). The outputs from step 0 are as follows.
One of my concerns during step 0 was a warning of sorts, terminated by signal 13, for both the Kraken standard database and the Kraken bacteria database. When I tried step 1 (bracken-build), I got these errors:
I also tried using a copy of the kraken2-build script in the working directory, but it did not work.
Alternatively, I tried to follow steps 1a-c, but that gave me another error:
From all this, I assumed that step 0 should have produced taxo.k2d instead of taxo.k2d.tmp, which looks like a temporary file. The problem is that I repeatedly get the same result: step 0 finishes with xargs: cat: terminated by signal 13 and leaves taxo.k2d.tmp in the output/database directory. Could you help me fix this issue? Thanks, Joo-Young