voutcn / megahit

Ultra-fast and memory-efficient (meta-)genome assembler
http://www.ncbi.nlm.nih.gov/pubmed/25609793
GNU General Public License v3.0

Exit code 6 when assembling kmers #331

Open hbbshulman opened 2 years ago

hbbshulman commented 2 years ago

Hello,

I am currently running into an issue when assembling contigs for k=21. This is for an assembly of 172 GB of data using 900GB of memory and 10 CPUs. I have pasted the log output below. I compared my issue to https://github.com/voutcn/megahit/issues/146. Is this a memory issue that needs to be solved using BBnorm? I have also attached a screenshot of the files in the k21 folder below, which don't (to my limited knowledge) appear that large. Thank you for any insight!

MEGAHIT v1.1.3
--- [Mon Apr 18 13:15:27 2022] Start assembly. Number of CPU threads 10 ---
--- [Mon Apr 18 13:15:27 2022] Available memory: 1081637261312, used: 900000000000
--- [Mon Apr 18 13:15:27 2022] Converting reads to binaries ---
/bigdata/aronsonlab/hshul001/.conda/envs/metawrap-env/bin/megahit_asm_core buildlib ASSEMBLY_BC/megahit.tmp/megahit_tmp_SH6KZF/reads.lib ASSEMBLY_BC/megahit.tmp/megahit_tmp_SH6KZF/reads.lib
[read_lib_functions-inl.h : 209] Lib 0 (clean_reads/BC/ALL_READS_BC_1.fastq,clean_reads/BC/ALL_READS_BC_2.fastq): pe, 467482504 reads, 151 max length
[utils.h : 126] Real: 632.9277 user: 489.3902 sys: 77.4643 maxrss: 165684
--- [Mon Apr 18 13:26:00 2022] k list: 21,29,39,59,79,99,119,141 ---
--- [Mon Apr 18 13:26:00 2022] Extracting solid (k+1)-mers for k = 21 ---
cmd: /bigdata/aronsonlab/hshul001/.conda/envs/metawrap-env/bin/megahit_sdbg_build count -k 21 -m 2 --host_mem 900000000000 --mem_flag 1 --gpu_mem 0 --output_prefix ASSEMBLY_BC/megahit.tmp/megahit_tmp_SH6KZF/k21/21 --num_cpu_threads 10 --num_output_threads 3 --read_lib_file ASSEMBLY_BC/megahit.tmp/megahit_tmp_SH6KZF/reads.lib
[sdbg_builder.cpp : 112] Host memory to be used: 900000000000
[sdbg_builder.cpp : 113] Number CPU threads: 10
[cx1.h : 450] Preparing data...
[read_lib_functions-inl.h : 256] Before reading, sizeof seq_package: 21300005796
[read_lib_functions-inl.h : 260] After reading, sizeof seq_package: 21300005796
[cx1_kmer_count.cpp : 136] 467482504 reads, 151 max read length
[cx1.h : 457] Preparing data... Done. Time elapsed: 87.6848
[cx1.h : 464] Preparing partitions and initialing global data...
[cx1_kmer_count.cpp : 227] 2 words per substring, 2 words per edge
[cx1_kmer_count.cpp : 322] Set: 33105247344, 877189011636
[cx1_kmer_count.cpp : 356] 8057280186, 36505275 33105247344 877189011636
[cx1_kmer_count.cpp : 363] Memory for reads: 21848760380
[cx1_kmer_count.cpp : 364] max # lv.1 items = 8057280186
[cx1.h : 480] Preparing partitions and initialing global data... Done. Time elapsed: 154.5767
[cx1.h : 486] Start main loop...
[cx1.h : 515] Lv1 scanning from bucket 0 to 439
[cx1.h : 528] Lv1 scanning done. Large diff: 0. Time elapsed: 175.6694
[cx1.h : 594] Lv1 fetching & sorting done. Time elapsed: 361.7007
[cx1.h : 515] Lv1 scanning from bucket 439 to 1398
[cx1.h : 528] Lv1 scanning done. Large diff: 0. Time elapsed: 182.5338
[cx1.h : 594] Lv1 fetching & sorting done. Time elapsed: 413.7580
[cx1.h : 515] Lv1 scanning from bucket 1398 to 2922
[cx1.h : 528] Lv1 scanning done. Large diff: 0. Time elapsed: 185.9310
[cx1.h : 594] Lv1 fetching & sorting done. Time elapsed: 419.1161
[cx1.h : 515] Lv1 scanning from bucket 2922 to 5234
[cx1.h : 528] Lv1 scanning done. Large diff: 0. Time elapsed: 189.4740
[cx1.h : 594] Lv1 fetching & sorting done. Time elapsed: 425.3257
[cx1.h : 515] Lv1 scanning from bucket 5234 to 8810
[cx1.h : 528] Lv1 scanning done. Large diff: 0. Time elapsed: 191.7771
[cx1.h : 594] Lv1 fetching & sorting done. Time elapsed: 430.2364
[cx1.h : 515] Lv1 scanning from bucket 8810 to 14759
[cx1.h : 528] Lv1 scanning done. Large diff: 0. Time elapsed: 195.3766
[cx1.h : 594] Lv1 fetching & sorting done. Time elapsed: 439.1521
[cx1.h : 515] Lv1 scanning from bucket 14759 to 27662
[cx1.h : 528] Lv1 scanning done. Large diff: 0. Time elapsed: 200.2535
[cx1.h : 594] Lv1 fetching & sorting done. Time elapsed: 434.8503
[cx1.h : 515] Lv1 scanning from bucket 27662 to 65536
[cx1.h : 528] Lv1 scanning done. Large diff: 49. Time elapsed: 194.9733
[cx1.h : 594] Lv1 fetching & sorting done. Time elapsed: 286.9522
[cx1.h : 607] Main loop done. Time elapsed: 4727.0816
[cx1.h : 613] Postprocessing...
[cx1_kmer_count.cpp : 860] Total number of candidate reads: 35484194(146155594)
[cx1_kmer_count.cpp : 871] Total number of solid edges: 8709234460
[cx1.h : 621] Postprocess done. Time elapsed: 14.2407
[utils.h : 126] Real: 4983.6361 user: 48307.5821 sys: 163.5017 maxrss: 57349848
--- [Mon Apr 18 14:49:04 2022] Building graph for k = 21 ---
/bigdata/aronsonlab/hshul001/.conda/envs/metawrap-env/bin/megahit_sdbg_build seq2sdbg --host_mem 900000000000 --mem_flag 1 --gpu_mem 0 --output_prefix ASSEMBLY_BC/megahit.tmp/megahit_tmp_SH6KZF/k21/21 --num_cpu_threads 10 -k 21 --kmer_from 0 --num_edge_files 3 --input_prefix ASSEMBLY_BC/megahit.tmp/megahit_tmp_SH6KZF/k21/21 --need_mercy
[sdbg_builder.cpp : 339] Host memory to be used: 900000000000
[sdbg_builder.cpp : 340] Number CPU threads: 10
[cx1.h : 450] Preparing data...
[cx1_seq2sdbg.cpp : 394] Number edges: 8709234460
[cx1_seq2sdbg.cpp : 434] Bases to reserve: 239503947650, number contigs: 0, number multiplicity: 10886543075
[cx1_seq2sdbg.cpp : 440] Before reading, sizeof seq_package: 59875986924, multiplicity vector: 10886543075
[cx1_seq2sdbg.cpp : 455] Adding mercy edges...
[cx1_seq2sdbg.cpp : 373] Number of reads: 35484194, Number of mercy edges: 159888388
[cx1_seq2sdbg.cpp : 462] Done. Time elapsed: 1611.5370
[cx1_seq2sdbg.cpp : 529] After reading, sizeof seq_package: 59875986924, multiplicity vector: 10886543075
[cx1.h : 457] Preparing data... Done. Time elapsed: 3511.7439
[cx1.h : 464] Preparing partitions and initialing global data...
[cx1_seq2sdbg.cpp : 740] Memory for sequence: 79138613140
[cx1_seq2sdbg.cpp : 741] max # lv.1 items = 7095298278
[cx1.h : 480] Preparing partitions and initialing global data... Done. Time elapsed: 90.1103
[cx1.h : 486] Start main loop...
[cx1.h : 515] Lv1 scanning from bucket 0 to 888
[cx1.h : 528] Lv1 scanning done. Large diff: 0. Time elapsed: 101.5403
[cx1.h : 594] Lv1 fetching & sorting done. Time elapsed: 203.1598
[cx1.h : 515] Lv1 scanning from bucket 888 to 2595
[cx1.h : 528] Lv1 scanning done. Large diff: 0. Time elapsed: 104.0996
[cx1.h : 594] Lv1 fetching & sorting done. Time elapsed: 213.7796
[cx1.h : 515] Lv1 scanning from bucket 2595 to 5223
[cx1.h : 528] Lv1 scanning done. Large diff: 0. Time elapsed: 105.0172
[cx1.h : 594] Lv1 fetching & sorting done. Time elapsed: 1392.5419
[cx1.h : 515] Lv1 scanning from bucket 5223 to 9127
[cx1.h : 528] Lv1 scanning done. Large diff: 0. Time elapsed: 106.3747
[cx1.h : 594] Lv1 fetching & sorting done. Time elapsed: 2603.4234
[cx1.h : 515] Lv1 scanning from bucket 9127 to 14959
[cx1.h : 528] Lv1 scanning done. Large diff: 0. Time elapsed: 107.8583
[cx1.h : 594] Lv1 fetching & sorting done. Time elapsed: 2681.2752
[cx1.h : 515] Lv1 scanning from bucket 14959 to 24433
[cx1.h : 528] Lv1 scanning done. Large diff: 0. Time elapsed: 109.5040
[cx1.h : 594] Lv1 fetching & sorting done. Time elapsed: 2611.6156
[cx1.h : 515] Lv1 scanning from bucket 24433 to 42507
[cx1.h : 528] Lv1 scanning done. Large diff: 0. Time elapsed: 113.3471
[cx1.h : 594] Lv1 fetching & sorting done. Time elapsed: 2653.8034
[cx1.h : 515] Lv1 scanning from bucket 42507 to 65536
[cx1.h : 528] Lv1 scanning done. Large diff: 0. Time elapsed: 102.3255
[cx1.h : 594] Lv1 fetching & sorting done. Time elapsed: 1503.6664
[cx1.h : 607] Main loop done. Time elapsed: 14713.3331
[cx1.h : 613] Postprocessing...
[cx1_seq2sdbg.cpp :1139] Number of $ A C G T A- C- G- T-:
[cx1_seq2sdbg.cpp :1142] 133256801 2881212112 5410228190 5371937656 2915483016 153200374 490844038 501566313 146976875
[cx1_seq2sdbg.cpp :1151] Total number of edges: 18004705375
[cx1_seq2sdbg.cpp :1152] Total number of ONEs: 16578860974
[cx1_seq2sdbg.cpp :1153] Total number of $v edges: 133256801
[cx1.h : 621] Postprocess done. Time elapsed: 1.3587
[utils.h : 126] Real: 18316.7026 user: 34843.2326 sys: 298.4356 maxrss: 93153508
--- [Mon Apr 18 19:54:26 2022] Assembling contigs from SdBG for k = 21 ---
cmd: /bigdata/aronsonlab/hshul001/.conda/envs/metawrap-env/bin/megahit_asm_core assemble -s ASSEMBLY_BC/megahit.tmp/megahit_tmp_SH6KZF/k21/21 -o ASSEMBLY_BC/megahit/intermediate_contigs/k21 -t 10 --min_standalone 300.0 --prune_level 2 --merge_len 20 --merge_similar 0.95 --low_local_ratio 0.2 --min_depth 2 --bubble_level 2 --max_tip_len -1 --careful_bubble
[assembler.cpp : 148] Loading succinct de Bruijn graph: ASSEMBLY_BC/megahit.tmp/megahit_tmp_SH6KZF/k21/21
megahit_asm_core: sdbg_multi_io.h:243: void SdbgReader::read_info(): Assertion `fscanf(sdbg_info, "k %d\n", &kmersize) == 1' failed.
Error occurs when assembling contigs for k = 21, please refer to ASSEMBLY_BC/megahit/log for detail
[Exit code -6]
MEGAHIT v1.1.3
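
On the memory question, the log itself gives a rough answer: each completed stage ends with a `[utils.h : 126]` line that reports `maxrss`. The sketch below compares the largest of those values with the `--host_mem` limit. It assumes `maxrss` is reported in kilobytes (the unit Linux uses for `ru_maxrss`), which is not stated in the log, and `peak_maxrss_bytes` is a hypothetical helper name. The failed assembling step never printed its own `maxrss`, so this only covers the stages that finished.

```python
import re

# The three "[utils.h : 126]" summary lines copied from the log in this issue.
LOG_LINES = [
    "[utils.h : 126] Real: 632.9277 user: 489.3902 sys: 77.4643 maxrss: 165684",
    "[utils.h : 126] Real: 4983.6361 user: 48307.5821 sys: 163.5017 maxrss: 57349848",
    "[utils.h : 126] Real: 18316.7026 user: 34843.2326 sys: 298.4356 maxrss: 93153508",
]

# Value passed as --host_mem in the logged commands (bytes).
HOST_MEM_BYTES = 900_000_000_000


def peak_maxrss_bytes(lines, bytes_per_unit=1024):
    """Return the largest maxrss found in `lines`, converted to bytes.

    ASSUMPTION: maxrss is in kilobytes (1024 bytes), as Linux reports
    ru_maxrss. Change `bytes_per_unit` if your platform differs.
    """
    values = []
    for line in lines:
        match = re.search(r"maxrss: (\d+)", line)
        if match:
            values.append(int(match.group(1)))
    if not values:
        raise ValueError("no maxrss values found")
    return max(values) * bytes_per_unit


if __name__ == "__main__":
    peak = peak_maxrss_bytes(LOG_LINES)
    print(f"peak maxrss: {peak / 1e9:.1f} GB")
    print(f"fraction of --host_mem: {peak / HOST_MEM_BYTES:.1%}")
```

Under that assumption the finished stages peaked at roughly a tenth of the 900 GB limit, so the numbers in this log do not by themselves indicate that memory ran out.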

[Image: Screen Shot 2022-04-19 at 2 26 47 PM]

mahesh1368569 commented 1 year ago

Hey, I am getting the same error with my data. Did you resolve it?
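
For anyone hitting the same assertion: it fires in `SdbgReader::read_info()` when `fscanf` cannot read a line of the form `k <integer>` from the graph's info file, which suggests that file is missing, empty, or does not start with that line. Below is a minimal sketch for checking this before re-running. The `.sdbg_info` suffix is an assumption about how the file is named (check what is actually in your k21 folder), and `check_sdbg_info` is a hypothetical helper, not part of MEGAHIT.

```python
import re
from pathlib import Path


def check_sdbg_info(prefix, suffix=".sdbg_info"):
    """Classify the info file for an SdBG prefix.

    ASSUMPTION: the info file is named prefix + ".sdbg_info". Pass a
    different `suffix` if the files in your temporary folder differ.
    Returns "missing", "empty", "malformed", or "ok (k=<value>)".
    """
    path = Path(str(prefix) + suffix)
    if not path.is_file():
        return "missing"
    text = path.read_text(errors="replace")
    if not text.strip():
        return "empty"
    # Roughly mirrors fscanf(sdbg_info, "k %d\n", &kmersize): a literal
    # 'k' at the very start, optional whitespace, then an integer.
    match = re.match(r"k\s*([+-]?\d+)", text)
    if match is None:
        return "malformed"
    return f"ok (k={match.group(1)})"


if __name__ == "__main__":
    # Prefix taken from the "Loading succinct de Bruijn graph" log line.
    prefix = "ASSEMBLY_BC/megahit.tmp/megahit_tmp_SH6KZF/k21/21"
    print(check_sdbg_info(prefix))
```

If the result is anything other than `ok`, the graph-building step probably did not finish writing its output, and free disk space and write permissions on the temporary directory are worth checking before looking at memory or read normalization.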