voutcn / megahit

Ultra-fast and memory-efficient (meta-)genome assembler
http://www.ncbi.nlm.nih.gov/pubmed/25609793
GNU General Public License v3.0

Segmentation fault megahit_core_popcnt version 1.2.8. #244

Closed: vcepeda closed this issue 4 years ago

vcepeda commented 4 years ago

I'm running many assemblies and only some of them failed. The parameters are "--min-count 3 --min-contig-len 1 --presets meta-sensitive". The log file:

2019-10-07 14:48:41 - MEGAHIT v1.2.8
2019-10-07 14:48:41 - Maximum number of available CPU thread is 16.
2019-10-07 14:48:41 - Number of thread is reset to the 16.
2019-10-07 14:48:41 - Using megahit_core with POPCNT support
2019-10-07 14:48:41 - Convert reads to binary library
2019-10-07 14:50:06 - b'INFO sequence/io/sequence_lib.cpp : 77 - Lib 0 (/fs/cbcb-data/hmp_reads/hmp2/SRS077294_refselk/error_correction/mc.sam.unmapped.1.fq,/fs/cbcb-data/hmp_reads/hmp2/SRS077294_refselk/error_correction/mc.sam.unmapped.2.fq): pe, 34029748 reads, 100 max length'
2019-10-07 14:50:07 - b'INFO utils/utils.h : 152 - Real: 85.2441\tuser: 28.3160\tsys: 5.7017\tmaxrss: 277992'
2019-10-07 14:50:07 - k-max reset to: 119
2019-10-07 14:50:07 - Start assembly. Number of CPU threads 16
2019-10-07 14:50:07 - k list: 21,29,39,49,59,69,79,89,99,109,119
2019-10-07 14:50:07 - Memory used: 34003574784
2019-10-07 14:50:07 - Extracting solid (k+1)-mers and building sdbg for k = 21
2019-10-07 14:50:21 - Error occurs, please refer to /fs/cbcb-data/hmp_reads/hmp2/SRS077294_refselk/assembly/megahit/log for detail
2019-10-07 14:50:21 - Command: /cbcbhomes/vcepeda/MEGAHIT-1.2.8-Linux-x86_64-static/bin/megahit_core_popcnt read2sdbg -k 21 -m 1 --host_mem 34003574784 --mem_flag 1 --output_prefix /fs/cbcb-data/hmp_reads/hmp2/SRS077294_refselk/assembly/megahit/tmp/k21/21 --num_cpu_threads 16 --read_lib_file /fs/cbcb-data/hmp_reads/hmp2/SRS077294_refselk/assembly/megahit/tmp/reads.lib; Exit code -11
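
For readers wondering how "Exit code -11" relates to the segmentation fault in the title: the sketch below assumes the wrapper reports the child's return code using the Python subprocess convention, where a negative value -N means the process was terminated by signal N. The helper name describe_exit is hypothetical and is not part of MEGAHIT.

```python
import signal


def describe_exit(returncode: int) -> str:
    """Translate a subprocess-style return code into a readable message.

    Assumption: negative codes follow the Python subprocess convention,
    i.e. -N means the child was terminated by signal N.
    """
    if returncode < 0:
        # signal.Signals maps the signal number to its symbolic name.
        return f"killed by {signal.Signals(-returncode).name}"
    return f"exited with status {returncode}"


if __name__ == "__main__":
    print(describe_exit(-11))  # killed by SIGSEGV
```

Under that assumption, -11 corresponds to signal 11, SIGSEGV, which matches the segmentation fault reported here.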

voutcn commented 4 years ago

Hi @vcepeda,

Could you run the command with catchsegv and show me the log? That is:

catchsegv /cbcbhomes/vcepeda/MEGAHIT-1.2.8-Linux-x86_64-static/bin/megahit_core_popcnt read2sdbg -k 21 -m 1 --host_mem 34003574784 --mem_flag 1 --output_prefix /fs/cbcb-data/hmp_reads/hmp2/SRS077294_refselk/assembly/megahit/tmp/k21/21 --num_cpu_threads 16 --read_lib_file /fs/cbcb-data/hmp_reads/hmp2/SRS077294_refselk/assembly/megahit/tmp/reads.lib

Thanks.

vcepeda commented 4 years ago

I found the problem. I thought the reads were clean, but most of the files contain only Ns. After filtering out those reads, megahit ran successfully. Does megahit make any assumptions about the N content of the input?
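
For anyone hitting the same crash on v1.2.8, here is a minimal sketch of the kind of filtering described above. It is not the script used in this thread. It assumes uncompressed paired FASTQ files with exactly four lines per record and mates in the same order in both files, and it drops a pair when either mate consists only of Ns so the two files stay in sync. All function names are hypothetical.

```python
from itertools import islice
from typing import Iterator, TextIO, Tuple

Record = Tuple[str, ...]  # (header, sequence, plus line, quality)


def read_fastq(handle: TextIO) -> Iterator[Record]:
    """Yield records from a FASTQ stream, assuming 4 lines per record."""
    while True:
        lines = [line.rstrip("\n") for line in islice(handle, 4)]
        if not lines:
            return
        if len(lines) < 4:
            raise ValueError("truncated FASTQ record")
        yield tuple(lines)


def is_all_n(seq: str) -> bool:
    """True for an empty sequence or one made up solely of N/n."""
    return seq.strip("Nn") == ""


def filter_pairs(in1: TextIO, in2: TextIO,
                 out1: TextIO, out2: TextIO) -> Tuple[int, int]:
    """Copy read pairs to out1/out2, skipping pairs with an all-N mate.

    Returns (kept, dropped). Iteration stops at the shorter input.
    """
    kept = dropped = 0
    for r1, r2 in zip(read_fastq(in1), read_fastq(in2)):
        if is_all_n(r1[1]) or is_all_n(r2[1]):
            dropped += 1
            continue
        out1.write("\n".join(r1) + "\n")
        out2.write("\n".join(r2) + "\n")
        kept += 1
    return kept, dropped
```

The function takes open file handles, so it can be called with the two input files and two output files opened via open(). Dropping the whole pair rather than a single mate is a design choice to keep the paired-end files aligned.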

voutcn commented 4 years ago

Thank you, @vcepeda. I was able to reproduce the error by adding sequences that contain only Ns. I will work on a patch soon.

voutcn commented 4 years ago

Fixed in https://github.com/voutcn/megahit/releases/tag/v1.2.9