lh3 / bfc

High-performance error correction for Illumina resequencing data
MIT License

Seg fault/double free or corruption #1

Closed aihardin closed 9 years ago

aihardin commented 9 years ago

Hi Heng, I saw the arXiv preprint and am trying out bfc, but I'm getting an odd error. The input is adapter-trimmed HiSeq reads (<125 bp), and for the same file I alternately get either a seg fault or Error in `bfc': double free or corruption (out).

command: bfc -s 2.9g -k 55 -t 31 smaller.fq > test_corrected.fq

compiled with gcc version 4.8.2. 10,000 read test fq file is here: https://www.dropbox.com/s/67d24d9or15e5wj/smaller.fq?dl=0

lh3 commented 9 years ago

If your genome is 2.9Gb, you would need hundreds of millions of reads to correct errors. I guess the combination of a large genome size and a tiny input file triggers some unexpected behavior. This is still a bug. I will look into it. Thanks.
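For a rough sense of the numbers in this comment, sequencing depth can be estimated as reads times read length divided by genome size. The sketch below is only that back-of-the-envelope formula, not anything bfc computes; the function name `coverage` is hypothetical, and the 125 bp read length used in the note after it is an upper bound assumed from the original report.

```c
#include <assert.h>

/* Back-of-the-envelope sequencing depth: total sequenced bases divided by
 * genome size. Hypothetical helper for illustration; not part of bfc. */
static double coverage(double n_reads, double read_len, double genome_size)
{
    assert(genome_size > 0.0);  /* avoid dividing by zero */
    return n_reads * read_len / genome_size;
}
```

Under these assumptions, coverage(1e4, 125, 2.9e9) is far below 1x for the 10,000-read test file, while coverage(3e8, 125, 2.9e9) is above 10x, which is in line with the "hundreds of millions of reads" estimate.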

peterdfields commented 9 years ago

+1

aihardin commented 9 years ago

Yes, of course my data set is more than 10k reads; that is just a test set. I can't get it to work on my full data set either, though. bfc crashed after reading ~809K out of 147M on my 128GB server:

bfc: malloc.c:2842: mremap_chunk: Assertion `((size + offset) & (_rtld_global_ro._dl_pagesize - 1)) == 0' failed. Aborted (core dumped)

If it would be helpful, I could give you access to that file as well. Thanks for looking into the problem!

lh3 commented 9 years ago

I have just fixed it via 14dbe0f. This is an out-of-bounds bug that only shows up for read lengths slightly smaller than 64bp, 128bp, 256bp, etc. My previous guess was wrong. Many thanks for this example.
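As an illustration of why a bug can depend on read lengths just under a power of two, here is a minimal sketch of one common pattern. It is not the code changed in 14dbe0f; the capacity rule (length plus one slot, rounded up to a power of two) and the names `roundup_pow2`, `buf_capacity` and `would_overflow` are all assumptions made for this example.

```c
#include <assert.h>
#include <stddef.h>

/* Round n up to the next power of two (n >= 1), as many growable buffers do.
 * Not guarded against overflow for values near SIZE_MAX. */
static size_t roundup_pow2(size_t n)
{
    size_t p = 1;
    assert(n >= 1);
    while (p < n) p <<= 1;
    return p;
}

/* Hypothetical allocation rule: room for the read plus one extra slot,
 * rounded up to a power of two. */
static size_t buf_capacity(size_t read_len)
{
    return roundup_pow2(read_len + 1);
}

/* Hypothetical access pattern: the code touches `extra` slots past the end
 * of the read without re-checking capacity. Returns 1 if that access would
 * land outside the buffer, 0 otherwise. */
static int would_overflow(size_t read_len, size_t extra)
{
    return read_len + extra > buf_capacity(read_len);
}
```

With extra set to 4 in this model, lengths 100 and 64 stay in bounds because rounding leaves plenty of slack, while lengths 63 and 125 do not. That matches the general shape of the symptom described here: most inputs work, and only lengths slightly below a power of two corrupt the heap.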

I am closing the issue. You can reopen it if the latest master branch does not work.

lh3 commented 9 years ago

I should add that bfc -s 2.9g -k 55 -t 31 smaller.fq should also work. The segfault is not caused by -s 2.9g.