torognes / swarm

A robust and fast clustering method for amplicon-based studies
GNU Affero General Public License v3.0

memory-allocation error in swarm 3 fastidious #123

Closed frederic-mahe closed 5 years ago

frederic-mahe commented 5 years ago

When compiling swarm 3 with address sanitizer (see issue #120), a buffer-overflow is detected, but the error log is not very clear to me (see below). It could be an error in a shared library.

swarm -f -o /dev/null <(printf ">s1_10\nAA\n>s2_1\nCC\n")

The error message is:

Swarm 3.0.0 [Jan 18 2019 14:33:32]
Copyright (C) 2012-2019 Torbjorn Rognes and Frederic Mahe
https://github.com/torognes/swarm

Mahe F, Rognes T, Quince C, de Vargas C, Dunthorn M (2014)
Swarm: robust and fast clustering method for amplicon-based studies
PeerJ 2:e593 https://doi.org/10.7717/peerj.593

Mahe F, Rognes T, Quince C, de Vargas C, Dunthorn M (2015)
Swarm v2: highly-scalable and high-resolution amplicon clustering
PeerJ 3:e1420 https://doi.org/10.7717/peerj.1420

CPU features:      mmx sse sse2 sse3 ssse3 sse4.1 sse4.2 popcnt avx avx2
Database file:     ./test_files/tiny.fas
Output file:       /dev/null
Resolution (d):    1
Threads:           1
Break OTUs:        Yes
Fastidious:        Yes, with boundary 3

Reading sequences: 0%  ^MReading sequences: 84%  ^MReading sequences: 100%  ^MReading sequences: 100%
Indexing database: 0%  ^MIndexing database: 50%  ^MIndexing database: 100%
Database info:     4 nt in 2 sequences, longest 2 nt
Hashing sequences: 0%  ^MHashing sequences: 50%  ^MHashing sequences: 100%
Building network:  0%  ^MBuilding network:  50%  ^MBuilding network:  100%
Clustering:        0%  ^MClustering:        50%  ^MClustering:        100%  ^MClustering:        100%

Results before fastidious processing:
Number of swarms:  2
Largest swarm:     1

Counting amplicons in heavy and light swarms 0%  ^MCounting amplicons in heavy and light swarms 50%  ^MCounting amplicons in heavy and light swarms 100%  ^MCounting amplicons in heavy and light swarms 100%
Heavy swarms: 1, with 1 amplicons
Light swarms: 1, with 1 amplicons
Total length of amplicons in light swarms: 2
Bloom filter: bits=16, m=224, k=6, size=0.0MB
Adding light swarm amplicons to Bloom filter 0%  ^MAdding light swarm amplicons to Bloom filter 100%  ^MAdding light swarm amplicons to Bloom filter 100%
Generated 17 variants from light swarms
Checking heavy swarm amplicons against Bloom filter 0%  ^MChecking heavy swarm amplicons against Bloom filter 100%=================================================================
==27114==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x6030000018a4 at pc 0x7f219b1fb77a bp 0x7f2196efbc00 sp 0x7f2196efb3a8
WRITE of size 8 at 0x6030000018a4 thread T3
    #0 0x7f219b1fb779  (/usr/lib/x86_64-linux-gnu/libasan.so.4+0x79779)
    #1 0x55dfab7ecd34 in memcpy /usr/include/x86_64-linux-gnu/bits/string_fortified.h:34
    #2 0x55dfab7ecd34 in generate_variant_sequence /home/ubuntu/swarm3_fuzzing/swarm/src/variants.cc:79
    #3 0x55dfab7ecd34 in check_heavy_var /home/ubuntu/swarm3_fuzzing/swarm/src/algod1.cc:366
    #4 0x55dfab7ecd34 in check_heavy_thread /home/ubuntu/swarm3_fuzzing/swarm/src/algod1.cc:401
    #5 0x55dfab7e0f69 in worker /home/ubuntu/swarm3_fuzzing/swarm/src/threads.h:57
    #6 0x7f219a1cc6da in start_thread (/lib/x86_64-linux-gnu/libpthread.so.0+0x76da)
    #7 0x7f219aeb288e in __clone (/lib/x86_64-linux-gnu/libc.so.6+0x12188e)

0x6030000018a4 is located 0 bytes to the right of 4-byte region [0x6030000018a0,0x6030000018a4)
allocated by thread T3 here:
    #0 0x7f219b2617a0 in posix_memalign (/usr/lib/x86_64-linux-gnu/libasan.so.4+0xdf7a0)
    #1 0x55dfab7eb1d3 in xmalloc /home/ubuntu/swarm3_fuzzing/swarm/src/util.cc:95
    #2 0x55dfab7eb1d3 in check_heavy_thread /home/ubuntu/swarm3_fuzzing/swarm/src/algod1.cc:390

Thread T3 created by T0 here:
    #0 0x7f219b1b9d2f in __interceptor_pthread_create (/usr/lib/x86_64-linux-gnu/libasan.so.4+0x37d2f)
    #1 0x55dfab7edd44 in __base_ctor  /home/ubuntu/swarm3_fuzzing/swarm/src/threads.h:89

SUMMARY: AddressSanitizer: heap-buffer-overflow (/usr/lib/x86_64-linux-gnu/libasan.so.4+0x79779)
Shadow bytes around the buggy address:
  0x0c067fff82c0: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x0c067fff82d0: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x0c067fff82e0: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x0c067fff82f0: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x0c067fff8300: fa fa fd fd fd fd fa fa fd fd fd fd fa fa fd fd
=>0x0c067fff8310: fd fd fa fa[04]fa fa fa fa fa fa fa fa fa fa fa
  0x0c067fff8320: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x0c067fff8330: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x0c067fff8340: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x0c067fff8350: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x0c067fff8360: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
Shadow byte legend (one shadow byte represents 8 application bytes):
  Addressable:           00
  Partially addressable: 01 02 03 04 05 06 07 
  Heap left redzone:       fa
  Freed heap region:       fd
  Stack left redzone:      f1
  Stack mid redzone:       f2
  Stack right redzone:     f3
  Stack after return:      f5
  Stack use after scope:   f8
  Global redzone:          f9
  Global init order:       f6
  Poisoned by user:        f7
  Container overflow:      fc
  Array cookie:            ac
  Intra object redzone:    bb
  ASan internal:           fe
  Left alloca redzone:     ca
  Right alloca redzone:    cb
==27114==ABORTING
torognes commented 5 years ago

I see similar problems when I run it with the same input in valgrind.

frederic-mahe commented 5 years ago

I confirm:

valgrind ./swarm -f -o /dev/null -l /dev/null <(printf ">s1_10\nAA\n>s2_1\nCC\n")
==31554== Memcheck, a memory error detector
==31554== Copyright (C) 2002-2015, and GNU GPL'd, by Julian Seward et al.
==31554== Using Valgrind-3.12.0.SVN and LibVEX; rerun with -h for copyright info
==31554== Command: ./swarm -f -o /dev/null /dev/fd/63 -l /dev/null
==31554== 
==31554== Thread 2:
==31554== Invalid write of size 2
==31554==    at 0x4C3269B: memmove (vg_replace_strmem.c:1252)
==31554==    by 0x117A3D: generate_variant_sequence (variants.cc:79)
==31554==    by 0x117A3D: check_heavy_var (algod1.cc:367)
==31554==    by 0x117A3D: check_heavy_thread(long) (algod1.cc:402)
==31554==    by 0x117BCB: ThreadRunner::worker(void*) (threads.h:57)
==31554==    by 0x4E3F493: start_thread (pthread_create.c:333)
==31554==    by 0x59DAACE: clone (clone.S:97)
==31554==  Address 0x5e37334 is 0 bytes after a block of size 4 alloc'd
==31554==    at 0x4C2DF16: memalign (vg_replace_malloc.c:857)
==31554==    by 0x4C2E021: posix_memalign (vg_replace_malloc.c:1020)
==31554==    by 0x1173F0: xmalloc (util.cc:95)
==31554==    by 0x1173F0: check_heavy_thread(long) (algod1.cc:390)
==31554==    by 0x117BCB: ThreadRunner::worker(void*) (threads.h:57)
==31554==    by 0x4E3F493: start_thread (pthread_create.c:333)
==31554==    by 0x59DAACE: clone (clone.S:97)
==31554== 
==31554== Invalid write of size 8
==31554==    at 0x117A66: nt_set (variants.cc:33)
==31554==    by 0x117A66: generate_variant_sequence (variants.cc:80)
==31554==    by 0x117A66: check_heavy_var (algod1.cc:367)
==31554==    by 0x117A66: check_heavy_thread(long) (algod1.cc:402)
==31554==    by 0x117BCB: ThreadRunner::worker(void*) (threads.h:57)
==31554==    by 0x4E3F493: start_thread (pthread_create.c:333)
==31554==    by 0x59DAACE: clone (clone.S:97)
==31554==  Address 0x5e37330 is 0 bytes inside a block of size 4 alloc'd
==31554==    at 0x4C2DF16: memalign (vg_replace_malloc.c:857)
==31554==    by 0x4C2E021: posix_memalign (vg_replace_malloc.c:1020)
==31554==    by 0x1173F0: xmalloc (util.cc:95)
==31554==    by 0x1173F0: check_heavy_thread(long) (algod1.cc:390)
==31554==    by 0x117BCB: ThreadRunner::worker(void*) (threads.h:57)
==31554==    by 0x4E3F493: start_thread (pthread_create.c:333)
==31554==    by 0x59DAACE: clone (clone.S:97)
==31554== 
==31554== 
==31554== HEAP SUMMARY:
==31554==     in use at exit: 552 bytes in 1 blocks
==31554==   total heap usage: 40 allocs, 39 frees, 5,925,248 bytes allocated
==31554== 
==31554== LEAK SUMMARY:
==31554==    definitely lost: 0 bytes in 0 blocks
==31554==    indirectly lost: 0 bytes in 0 blocks
==31554==      possibly lost: 0 bytes in 0 blocks
==31554==    still reachable: 552 bytes in 1 blocks
==31554==         suppressed: 0 bytes in 0 blocks
==31554== Rerun with --leak-check=full to see details of leaked memory
==31554== 
==31554== For counts of detected and suppressed errors, rerun with: -v
==31554== ERROR SUMMARY: 10 errors from 2 contexts (suppressed: 0 from 0)
frederic-mahe commented 5 years ago

Also, note the difference between running with and without the --log option (no --fastidious option):

valgrind ./swarm -o /dev/null <(printf ">s1_10\nAA\n>s2_1\nCC\n") -l /dev/null 
==2084== HEAP SUMMARY:
==2084==     in use at exit: 552 bytes in 1 blocks
==2084==   total heap usage: 28 allocs, 27 frees, 5,399,504 bytes allocated

valgrind ./swarm -o /dev/null <(printf ">s1_10\nAA\n>s2_1\nCC\n")
==2094== HEAP SUMMARY:
==2094==     in use at exit: 0 bytes in 0 blocks
==2094==   total heap usage: 26 allocs, 26 frees, 5,394,856 bytes allocated
==2094== 
==2094== All heap blocks were freed -- no leaks are possible

There seems to be a small memory leak when using the --log option (one allocation is not freed).

torognes commented 5 years ago

The initial problem was caused by an incorrect memory allocation for the variant sequences. For short sequences, when no sequence was longer than 6 bp, the amount allocated was too small. For longer sequences the amount was actually close to 4x too much. Fixed now in commit 996e052d7f9bc963cd442161f29bd50ad35b2327.

frederic-mahe commented 5 years ago

I've run address sanitizer tests and fuzzing on swarm 2 with the fastidious option without any problem, so I assume this issue concerns only swarm 3.

Is it possible to get rid of the unfreed allocation reported by valgrind when using --log?

torognes commented 5 years ago

Yes, the problem only applies to Swarm 3 with the fastidious option. The memory allocation for some temporary variant sequences was not adjusted after the switch to the 2-bit representation. The memory needed was actually reduced to one fourth, but the buffers have to be padded to whole 8-byte words. The overall memory requirements are unchanged.

Regarding the memory lost with a log file: the error is located and will be fixed in both branches in a moment. I simply forgot to close the log file at the end of the program. This applies to all versions.

torognes commented 5 years ago

Log file problem fixed in commit 6edcf8eff09da51388e8c4c633da82621eeb5554.

Thanks for spotting these bugs!