torognes / swarm

A robust and fast clustering method for amplicon-based studies
GNU Affero General Public License v3.0

Benign buffer overflow when parsing fasta sequences #120

Closed frederic-mahe closed 5 years ago

frederic-mahe commented 5 years ago

When compiling swarm with gcc's address sanitizer and testing the binary with afl-fuzz, a potential buffer overflow is detected in db.cc, the code that parses and stores fasta input.

Makefile modifications:

COMMON=-fsanitize=address -ggdb
CXX=afl-g++

Compilation command:

AFL_USE_ASAN=1 make

afl-fuzz search:

afl-fuzz -m none -i test_files/ -o results/ -M fuzzer01 swarm -o /dev/null @@
afl-fuzz -m none -i test_files/ -o results/ -S fuzzer02 swarm -o /dev/null @@
afl-fuzz -m none -i test_files/ -o results/ -S fuzzer03 swarm -o /dev/null @@
afl-fuzz -m none -i test_files/ -o results/ -S fuzzer04 swarm -o /dev/null @@
...

where ./test_files/ contains a tiny.fas file:

>s1
ACCT
>s2
AGGT

Output example:

==21100==ERROR: AddressSanitizer: global-buffer-overflow on address 0x55a1178075dd at pc 0x55a1175b10ae bp 0x7ffea498b9e0 sp 0x7ffea498b9d0
READ of size 1 at 0x55a1178075dd thread T0
    #0 0x55a1175b10ad in db_read(char const*) /home/ubuntu/src/swarm/src/db.cc:273
    #1 0x55a1175a6728 in main /home/ubuntu/src/swarm/src/swarm.cc:657
    #2 0x7fddad3d8b96 in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x21b96)
    #3 0x55a1175a6f99 in _start (/usr/local/bin/swarm+0xcf99)

0x55a1178075dd is located 3 bytes to the left of global variable 'map_nt' defined in 'db.cc:32:6' (0x55a1178075e0) of size 256
0x55a1178075dd is located 29 bytes to the right of global variable 'map_hex' defined in 'db.cc:52:6' (0x55a1178074c0) of size 256

On line 273 of db.cc (in db_read), a character read from the input file is checked for validity by looking it up in a table covering all 256 byte values. The problem is that the character is stored as a char, which is signed on this platform, so it is used as a small signed integer instead of an unsigned one. A byte value of 253 (outside the usual range of printable ASCII characters) is therefore treated as -3, and the lookup reads a byte located before the start of the table.
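
To illustrate the mechanism described above, here is a minimal C++ sketch. It is not the code from db.cc: the table contents and the names symbol_table, signed_index, unsigned_index and is_accepted are hypothetical, chosen only to show how a byte of 253 turns into index -3 when it passes through a signed 8-bit char, and how converting to unsigned char first keeps the index within 0..255.

```cpp
#include <array>
#include <cstddef>

// Hypothetical stand-in for a 256-entry validity table such as map_nt:
// nonzero entries mark accepted symbols. The real table contents differ.
inline const std::array<unsigned char, 256>& symbol_table() {
    static const std::array<unsigned char, 256> table = [] {
        std::array<unsigned char, 256> t{};  // every entry starts at 0 (rejected)
        for (const char* p = "ACGTacgt"; *p != '\0'; ++p) {
            t[static_cast<unsigned char>(*p)] = 1;
        }
        return t;
    }();
    return table;
}

// Index obtained when the byte goes through a signed 8-bit char and is then
// promoted to int: bytes 128..255 become negative values.
inline int signed_index(unsigned char byte) {
    return static_cast<signed char>(byte);
}

// Index obtained when the char is converted to unsigned char first:
// the result is always in the range 0..255.
inline std::size_t unsigned_index(char c) {
    return static_cast<unsigned char>(c);
}

// Bounds-safe lookup built on the unsigned conversion.
inline bool is_accepted(char c) {
    return symbol_table()[unsigned_index(c)] != 0;
}
```

With these helpers, signed_index(253) yields -3, which matches the "3 bytes to the left of global variable 'map_nt'" reported by the address sanitizer, while unsigned_index yields 253 for the same byte and the lookup stays inside the table.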

Solved in https://github.com/torognes/swarm/commit/585a775b6d671bd9200256833f0809147bd454e0

Address sanitizer does not detect other issues with swarm's latest version (when using default parameters, or when activating the d > 1 code).