Fix OOB write in text parsing

thestr4ng3r commented 3 years ago

This happens because load_flush_line() will write a zero-byte just after the line. Most of the time, this just overwrites the \n but at the very end of the file, it writes one byte after the allocated buffer. With mmap this usually stays undetected because the whole page is allocated, but with mmap disabled and asan it shows even in the current tests. There would be different ways to fix this:

Always pass a 1-byte bigger buffer to sdb_text_load_buf(): If done directly like this, a user of the function would have to be aware of that and it goes against everything we are used to when we have an additional size argument. Alternatively the string may always be forced to be zero-terminated and the size removed, but that would imply a lot of additional checks and issues, or a strlen over the possibly huge buffer so this isn't really a nice solution either
Make load_flush_line() not write this zero-byte there: In this case, it would have to do a strdup in every single line to feed the value into sdb_set(), which would be a huge additional overhead.
Before parsing the last line, strdup the line with size+1 and let load_flush_line() work on that copy. This is what I did here. It's a bit ugly but the solution with the least problems and it is covered well by tests.

[florian@florian-desktop sdb]$ build/test/unit/test_sdb
test_sdb_namespace OK
test_sdb_foreach_delete OK
test_sdb_list_delete OK
test_sdb_delete_none OK
test_sdb_delete_alot OK
test_sdb_milset OK
test_sdb_milset_random OK
test_sdb_list_big OK
test_sdb_foreach_filter OK
test_sdb_copy OK
test_sdb_text_save_simple OK
test_sdb_text_save_simple_unsorted OK
test_sdb_text_load_simple OK
test_sdb_text_save OK
test_sdb_text_load OK
test_sdb_text_load_bad_nl OK
test_sdb_text_load_broken OK
=================================================================
==54444==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x60e0000001b9 at pc 0x56104b5fb3bf bp 0x7ffd98007da0 sp 0x7ffd98007d90
WRITE of size 1 at 0x60e0000001b9 thread T0
    #0 0x56104b5fb3be in load_flush_line ../src/text.c:229
    #1 0x56104b5fc721 in sdb_text_load_buf ../src/text.c:364
    #2 0x56104b5fca38 in sdb_text_load ../src/text.c:394
    #3 0x56104b5e63f2 in test_sdb_text_load_file ../test/unit/test_sdb.c:471
    #4 0x56104b5e6b8e in all_tests ../test/unit/test_sdb.c:502
    #5 0x56104b5e6be3 in main ../test/unit/test_sdb.c:507
    #6 0x7fbb5b5d6151 in __libc_start_main (/usr/lib/libc.so.6+0x28151)
    #7 0x56104b5e147d in _start (/home/florian/dev/sdb/build/test/unit/test_sdb+0x647d)

0x60e0000001b9 is located 0 bytes to the right of 153-byte region [0x60e000000120,0x60e0000001b9)
allocated by thread T0 here:
    #0 0x7fbb5b829639 in __interceptor_calloc /build/gcc/src/gcc/libsanitizer/asan/asan_malloc_linux.cpp:154
    #1 0x56104b5fc941 in sdb_text_load ../src/text.c:385
    #2 0x56104b5e63f2 in test_sdb_text_load_file ../test/unit/test_sdb.c:471
    #3 0x56104b5e6b8e in all_tests ../test/unit/test_sdb.c:502
    #4 0x56104b5e6be3 in main ../test/unit/test_sdb.c:507
    #5 0x7fbb5b5d6151 in __libc_start_main (/usr/lib/libc.so.6+0x28151)

SUMMARY: AddressSanitizer: heap-buffer-overflow ../src/text.c:229 in load_flush_line
Shadow bytes around the buggy address:
  0x0c1c7fff7fe0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x0c1c7fff7ff0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x0c1c7fff8000: fa fa fa fa fa fa fa fa fd fd fd fd fd fd fd fd
  0x0c1c7fff8010: fd fd fd fd fd fd fd fd fd fd fd fd fa fa fa fa
  0x0c1c7fff8020: fa fa fa fa 00 00 00 00 00 00 00 00 00 00 00 00
=>0x0c1c7fff8030: 00 00 00 00 00 00 00[01]fa fa fa fa fa fa fa fa
  0x0c1c7fff8040: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x0c1c7fff8050: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x0c1c7fff8060: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x0c1c7fff8070: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x0c1c7fff8080: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
Shadow byte legend (one shadow byte represents 8 application bytes):
  Addressable:           00
  Partially addressable: 01 02 03 04 05 06 07 
  Heap left redzone:       fa
  Freed heap region:       fd
  Stack left redzone:      f1
  Stack mid redzone:       f2
  Stack right redzone:     f3
  Stack after return:      f5
  Stack use after scope:   f8
  Global redzone:          f9
  Global init order:       f6
  Poisoned by user:        f7
  Container overflow:      fc
  Array cookie:            ac
  Intra object redzone:    bb
  ASan internal:           fe
  Left alloca redzone:     ca
  Right alloca redzone:    cb
  Shadow gap:              cc
==54444==ABORTING

trufae commented 3 years ago

What about not processing the last line if it doesnt contain a newline? imho all lines in a text file should end with a newline, we can consider this case invalid, and avoid the malloc+1 && memcpy

thestr4ng3r commented 3 years ago

What about not processing the last line if it doesnt contain a newline? imho all lines in a text file should end with a newline, we can consider this case invalid, and avoid the malloc+1 && memcpy

I think not having a newline after the last line is a valid case, many text editors won't put it unless explicitly configured. I also wouldn't want to make the format unnecessarily strict just because our parser code can't handle it.

thestr4ng3r commented 3 years ago

ping @trufae

thestr4ng3r commented 3 years ago

ping @trufae

trufae commented 3 years ago

Don't stress @thestr4ng3r . only 3 days have passed since you sent the PR, i have wait months for a single review and I have been (and I'm still) very busy at work and at home. Now I have some time to check this.

I was needing this time basically to try your code and understand the ctx.path loop because it seems to me that just registering a line to be processed takes so many lines. So here are some comments:

load_flush_line seems like a bad name to me, because it's not actually flushing anything
i didnt knew how the ctx.path was handled
This code only makes sense when building sdb with USE_MMAN, right?
if building with mmap support, and mmap fails, there's no fallback
ctx.buf contains a freed pointer at the end of the function, which may be good to nullify imho

The code works, but looks a bit overengineered because instead of processing line by line it's processing char by char, so at the end .. if it finds out the buffer contains no final newline byte it needs to reshift all the line context. But i can't think of a better way, unless the ctx contains the actual shifting information so it wont require to touch all the pointers when this happens. but as long as this is an exception it is better to stick to this solution i think.

So sumarizing:

nullify ctx.buf after freeing the pointer
wrap this code inside USE_MMAN
move this code into a separate function

Then looks good to merge for me

trufae commented 3 years ago

Oh! Forgot to add this to the summary:

rename the flush function with something like “process_line”

thestr4ng3r commented 3 years ago

This code only makes sense when building sdb with USE_MMAN, right?

No. I tried to explain this in the pr description but the short version is this:

Without mmap, it always oobs.
With mmap, it also strictly oobs but usually stays undetected because of the padding of the page size.
No matter whether you have mmap or not, it always oobs if you use sdb_text_load_buf(Sdb *s, char *buf, size_t sz) when the buf is of size exactly sz.

Hence, the following change would be wrong:

wrap this code inside USE_MMAN

For the other three points, I pushed commits.

radareorg / sdb

Fix OOB write in text parsing #226