stcorp / coda

The Common Data Access toolset
http://stcorp.github.io/coda/doc/html/index.html
BSD 3-Clause "New" or "Revised" License
39 stars 17 forks source link

coda_expression_fuzzer: Stack-overflow in print_expression #66

Closed schwehr closed 4 years ago

schwehr commented 4 years ago
(gdb) bt
#0  print_expression (expr=<optimized out>, print=<optimized out>, xml=<optimized out>, html=<optimized out>, precedence=<optimized out>)
    at third_party/stcorp_coda/libcoda/coda-expr.c:4196
#1  0x00005555560b9a3b in coda_expression_print (expr=0x0, print=0xffffe9a4b80) at third_party/stcorp_coda/libcoda/coda-expr.c:4876
#2  0x0000555556053145 in LLVMFuzzerTestOneInput (
    data=0x7ffff49dc800 "1--1--1--1--1--1--1--1--1--1--1--1--3--1--1--1--1--1--1--1--1--1--1--1--1--1--1--1--1--1--1--0--1--1--1--1--1--1--1--1--1--1--1--1--1--1--1--1--1--1--1--1--1--1--1--1--1--1-1--1--1--1--1--1--1--1--1--"..., size=142921) at third_party/stcorp_coda/fuzz/coda_expression_fuzzer.cc:27
AddressSanitizer:DEADLYSIGNAL
=================================================================
==60967==ERROR: AddressSanitizer: stack-overflow on address 0x7ffd986efff8 (pc 0x7faf5348d055 bp 0x7ffd986f0050 sp 0x7ffd986f0000 T0)
    #3 0x5556b767602b in print_expression third_party/stcorp_coda/libcoda/coda-expr.c:4198:5
    #4 0x5556b767911b in print_expression third_party/stcorp_coda/libcoda/coda-expr.c:4771:13
    #5 0x5556b767911b in print_expression third_party/stcorp_coda/libcoda/coda-expr.c:4771:13
    #6 0x5556b767911b in print_expression third_party/stcorp_coda/libcoda/coda-expr.c:4771:13
    #7 0x5556b767911b in print_expression third_party/stcorp_coda/libcoda/coda-expr.c:4771:13
    #8 0x5556b767911b in print_expression third_party/stcorp_coda/libcoda/coda-expr.c:4771:13
    #9 0x5556b767911b in print_expression third_party/stcorp_coda/libcoda/coda-expr.c:4771:13
    #10 0x5556b767911b in print_expression third_party/stcorp_coda/libcoda/coda-expr.c:4771:13
    #11 0x5556b767911b in print_expression third_party/stcorp_coda/libcoda/coda-expr.c:4771:13
    #12 0x5556b767911b in print_expression third_party/stcorp_coda/libcoda/coda-expr.c:4771:13
    #13 0x5556b767911b in print_expression third_party/stcorp_coda/libcoda/coda-expr.c:4771:13
    #14 0x5556b767911b in print_expression third_party/stcorp_coda/libcoda/coda-expr.c:4771:13
[SNIP]
    #250 0x5556b767911b in print_expression third_party/stcorp_coda/libcoda/coda-expr.c:4771:13
    #251 0x5556b767911b in print_expression third_party/stcorp_coda/libcoda/coda-expr.c:4771:13
    #252 0x5556b767911b in print_expression third_party/stcorp_coda/libcoda/coda-expr.c:4771:13
    #253 0x5556b767911b in print_expression third_party/stcorp_coda/libcoda/coda-expr.c:4771:13
    #254 0x5556b767911b in print_expression third_party/stcorp_coda/libcoda/coda-expr.c:4771:13
    #255 0x5556b767911b in print_expression third_party/stcorp_coda/libcoda/coda-expr.c:4771:13

SUMMARY: AddressSanitizer: stack-overflow 
==60967==ABORTING

One possible solution is to modify print_expression to track depth

static int print_expression(const coda_expression *expr, int (*print) (const char *, ...), int xml, int html,
                            int precedence)
{

Would become

static int print_expression(const coda_expression *expr, int (*print) (const char *, ...), int xml, int html,
                            int precedence, int depth)
{
    if (depth > 32) {
        coda_set_error(CODA_ERROR_INVALID_FORMAT, "Too many sub expressions");
        return -1;
    }
    depth++;

And it would start with things like this:

int coda_expression_print_html(const coda_expression *expr, int (*print) (const char *, ...))
{
    const int depth = 0;
    return print_expression(expr, print, 1, 1, 15, depth);
}

The fuzzer:

int printf_black_hole(const char* fmt, ...) {
  va_list args;
  va_end(args);
  return 0;
}

extern "C" int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size) {
  coda_init();
  auto done = absl::MakeCleanup([] { coda_done(); });

  const std::string exprstring(reinterpret_cast<const char *>(data), size);

  coda_expression *expr = nullptr;
  coda_expression_from_string(exprstring.c_str(), &expr);

  if (!expr) return 0;

  coda_expression_print(expr, printf_black_hole);
  coda_expression_delete(expr);

  return 0;
}

testcase-4902997930278912.zip

svniemeijer commented 4 years ago

I am not sure that this is something we can 'resolve' easily in the code. There are no guarantees about the size of the stack. This is purely compiler and os dependent. Even some arbitrary depth limit of 32 is not guaranteed to solve the problem (with the added downside that it will result in valid expressions to become unprintable, which is definitely a bug).

As far as I know, stack overflow errors are just a thing that can happen when using recursive functions. The only real solution would be to not use recursive functions and rewrite these functions using loops that use the heap. But is this really worth the effort? How problematic is the raising of a stack overflow error in case of malconstructed input?

schwehr commented 4 years ago

In my situation, a stack overflow is something I can't allow processes to do. So I have to put a fix in and I'd rather it match what is in the upstream repository. A non-recursive solution isn't critical for the short term. Why not make a solution that will almost always allow the default build to print a bogus expression, but that I can easily override in my build? In my compiler settings, I can just add "-DCODA_MAX_RECURSION=64" (or whatever number works for my memory configuration) e.g.

#ifndef CODA_MAX_RECURSION
// May OOM or stack-overflow with this many levels of recursion
#define CODA_MAX_RECURSION 100000
#endif

and

static int print_expression(const coda_expression *expr, int (*print) (const char *, ...), int xml, int html,
                            int precedence, int depth)
{
    if (depth > CODA_MAX_RECURSION) {
        coda_set_error(CODA_ERROR_INVALID_FORMAT, "too many levels of recursion in expression");
        return -1;
    }
    depth++;

Some of the reasons behind limiting the recursion:

  1. Crashing is worse than returning a cannot finish printing error for most users in my experience
  2. A stack overflow is a security risk
  3. Deep recursion can be used as a denial-of-service (DOS) for servers using coda
  4. Chasing down a bug in production can be really difficult when there are thousands of workers traversing millions of files. Being able to log an error is easier to track than dealing with a stack trace from a crash
  5. In a highly threaded environment, allowing a larger stack size is expensive (e.g. most of my processes have > 50 threads)
  6. Unlimited recursion slows down fuzzer progress for finding other bugs

See also this discussion which is similar:

https://github.com/nlohmann/json/issues/832#issuecomment-363192371

svniemeijer commented 4 years ago

Ok. I understand the rationale.

The problem is that this is not just about the print_expression() function. It also applies to the evaluation and delete functions for expressions. So, if we want to limit the depth we should do this in the construction of the expression tree (i.e. somewhere within coda_expression_from_string). I will have a go at this.

svniemeijer commented 4 years ago

Implemented in 2cde1d8c64bf191627505a417a0922d913e3a58e

schwehr commented 4 years ago

Verified