umnsec / mlta

TypeDive: Multi-Layer Type Analysis (MLTA) for Refining Indirect-Call Targets
MIT License
76 stars 19 forks source link

Potential false negative problem in type confinement #10

Open for-just-we opened 5 months ago

for-just-we commented 5 months ago

First, I would like to express my respect for this work. By reviewing the source code of MLTA and some test cases I found a potential case that may cause MLTA to produce additional false negatives than FLTA.

Below is an example from dovecot project. And iostream_pump_flush is an address-taken function used as an argument of call expression o_stream_set_flush_callback(pump->output, iostream_pump_flush, pump);.

I noticed MLTA does process the case that addr-taken function is used as a call argument for type confining. However, in this case, there exists an indirect call in the call in the call-chain, _stream->set_flush_callback(_stream, callback, context); call o_stream_default_set_flush_callback. Where type confinement happens in o_stream_default_set_flush_callback. Will this lead to insufficient confinement of iostream_pump_flush to field ostream::callbacks ?

void iostream_pump_start(struct iostream_pump *pump)
{
    i_assert(pump != NULL);
    i_assert(pump->callback != NULL);

    /* add flush handler */
    if (!pump->output->blocking) {
        o_stream_set_flush_callback(pump->output,
                        iostream_pump_flush, pump);
    }

    /* make IO objects */
    if (pump->input->blocking) {
        i_assert(!pump->output->blocking);
        o_stream_set_flush_pending(pump->output, TRUE);
    } else {
        pump->io = io_add_istream(pump->input,
                      iostream_pump_copy, pump);
        io_set_pending(pump->io);
    }
}

void o_stream_set_flush_callback(struct ostream *stream,
                 stream_flush_callback_t *callback,
                 void *context)
{
    struct ostream_private *_stream = stream->real_stream;

    _stream->set_flush_callback(_stream, callback, context);
}

// indirect invoked by _stream->set_flush_callback
static void
o_stream_default_set_flush_callback(struct ostream_private *_stream,
                    stream_flush_callback_t *callback,
                    void *context)
{
    if (_stream->parent != NULL)
        o_stream_set_flush_callback(_stream->parent, callback, context);

    _stream->callback = callback;
    _stream->context = context;
}
YantingChi commented 5 months ago

Hi, Thanks for your interest in MLTA. I looked into the dovecot/core, and your case mentioned sounds like a recursive function call.

From my understanding, your question is about whether indirect assignment between function and struct fields will be missed or not. More specifically, in your case the type-confinement happens in _stream->callback = callback; Where the callback function (namely iostream_pump_flush) is assigned to a field of _stream struct. However, when MLTA tries to solve the type hierarchy of the address-taken function, it might miss the assignment of the function iostream_pump_flush to a _stream struct. In that case, when we want to query the potential callee of a caller with a layer hierarchy _stream. callback, we might miss the iostream_pump_flush function.

This is an interesting limitation of MLTA, but it will not introduce FN but introduce some FP instead. In your case, MLTA will treat all the functions that are with type stream_flush_callback_t (or types that can be cast to this type) as potential right values in _stream->callback= callback , because we cannot solve the what function might be passed to o_stream_default_set_flush_callback. It will introduce a lot of FPs, but theoretically, MLTA will not miss the target.

I cannot build the core project (the install. md of the project seems not updated). Have you checked if MLTA will actually fail to capture the layered information for _stream->callback = callback; and thus FNs exist? It would be interesting to see if MLTA misses that target, and that might be a bug in the code implementation.

Hope my answer helps!

for-just-we commented 5 months ago

introduce a lot of FPs, but theoretically, MLTA will not miss the target.

I cannot build the core project (the install. md of the project seems not updated). Have you checked if MLTA will actually fail to capture the layered information for _stream->callback = callback; and thus FNs exist? It would be interesting to see if MLTA misses that target, and that might be a bug in the code implementation.

I build dovecot following oss-fuzz.

I actually don't run MLTA on the bitcode level. I tried to implement a source-code level MLTA (which of course cannot be as accurate as at the IR level) and found the issue when running it.

I understand that when MLTA identifies an escape type, it may downgrade to solve i-call by FLTA. And your answer seems to imply that in this case, MLTA will identify the type of _stream->callback that will escape. However, I don't understand how MLTA identifies it.

By reviewing the source code of MLTA, I speculate that if o_stream_default_set_flush_callback is called directly, MLTA will solve the type confine of _stream->callback = iostream_pump_flush;. (By replacing parameter callback with iostream_pump_flush).

However, when _stream->callback = iostream_pump_flush; is missing. And in other place a i-call fstream->ostream.callback(fstream->ostream.context); references the same field callback of struct ostream_private. How will MLTA know that there might be an escape-type.

YantingChi commented 5 months ago

Thanks for the feedback, as oss-fuzz typically will generate many exe files, could you tell me which specific executable in oss-fuzz/dovecot are you targeting? In that case, I can check if there are real false negatives here.

for-just-we commented 5 months ago

Thanks for the feedback, as oss-fuzz typically will generate many exe files, could you tell me which specific executable in oss-fuzz/dovecot are you targeting? In that case, I can check if there are real false negatives here.

I actually forget that, but I implement 3 simple testcases with the code in dovecot. Although they cannot run, but can generate bitcode for MLTA to analyze.

testcase1

#define TRUE 1
#define bool _Bool
#define NULL 0

typedef void io_callback_t(void *context);
typedef void iostream_pump_callback_t(enum iostream_pump_status status,
                      void *context);
typedef int stream_flush_callback_t(void *context);

enum iostream_pump_status {
    /* pump succeeded - EOF received from istream and all output was
       written successfully to ostream. */
    IOSTREAM_PUMP_STATUS_INPUT_EOF,
    /* pump failed - istream returned an error */
    IOSTREAM_PUMP_STATUS_INPUT_ERROR,
    /* pump failed - ostream returned an error */
    IOSTREAM_PUMP_STATUS_OUTPUT_ERROR,
};

struct ostream {
    /* Number of bytes sent via o_stream_send*() and similar functions.
       This is counting the input data. For example with a compressed
       ostream this is counting the uncompressed bytes. The compressed
       bytes could be counted from the parent ostream's offset.

       Seeking to a specified offset only makes sense if there is no
       difference between input and output data sizes (e.g. there are no
       wrapper ostreams changing the data). */
    unsigned int offset;

    /* errno for the last operation send/seek operation. cleared before
       each call. */
    int stream_errno;

    /* overflow is set when some of the data given to send()
       functions was neither sent nor buffered. It's never unset inside
       ostream code. */
    bool overflow:1;
    /* o_stream_send() writes all the data or returns failure */
    bool blocking:1;
    bool closed:1;

    struct ostream_private *real_stream;
};

struct ostream_private {
    void (*set_flush_callback)(struct ostream_private *stream,
                   stream_flush_callback_t *callback,
                   void *context);

    struct ostream *parent;

    stream_flush_callback_t *callback;

    void *context;
};

struct iostream_pump {
    int refcount;
    struct io* io;
    struct ostream *output;
    iostream_pump_callback_t *callback;
    void *context;

    bool waiting_output;
    bool completed;
};

struct io {
    const char *source_filename;
    unsigned int source_linenum;
    /* trigger I/O callback even if OS doesn't think there is input
       pending */
    bool pending;
    /* This IO event shouldn't be the only thing being waited on, because
       it would just result in infinite wait. */
    bool never_wait_alone;

    io_callback_t *callback;
        void *context;

    struct ioloop *ioloop;
    struct ioloop_context *ctx;
};

void iostream_pump_start(struct iostream_pump *pump)
{
    o_stream_set_flush_callback(pump->output,
                        iostream_pump_flush, pump);
}

void o_stream_set_flush_callback(struct ostream *stream,
                 stream_flush_callback_t *callback,
                 void *context)
{
    struct ostream_private *_stream = stream->real_stream;
    _stream->set_flush_callback(_stream, callback, context);
}

// indirect invoked by _stream->set_flush_callback
static void
o_stream_default_set_flush_callback(struct ostream_private *_stream,
                    stream_flush_callback_t *callback,
                    void *context)
{
    if (_stream->parent != NULL)
        o_stream_set_flush_callback(_stream->parent, callback, context);

    _stream->callback = callback;
    _stream->context = context;
}

static int iostream_pump_flush(struct iostream_pump *pump)
{
    int ret = 0;
    return ret;
}

static void stream_send_io(struct ostream_private *stream) {
    // indirect-call
    stream->callback(stream->context);
}

In this case, the assignment of iostream_pump_flush involves two callsites including one indirect-callsite.

testcase2


#define TRUE 1
#define bool _Bool
#define NULL 0

typedef void io_callback_t(void *context);
typedef void iostream_pump_callback_t(enum iostream_pump_status status,
                      void *context);
typedef int stream_flush_callback_t(void *context);

enum iostream_pump_status {
    /* pump succeeded - EOF received from istream and all output was
       written successfully to ostream. */
    IOSTREAM_PUMP_STATUS_INPUT_EOF,
    /* pump failed - istream returned an error */
    IOSTREAM_PUMP_STATUS_INPUT_ERROR,
    /* pump failed - ostream returned an error */
    IOSTREAM_PUMP_STATUS_OUTPUT_ERROR,
};

struct ostream {
    /* Number of bytes sent via o_stream_send*() and similar functions.
       This is counting the input data. For example with a compressed
       ostream this is counting the uncompressed bytes. The compressed
       bytes could be counted from the parent ostream's offset.

       Seeking to a specified offset only makes sense if there is no
       difference between input and output data sizes (e.g. there are no
       wrapper ostreams changing the data). */
    unsigned int offset;

    /* errno for the last operation send/seek operation. cleared before
       each call. */
    int stream_errno;

    /* overflow is set when some of the data given to send()
       functions was neither sent nor buffered. It's never unset inside
       ostream code. */
    bool overflow:1;
    /* o_stream_send() writes all the data or returns failure */
    bool blocking:1;
    bool closed:1;

    struct ostream_private *real_stream;
};

struct ostream_private {
    void (*set_flush_callback)(struct ostream_private *stream,
                   stream_flush_callback_t *callback,
                   void *context);

    struct ostream *parent;

    stream_flush_callback_t *callback;

    void *context;
};

struct iostream_pump {
    int refcount;
    struct io* io;
    struct ostream *output;
    iostream_pump_callback_t *callback;
    void *context;

    bool waiting_output;
    bool completed;
};

struct io {
    const char *source_filename;
    unsigned int source_linenum;
    /* trigger I/O callback even if OS doesn't think there is input
       pending */
    bool pending;
    /* This IO event shouldn't be the only thing being waited on, because
       it would just result in infinite wait. */
    bool never_wait_alone;

    io_callback_t *callback;
        void *context;

    struct ioloop *ioloop;
    struct ioloop_context *ctx;
};

static int iostream_pump_flush(void *pump)
{
    int ret = 0;
    return ret;
}

void o_stream_set_flush_callback(struct ostream *stream,
                 stream_flush_callback_t *callback,
                 void *context);

// indirect invoked by _stream->set_flush_callback
static void
o_stream_default_set_flush_callback(struct ostream_private *_stream,
                    stream_flush_callback_t *callback,
                    void *context)
{
    _stream->callback = callback;
    _stream->context = context;
}

void o_stream_set_flush_callback(struct ostream *stream,
                 stream_flush_callback_t *callback,
                 void *context)
{
    struct ostream_private *_stream = stream->real_stream;
    o_stream_default_set_flush_callback(_stream, callback, context);
}

void iostream_pump_start(struct iostream_pump *pump)
{
    o_stream_set_flush_callback(pump->output,
                        iostream_pump_flush, pump);
}

static void stream_send_io(struct ostream_private *stream) {
    // indirect-call
    stream->callback(stream->context);
}

int main() {
    struct iostream_pump *pump;
    iostream_pump_start(pump);
    stream_send_io(pump->output);
    return 0;
}

In this case, the assignment of addr-taken function involves two direct callsite.

testcase3

#define TRUE 1
#define bool _Bool
#define NULL 0

typedef void io_callback_t(void *context);
typedef void iostream_pump_callback_t(enum iostream_pump_status status,
                      void *context);
typedef int stream_flush_callback_t(void *context);

enum iostream_pump_status {
    /* pump succeeded - EOF received from istream and all output was
       written successfully to ostream. */
    IOSTREAM_PUMP_STATUS_INPUT_EOF,
    /* pump failed - istream returned an error */
    IOSTREAM_PUMP_STATUS_INPUT_ERROR,
    /* pump failed - ostream returned an error */
    IOSTREAM_PUMP_STATUS_OUTPUT_ERROR,
};

struct ostream {
    /* Number of bytes sent via o_stream_send*() and similar functions.
       This is counting the input data. For example with a compressed
       ostream this is counting the uncompressed bytes. The compressed
       bytes could be counted from the parent ostream's offset.

       Seeking to a specified offset only makes sense if there is no
       difference between input and output data sizes (e.g. there are no
       wrapper ostreams changing the data). */
    unsigned int offset;

    /* errno for the last operation send/seek operation. cleared before
       each call. */
    int stream_errno;

    /* overflow is set when some of the data given to send()
       functions was neither sent nor buffered. It's never unset inside
       ostream code. */
    bool overflow:1;
    /* o_stream_send() writes all the data or returns failure */
    bool blocking:1;
    bool closed:1;

    struct ostream_private *real_stream;
};

struct ostream_private {
    void (*set_flush_callback)(struct ostream_private *stream,
                   stream_flush_callback_t *callback,
                   void *context);

    struct ostream *parent;

    stream_flush_callback_t *callback;

    void *context;
};

struct iostream_pump {
    int refcount;
    struct io* io;
    struct ostream *output;
    iostream_pump_callback_t *callback;
    void *context;

    bool waiting_output;
    bool completed;
};

struct io {
    const char *source_filename;
    unsigned int source_linenum;
    /* trigger I/O callback even if OS doesn't think there is input
       pending */
    bool pending;
    /* This IO event shouldn't be the only thing being waited on, because
       it would just result in infinite wait. */
    bool never_wait_alone;

    io_callback_t *callback;
        void *context;

    struct ioloop *ioloop;
    struct ioloop_context *ctx;
};

static int iostream_pump_flush(void *pump)
{
    int ret = 0;
    return ret;
}

// indirect invoked by _stream->set_flush_callback
static void
o_stream_default_set_flush_callback(struct ostream_private *_stream,
                    stream_flush_callback_t *callback,
                    void *context)
{
    _stream->callback = callback;
    _stream->context = context;
}

static void stream_send_io(struct ostream_private *stream) {
    // indirect-call
    stream->callback(stream->context);
}

int main() {
    struct ostream *stream;
    void* context;
    o_stream_default_set_flush_callback(stream, iostream_pump_flush, context);
    stream_send_io(stream);
    return 0;
}

In this case, the assignment of addr-taken function involves only one callsite.

result

However, I notice in all 3 cases MLTA fail to report iostream_pump_flush.

for-just-we commented 5 months ago

This also reminds me of an interesting extended question, which is "Can MLTA induce false negatives when it comes to complex data flows?" compared with FLTA.

In your paper, you seem to hypothesize that all address-taken functions referred to by structs fields are assigned straightforwardly. However, in the following cases. address-taken function addr_taken_func is assigned to Handler::func through a complex data-flow that involves an array funcs.

In my understanding, MLTA seems not to conduct such complex data-flow (which may also include inter-procedural complex data-flow). In this case, MLTA should regard Handler::func as an escape type. But I don't figure out how will MLTA identify the escape type. It seems that the paper doesn't discuss the details of the escape type.


void addr_taken_func();

...
func_ptr_t funcs[10];

funcs[5] = addr_taken_func;
...

struct Handler handler;
handler.func = funcs[5];
for-just-we commented 5 months ago

Hi, Thanks for your interest in MLTA. I looked into the dovecot/core, and your case mentioned sounds like a recursive function call.

From my understanding, your question is about whether indirect assignment between function and struct fields will be missed or not. More specifically, in your case the type-confinement happens in _stream->callback = callback; Where the callback function (namely iostream_pump_flush) is assigned to a field of _stream struct. However, when MLTA tries to solve the type hierarchy of the address-taken function, it might miss the assignment of the function iostream_pump_flush to a _stream struct. In that case, when we want to query the potential callee of a caller with a layer hierarchy _stream. callback, we might miss the iostream_pump_flush function.

This is an interesting limitation of MLTA, but it will not introduce FN but introduce some FP instead. In your case, MLTA will treat all the functions that are with type stream_flush_callback_t (or types that can be cast to this type) as potential right values in _stream->callback= callback , because we cannot solve the what function might be passed to o_stream_default_set_flush_callback. It will introduce a lot of FPs, but theoretically, MLTA will not miss the target.

I cannot build the core project (the install. md of the project seems not updated). Have you checked if MLTA will actually fail to capture the layered information for _stream->callback = callback; and thus FNs exist? It would be interesting to see if MLTA misses that target, and that might be a bug in the code implementation.

Hope my answer helps!

In your crix project. You seem to comment that such a case as: pool->free = free_fn; // free_fn is a function pointer from a function argument should be deemed as escaped. Does this mean in the case we discuss, when a func pointer field of struct load value from a function pointer variable (could be a function parameter or a simple func pointer global variable), we should simply regard it as escaped instead of doing inter-procedural analysis? Although it might induce more false positives. In this case, it seems that no false positive will be induced. But will lose precision on some simple stupid data-flow cases.

YantingChi commented 5 months ago

Thanks for the description and comments

In your crix project. You seem to comment that such a case as pool->free = free_fn; // free_fn is a function pointer from a function argument should be deemed as escaped. Does this mean in the case we discuss, when a func pointer field of struct load value from a function pointer variable (could be a function parameter or a simple func pointer global variable), we should simply regard it as escaped instead of doing inter-procedural analysis? Although it might induce more false positives. In this case, it seems that no false positive will be induced. But it will lose precision on some simple stupid data-flow cases.

I think you are mentioning there are no false negatives but more false positives here. I believe MLTA does not involve expensive inter-procedural analysis as it can quickly scanned through the Linux kernel. Involving inter-procedural analysis might increase the overhead.

I can double-check the code to make sure whether there are inter-procedural analyses in MLTA or not.

However, I notice in all 3 cases MLTA fails to report iostream_pump_flush.

Thanks for the cases! Excited to see MLTA actually misses some cases. I will go through the cases and get back to you when I find something interesting

for-just-we commented 5 months ago

Thanks for the description and comments

In your crix project. You seem to comment that such a case as pool->free = free_fn; // free_fn is a function pointer from a function argument should be deemed as escaped. Does this mean in the case we discuss, when a func pointer field of struct load value from a function pointer variable (could be a function parameter or a simple func pointer global variable), we should simply regard it as escaped instead of doing inter-procedural analysis? Although it might induce more false positives. In this case, it seems that no false positive will be induced. But it will lose precision on some simple stupid data-flow cases.

I think you are mentioning there are no false negatives but more false positives here. I believe MLTA does not involve expensive inter-procedural analysis as it can quickly scanned through the Linux kernel. Involving inter-procedural analysis might increase the overhead.

I can double-check the code to make sure whether there are inter-procedural analyses in MLTA or not.

However, I notice in all 3 cases MLTA fails to report iostream_pump_flush.

Thanks for the cases! Excited to see MLTA actually misses some cases. I will go through the cases and get back to you when I find something interesting

I guess the code is in MLTA::typeConfineInFunction(Function *F), where my understanding for this part of code is that they traversing all arguments of a callInst, if encounter a function being used as a argument. Try to deep into the callee to continue type confining. Well, maybe there are some mistakes in my understanding.

else if (CallInst *CI = dyn_cast<CallInst>(I)) {
        for (User::op_iterator OI = I->op_begin(), 
                OE = I->op_end();
                OI != OE; ++OI) {
            if (Function *F = dyn_cast<Function>(*OI)) {
                if (F->isIntrinsic())
                    continue;
                if (CI->isIndirectCall()) {
                    confineTargetFunction(*OI, F);
                    continue;
                }
                Value *CV = CI->getCalledOperand();
                Function *CF = dyn_cast<Function>(CV);
                if (!CF)
                    continue;
                if (CF->isDeclaration())
                    CF = Ctx->GlobalFuncMap[CF->getGUID()];
                if (!CF)
                    continue;
                if (Argument *Arg = getParamByArgNo(CF, OI->getOperandNo())) {
                    for (auto U : Arg->users()) {
                        if (isa<StoreInst>(U) || isa<BitCastOperator>(U)) {
                            confineTargetFunction(U, F);
                        }
                    }
                }
                // TODO: track into the callee to avoid marking the
                // function type as a cap
            }
        }
    }
for-just-we commented 2 months ago

Hi, author. My recent work requires the use of MLta. I have been reading the code of MLTA and making some adjustments, but there are some parts of the code that I do not fully understand. I would like to seek your guidance on these.

1.Question 1

The first place is in MLTA::confineTargetFunction, where you mark a function-type as escaping. However, If I use fuzz type match by comparing the call arguments and function parameters one by one. Do I need to mark the escaped function-type? I give an example

typedef void (*fptr_long)(long);
typedef void (*fptr_int)(int);
typedef void (*fptr_ptr)(fptr_long);
void f1(long a) {}
void f2(long a) {}
void f3(long a) {}

void scene1_b(fptr_int f) { f(0); } // call1

void scene1_a () {
   fptr_int f = (fptr_int)&f1;
   scene1_b(f);
}

struct S {
    fptr_long one;
    fptr_int two;
};

void scene2_b (struct S* s) { s->one(0); } // call2

void scene2_a () {
    struct S s = {&f2 , 0};
    scene2_b(&s);
}

fptr_long callback;

void set_callback(fptr_long f) { callback = f; }

void scene3_b () { callback(0); } // call

void scene3_a () {
    fptr_ptr some_cb_target = &set_callback ;
    some_cb_target(&f3); // call3
}

int main(){
    scene1_a();
    scene2_a();
    scene3_a();
    return 0;
}

Where in this case, s->one(0); should be analyzed by MLTA with callee set of only f2. However, due to typedef void (*fptr_long)(long) escaped. It is analyzed by FLTA. Here I don't understand the necessity of marking escaped function-type.

2.Question 2

The following code in MLTA::typeConfineInFunction(Function *F).

else if (CallInst *CI = dyn_cast<CallInst>(I)) {
            for (User::op_iterator OI = I->op_begin(), 
                    OE = I->op_end();
                    OI != OE; ++OI) {
                if (Function *F = dyn_cast<Function>(*OI)) {
                    if (F->isIntrinsic())
                        continue;
                    if (CI->isIndirectCall()) {
                        confineTargetFunction(*OI, F);
                        continue;
                    }
                    Value *CV = CI->getCalledOperand();
                    Function *CF = dyn_cast<Function>(CV);
                    if (!CF)
                        continue;
                    if (CF->isDeclaration())
                        CF = Ctx->GlobalFuncMap[CF->getGUID()];
                    if (!CF)
                        continue;
                    if (Argument *Arg = getParamByArgNo(CF, OI->getOperandNo())) {
                        for (auto U : Arg->users()) {
                            if (isa<StoreInst>(U) || isa<BitCastOperator>(U)) {
                                confineTargetFunction(U, F);
                            }
                        }
                    }
                    // TODO: track into the callee to avoid marking the
                    // function type as a cap
                }
            }
        }

I understand the process for StoreInst. But why should MLTA process CallInst in typeConfineInFunction. Because I don't see there are confinement to be analyzed in call expression.