weggli-rs / weggli

weggli is a fast and robust semantic search tool for C and C++ codebases. It is designed to help security researchers identify interesting functionality in large codebases.
Apache License 2.0
2.34k stars 130 forks source link

Handling of the >= operator is broken in specific cases #60

Open nickcano opened 2 years ago

nickcano commented 2 years ago

In certain cases, weggli seems to parse the >= operator incorrectly. While I haven't debugged the code to confirm, I suspect weggli is mistakenly parsing template parameter statements where they don't exist, thus swallowing the > and treating the = as a stand-alone assignment operator.

Consider the following query:

weggli --cpp '{
    if (_ = _)
    { }
}' $TARGET

This is meant to detect non-declaration assignments in if statements. This query matches the following snippets, as expected:

void test_bad_conditional() {
    if (x = 1234) {

    }
}
void test_bad_conditional() {
    if (x =! false) {

    }
}

And, as expected, it does not match this snippet:

void test_bad_conditional() {
    if (auto x = 1234) {

    }
}

Something I did not expect, however, is for it to match these snippets (it does):

bool test_bad_conditional() {
    if (a.index < 0 || b->index >= values.size()) {
        return false;
    }
    return true;
}
bool test_bad_conditional() {
    if (b.x < 3 && c >= 5) {
        return false;
    }
    return true;
}
bool test_bad_conditional() {
    if (head->packets < 1 || head->offset >= offset) {
        return false;
    }
    return true;
}

I've run this query against various large code-bases, and the only false matches contain a < preceding the >, hence my suspicion that weggli is mistakenly parsing a template param. This isn't the only prerequisite, however, as this snippet does not match:

bool test_bad_conditional() {
    if (b < 4 || a >= 3) {
        return false;
    }
    return true;
}

Making me think that, in addition to the preceding <, the presence of the . or -> (and potentially other) operators on either side is important.

felixwilhelm commented 2 years ago

Nice catch. Thanks for the detailed bug report.

This seems to be an issue in the tree-sitter CPP grammar.
b.x < 3 && c >= 5 is interpreted as an assignment expression involving a template argument.

    assignment_expression [0, 0] - [0, 17]
      left: field_expression [0, 0] - [0, 14]
        argument: identifier [0, 0] - [0, 1]
        field: template_method [0, 2] - [0, 14]
          name: field_identifier [0, 2] - [0, 3]
          arguments: template_argument_list [0, 4] - [0, 14]
            binary_expression [0, 6] - [0, 12]
              left: number_literal [0, 6] - [0, 7]
              right: identifier [0, 11] - [0, 12]
      right: number_literal [0, 16] - [0, 17]

I'll try to fix it :)