weggli-rs / weggli

weggli is a fast and robust semantic search tool for C and C++ codebases. It is designed to help security researchers identify interesting functionality in large codebases.
Apache License 2.0

Tree sitter query generation failed: Searching for multiple comparators fails #65

Open iankronquist opened 2 years ago

iankronquist commented 2 years ago

With weggli 0.2.4, tree-sitter query generation failed for one of my queries, and the CLI kindly informed me that this is a bug.

Let me explain what I wanted to accomplish, what my query was, and the output.

What I Wanted to Find, For Extra Context

I want to find every place where an enum value whose name ends in '_COUNT' is compared against a variable. The goal is to spot bounds checks that an attacker can dodge by supplying a negative enum value.

Here is some example vulnerable code:

#include <stdbool.h> // bool (only needed when compiled as C)
#include <stdlib.h>  // abort

enum OptionType {
    Option_A,
    Option_B,
    Option_COUNT
};

bool OPTIONS[Option_COUNT];
void set_option(enum OptionType option_type_attacker_controlled, bool set_to) {
    // Only the upper bound is checked. If option_type_attacker_controlled is
    // negative, it passes this check and the write below is out of bounds.
    if (option_type_attacker_controlled >= Option_COUNT) { abort(); }
    OPTIONS[option_type_attacker_controlled] = set_to;
}
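
To make the failure mode concrete, here is a minimal sketch of the two checks side by side. It is illustrative only: the names `OPTION_COUNT`, `accepts_upper_only`, and `accepts_both_bounds` are hypothetical, and the index is a plain `int` because the underlying type of an enum is implementation-defined, so whether a negative value really slips past the comparison depends on the compiler and on whether the code is built as C or C++.

```c
#include <stdbool.h>

/* Hypothetical stand-in for Option_COUNT in the example above. */
enum { OPTION_COUNT = 2 };

/* Mirrors the upper-bound-only check: returns true when the index
 * would be accepted (i.e. the abort() branch is not taken). */
static bool accepts_upper_only(int idx) {
    return !(idx >= OPTION_COUNT);
}

/* Checks both bounds, so negative indices are rejected as well. */
static bool accepts_both_bounds(int idx) {
    return idx >= 0 && idx < OPTION_COUNT;
}
```

With a signed index, `accepts_upper_only(-1)` is true while `accepts_both_bounds(-1)` is false, which is the pattern the query is meant to surface.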

The Buggy Query

Here is a minimal reproduction of a weggli query I came up with:

% weggli --cpp -u -R '$counted=\w*_COUNT' -C '{($var >= $counted); OR: ($var > $counted); }' ./
Tree sitter query generation failed: Structure
                 (binary_expression left: [(identifier) (field_expression) (field_identifier) (qualified_identifier) (this)] @1 operator: "<=" right: [(identifier) (field_expression) (field_identifier) (qualified_identifier) (this)] @0)]) )((labeled_statement label:(statement_identifier) (parenthesized_expression [(binary_expression left: [(identifier) (field_expression) (field_identifier) (qualified_identifier) (this)] @2 operator: ">" right: [(identifier) (field_expression) (field_identifier) (qualified_identifier) (this)] @3)
                                                                                                                                                                                                                                                                                                ^
sexpr: ((parenthesized_expression [(binary_expression left: [(identifier) (field_expression) (field_identifier) (qualified_identifier) (this)] @0 operator: ">=" right: [(identifier) (field_expression) (field_identifier) (qualified_identifier) (this)] @1)
                (binary_expression left: [(identifier) (field_expression) (field_identifier) (qualified_identifier) (this)] @1 operator: "<=" right: [(identifier) (field_expression) (field_identifier) (qualified_identifier) (this)] @0)]) )((labeled_statement label:(statement_identifier) (parenthesized_expression [(binary_expression left: [(identifier) (field_expression) (field_identifier) (qualified_identifier) (this)] @2 operator: ">" right: [(identifier) (field_expression) (field_identifier) (qualified_identifier) (this)] @3)
                (binary_expression left: [(identifier) (field_expression) (field_identifier) (qualified_identifier) (this)] @3 operator: "<" right: [(identifier) (field_expression) (field_identifier) (qualified_identifier) (this)] @2)])) )
This is a bug! Can't recover :/

The Actual Query

Here is the full query I wanted:

% weggli --cpp -u -R '$counted=\w*_COUNT' -C '{($var >= $counted); OR: ($var > $counted); OR: ($var < $counted); OR: ($var <= $counted); }' ./