weggli-rs / weggli

weggli is a fast and robust semantic search tool for C and C++ codebases. It is designed to help security researchers identify interesting functionality in large codebases.
Apache License 2.0

sizeof subexpression not matching #58

Closed: Roguebantha closed this issue 2 years ago

Roguebantha commented 2 years ago

When a query places a subexpression between two adjacent expressions (here, an assignment on the left and an addition on the right), matches where that subexpression is a sizeof expression appear to be dropped. For example, given the following code:

int test_sizeof() {
    int b;
    a = sizeof(a) + b;
}
int test_deref() {
    int b;
    void* a = *a + b;
}
int test_call(void* c) {
    int b;
    void* a = test_call(a) + b;
}

The following behavior is observed:

$ weggli '{ _(a) + _; }' test.c
./test.c:1
int test_sizeof() {
    int b;
    a = sizeof(a) + b;
}
./test.c:5
int test_deref() {
    int b;
    void* a = *a + b;
}
./test.c:9
int test_call(void* c) {
    int b;
    void* a = test_call(a) + b;
}
$ weggli '{ _ = _(a); }' test.c
./test.c:1
int test_sizeof() {
    int b;
    a = sizeof(a) + b;
}
./test.c:5
int test_deref() {
    int b;
    void* a = *a + b;
}
./test.c:9
int test_call(void* c) {
    int b;
    void* a = test_call(a) + b;
}
$ weggli '{ _ = _(a) + _; }' test.c
./test.c:5
int test_deref() {
    int b;
    void* a = *a + b;
}
./test.c:9
int test_call(void* c) {
    int b;
    void* a = test_call(a) + b;
}

The final query is the one that appears to return incorrect results. I would have expected test_sizeof to be included in the results, but it is not. It only matches when the query has fewer than two expressions adjacent to the subexpression.

felixwilhelm commented 2 years ago

Thanks for the bug report and sorry for the slow response.

This looks like an issue in the underlying tree-sitter grammar (see https://github.com/tree-sitter/tree-sitter-c/issues/51). I have an idea for a fix and will keep you posted.
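
For context, the C language itself groups `sizeof(a) + b` as `(sizeof(a)) + b`, because sizeof is a unary operator and binds tighter than `+`. A parser that instead took `(a) + b` as the operand of sizeof would build a different tree, which is one way a pattern such as `_ = _(a) + _` could fail to match. Whether that is exactly what the linked grammar issue describes is an assumption here, not something stated in this thread. The sketch below only shows that the two groupings are observably different in C; the function names are hypothetical and are not part of weggli or tree-sitter.

```c
#include <assert.h>
#include <stddef.h>

/* Grouping required by the C standard: (sizeof(a)) + b.
 * sizeof(char) is 1 by definition, so the result is 1 + b. */
size_t sizeof_then_add(int b) {
    char a = 0;
    return sizeof(a) + b;
}

/* The alternative grouping, written out explicitly: sizeof((a) + b).
 * The operand is not evaluated; `a` is promoted to int, so the
 * result is sizeof(int) regardless of the value of b. */
size_t sizeof_of_sum(int b) {
    char a = 0;
    return sizeof((a) + b);
}

/* Hypothetical self-check: the two groupings give different values. */
void check_grouping(void) {
    assert(sizeof_then_add(10) == 11);
    assert(sizeof_of_sum(10) == sizeof(int));
}
```

Calling `check_grouping()` from a test harness should pass on any conforming compiler, since `sizeof(char)` is always 1 and the integer promotions turn `(a) + b` into an `int` expression.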

felixwilhelm commented 2 years ago

This should now be fixed in master. Thanks again for the report :)