weggli-rs / weggli

weggli is a fast and robust semantic search tool for C and C++ codebases. It is designed to help security researchers identify interesting functionality in large codebases.
Apache License 2.0

Crash when parsing query #61

Open pgoodman opened 2 years ago

pgoodman commented 2 years ago

I am not able to reproduce this, but I got the crash below with a query vaguely like $var = $func($arg1, (note the trailing comma and the missing closing parenthesis). I have also noticed that $v = $f() and $v = $fu() don't work, whereas $v = $fun() and $v = $f($p) do. It's very strange.

My use case is that I am interactively trying to build a weggli query on each keystroke.

Tree sitter query generation failed: Structure
 ([(assignment_expression left: [(identifier) (field_expression) (field_identifier)] @0 right: [(cast_expression value: (call_expression function:[(identifier) (field_expression) (field_identifier)] @1 arguments:(argument_list . [(identifier) (field_expression) (field_identifier)] @2 . (ERROR)))) (call_expression function:[(identifier) (field_expression) (field_identifier)] @1 arguments:(argument_list . [(identifier) (field_expression) (field_identifier)] @2 . (ERROR)))])
                                                                                                                                                                                                                                                                                              ^
sexpr: ([(assignment_expression left: [(identifier) (field_expression) (field_identifier)] @0 right: [(cast_expression value: (call_expression function:[(identifier) (field_expression) (field_identifier)] @1 arguments:(argument_list . [(identifier) (field_expression) (field_identifier)] @2 . (ERROR)))) (call_expression function:[(identifier) (field_expression) (field_identifier)] @1 arguments:(argument_list . [(identifier) (field_expression) (field_identifier)] @2 . (ERROR)))])
                        (init_declarator declarator: [(identifier) (field_expression) (field_identifier)] @0 value: [(cast_expression value: (call_expression function:[(identifier) (field_expression) (field_identifier)] @1 arguments:(argument_list . [(identifier) (field_expression) (field_identifier)] @2 . (ERROR)))) (call_expression function:[(identifier) (field_expression) (field_identifier)] @1 arguments:(argument_list . [(identifier) (field_expression) (field_identifier)] @2 . (ERROR)))]) 
                        (init_declarator declarator:(pointer_declarator declarator: [(identifier) (field_expression) (field_identifier)] @0) value: [(cast_expression value: (call_expression function:[(identifier) (field_expression) (field_identifier)] @1 arguments:(argument_list . [(identifier) (field_expression) (field_identifier)] @2 . (ERROR)))) (call_expression function:[(identifier) (field_expression) (field_identifier)] @1 arguments:(argument_list . [(identifier) (field_expression) (field_identifier)] @2 . (ERROR)))])]@3)
This is a bug! Can't recover :/

pgoodman commented 2 years ago

Here's another one:

Tree sitter query generation failed: Structure
                 (binary_expression left: [(identifier) (field_expression) (field_identifier)] @2 operator: "*" right: [(identifier) (field_expression) (field_identifier)] @1)]))]) consequence:(identifier) @4)@5(#eq? @4 "g"))
                                                                                                                                                                                    ^
sexpr: ((if_statement "if" @0 condition:(parenthesized_expression [(binary_expression left: (parenthesized_expression [(binary_expression left: [(identifier) (field_expression) (field_identifier)] @1 operator: "*" right: [(identifier) (field_expression) (field_identifier)] @2)
                (binary_expression left: [(identifier) (field_expression) (field_identifier)] @2 operator: "*" right: [(identifier) (field_expression) (field_identifier)] @1)]) operator: ">" right: [(identifier) (field_expression) (field_identifier)] @3)
                (binary_expression left: [(identifier) (field_expression) (field_identifier)] @3 operator: "<" right: (parenthesized_expression [(binary_expression left: [(identifier) (field_expression) (field_identifier)] @1 operator: "*" right: [(identifier) (field_expression) (field_identifier)] @2)
                (binary_expression left: [(identifier) (field_expression) (field_identifier)] @2 operator: "*" right: [(identifier) (field_expression) (field_identifier)] @1)]))]) consequence:(identifier) @4)@5(#eq? @4 "g"))
This is a bug! Can't recover :/

pgoodman commented 2 years ago

And another:

Tree sitter query generation failed: Structure
 ((if_statement "if" @0 condition:(parenthesized_expression (ERROR (ERROR)) [(identifier) (field_expression) (field_identifier)] @1) consequence:(goto_statement "goto" @2 label:(statement_identifier)))@3)
                                                           ^
sexpr: ((if_statement "if" @0 condition:(parenthesized_expression (ERROR (ERROR)) [(identifier) (field_expression) (field_identifier)] @1) consequence:(goto_statement "goto" @2 label:(statement_identifier)))@3)
This is a bug! Can't recover :/

felixwilhelm commented 2 years ago

Thanks for the bug report. I'm not entirely sure how you are able to trigger these crashes; at least, I can't reproduce any of them using the CLI. Could you add some debug code to build_query_tree to dump the input string that triggers these crashes?
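One way to capture the triggering input without patching the library is to wrap the call in std::panic::catch_unwind and log the query string when it panics. The sketch below is only an illustration: build_query_or_panic is a hypothetical stand-in that panics on unbalanced parentheses, not weggli's real build_query_tree, and try_build is an assumed helper name. Note that catch_unwind has no effect if the crate is built with panic = "abort".

```rust
use std::panic::{self, AssertUnwindSafe};

/// Hypothetical stand-in for the real query builder. It panics on
/// unbalanced parentheses to imitate the crash reported in this issue.
fn build_query_or_panic(input: &str) -> usize {
    let open = input.matches('(').count();
    let close = input.matches(')').count();
    if open != close {
        panic!("Tree sitter query generation failed: Structure");
    }
    input.len()
}

/// Runs the builder and, if it panics, returns an error message that
/// contains the exact input string so it can be logged or attached to a report.
fn try_build(input: &str) -> Result<usize, String> {
    panic::catch_unwind(AssertUnwindSafe(|| build_query_or_panic(input)))
        .map_err(|_| format!("query builder panicked on input: {:?}", input))
}

fn main() {
    // Simulate two keystroke states of an interactively typed query.
    for q in ["$v = $fun()", "$var = $func($arg1,"] {
        match try_build(q) {
            Ok(n) => println!("ok ({} bytes): {}", n, q),
            Err(e) => eprintln!("{}", e),
        }
    }
}
```

The default panic hook still prints the original panic message to stderr; the wrapper only adds the input string and keeps the process alive.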

Still, I think the panic call in https://github.com/googleprojectzero/weggli/blob/01499e238c9b8e4af514dea1ff637c8e1846db18/src/lib.rs#L59 is just sloppy coding. That method should return an error message instead of crashing. I'll try to find some time to fix this in the coming days.
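As a rough sketch of what returning an error instead of panicking could look like, the example below reports a structural problem through a Result. Everything in it is an assumption made for illustration: QueryError, its fields, and compile are hypothetical names, and the parenthesis check merely stands in for the real tree-sitter query compilation.

```rust
use std::fmt;

/// Hypothetical error type for a failed query compilation.
#[derive(Debug, PartialEq)]
enum QueryError {
    /// The generated S-expression was structurally invalid at `offset`.
    Structure { sexpr: String, offset: usize },
}

impl fmt::Display for QueryError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            QueryError::Structure { sexpr, offset } => write!(
                f,
                "query generation failed: Structure at offset {} in {:?}",
                offset, sexpr
            ),
        }
    }
}

impl std::error::Error for QueryError {}

/// Hypothetical stand-in for query compilation: it only checks that
/// parentheses are balanced and reports the first offending byte offset.
fn compile(sexpr: &str) -> Result<(), QueryError> {
    let mut depth = 0usize;
    for (i, c) in sexpr.char_indices() {
        match c {
            '(' => depth += 1,
            ')' => {
                if depth == 0 {
                    // A closing parenthesis with nothing left to close.
                    return Err(QueryError::Structure { sexpr: sexpr.to_string(), offset: i });
                }
                depth -= 1;
            }
            _ => {}
        }
    }
    if depth != 0 {
        // Input ended while at least one parenthesis was still open.
        return Err(QueryError::Structure { sexpr: sexpr.to_string(), offset: sexpr.len() });
    }
    Ok(())
}

fn main() {
    // The caller decides how to react; nothing here aborts the process.
    match compile("((identifier) @0") {
        Ok(()) => println!("query compiled"),
        Err(e) => eprintln!("{}", e),
    }
}
```

With this shape, a caller that rebuilds the query on every keystroke can simply ignore the error until the input becomes valid.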

pgoodman commented 1 year ago

Here was a recent error we got:

Tree sitter query generation failed: Structure
 ((ERROR (identifier) @0 (binary_expression left:(identifier) @1 operator:"-" (ERROR (identifier) @2) right:[(assignment_expression left: (identifier) @3 right: [(cast_expression value: (identifier) @4) (identifier) @4])
                                                                             ^
sexpr: ((ERROR (identifier) @0 (binary_expression left:(identifier) @1 operator:"-" (ERROR (identifier) @2) right:[(assignment_expression left: (identifier) @3 right: [(cast_expression value: (identifier) @4) (identifier) @4])
                        (init_declarator declarator: (identifier) @3 value: [(cast_expression value: (identifier) @4) (identifier) @4])
                        (init_declarator declarator:(pointer_declarator declarator: (identifier) @3) value: [(cast_expression value: (identifier) @4) (identifier) @4])]))@5(#eq? @0 "ind")(#eq? @1 "find")(#eq? @2 "R")(#eq? @3 "end")(#eq? @4 "end"))
This is a bug! Can't recover :/

I think this is against a very recent commit. I'm pretty sure we're now tracking the version that can return null. We're using weggli via weggli-native.