weggli-rs / weggli

weggli is a fast and robust semantic search tool for C and C++ codebases. It is designed to help security researchers identify interesting functionality in large codebases.
Apache License 2.0

Question - query construction #21

Closed irwincong closed 2 years ago

irwincong commented 2 years ago

I'm kicking the tires, so to speak, to see how weggli compares to the tools listed in the README. I skimmed through https://github.com/googleprojectzero/weggli/blob/main/tests/query.rs to see if there was a similar query that I could build on, but I didn't find one. So I am wondering if I am running into a known limitation.

Context: I have a synthetic bug that I can't share without first asking for permission. The bug is a stack buffer overwrite. The stack variable is declared early in the call stack but the bug is triggered at least two stack frames down the call stack. The bug is an improper length parameter used in a snprintf call.

Based on the examples that I could dig up, here's what I am using:

# this doesn't return anything
'_ $fn(_, $buf, $limit, _) { snprintf($buf, _, _); }'

# returns too many things, and I really want to have the stack variable context
'_ $fn(_, $buf, $limit, _) { snprintf(_, _, _); }'

# trying with a single caller and a callee (but in reality I need to go more than one layer deep)
'{
_ $fn(_, $buf, $limit, _) { snprintf(_, _, _); }

{
_ $buf[_];
$fn(_, $buf, _);
}
}'

Questions:

  1. Does weggli support querying across multiple functions or is the AST pattern matching limited (e.g., to a single source file or a single function)?
  2. Is my caller-callee query on the right track? If so, how do I extend the query to an arbitrary depth of different functions?

felixwilhelm commented 2 years ago

Hi Irwin,

Thanks for trying out weggli, and sorry for the slow response; I was out of office.

'_ $fn(_, $buf, $limit, _) { snprintf($buf, _, _); }' doesn't work because the parameters are missing their types. Something like '_ $fn(char * $buf, $type2 $limit) {snprintf($buf, _, _);}' should give you results.

  1. You can query across multiple functions by specifying multiple patterns with the same variable names: weggli '_ $fn(char * $buf, $type2 $limit) {snprintf($buf, _, _);}' -p '$fn(_);' /code gives you all the functions that match the snprintf pattern as well as their callers.
  2. If you split your caller-callee query into two patterns (and add the right types or placeholder variables), it will work. For very generic queries (e.g., if your second pattern is just $fn(_)), it won't be very fast, as weggli is not yet smart about optimizing multi-pattern queries.
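
To make the two-pattern approach a bit more concrete, here is a minimal C sketch of the kind of code it is aimed at. The names `write_greeting` and `greet_caller` are hypothetical and not taken from the issue. The sketch passes the correct size so that it stays safe to run, and a comment marks where an improper length would go. Whether a given weggli pattern matches this exact code depends on weggli's matching rules, which are not verified here.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical sink: formats into a buffer owned by the caller.
 * This is the shape the first pattern is looking for: a function that
 * takes a `char *` parameter and passes it to snprintf. */
int write_greeting(char *buf, size_t limit) {
    /* snprintf writes at most limit - 1 characters plus a NUL and returns
     * the length the full string would have needed. */
    return snprintf(buf, limit, "hello from the callee");
}

/* Hypothetical caller: owns the stack buffer and chooses the limit.
 * A second pattern on the same $fn would be looking for this call site. */
int greet_caller(char *out, size_t out_len) {
    char name[8];

    /* The bug class from the question would be a limit larger than
     * sizeof name here. The sketch passes the correct size. */
    int needed = write_greeting(name, sizeof name);

    /* Copy the (possibly truncated) result out for inspection. */
    snprintf(out, out_len, "%s", name);
    return needed;
}
```

With the command from point 1, the first pattern is meant to find definitions shaped like `write_greeting`, and `-p '$fn(_);'` is meant to find call sites of the same `$fn`, such as the one in `greet_caller`. The stack buffer declaration then shows up during manual review of those callers.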

weggli doesn't support call chains of arbitrary depth, and I don't plan to add support in the future. If you need to find the bug with a one-shot query, Joern or CodeQL are probably the way to go. weggli is more targeted at finding interesting sinks (like the snprintf call taking an external buffer) and then supporting manual review (by finding callers that match some specific patterns).

Hope that helps, let me know if you have more questions.