weggli-rs / weggli

weggli is a fast and robust semantic search tool for C and C++ codebases. It is designed to help security researchers identify interesting functionality in large codebases.
Apache License 2.0

invalid source/capture range when used with python API #22

Closed sduverger closed 2 years ago

sduverger commented 2 years ago

Hello,

Thank you for this tool. I'm seeing different behaviour depending on whether I use weggli on the command line or through the Python API.

% cat test.pat
do {
  $buf[_+_]=_;
} while(_);
% cat test.c
void loop (char *b){
    int i = 0;
    do {
        b[i+i] = 0;
        i += 1;
    } while (i < 10);
}
% weggli "$(<test.pat)" test.c
[...]/test.c:1
void loop (char *b){
    int i = 0;
    do {
        b[i+i] = 0;
        i += 1;
    } while (i < 10);
}

This works fine, but with the Python API:

% cat test.py
import weggli
qry=open("test.pat").read()
src=open("test.c").read()
pq=weggli.parse_query(qry)
for m in weggli.matches(pq, src):
    weggli.display(m, src)
% python3 test.py
  for m in weggli.matches(pq, src):
thread '<unnamed>' panicked at 'begin <= end (34 <= 33)' when slicing 'void loop (char *b){'
[...]
, src/result.rs:79:20
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
Traceback (most recent call last):
  File "[...]/test.py", line 6, in <module>
    weggli.display(m, src)
pyo3_runtime.PanicException: begin <= end (34 <= 33) when slicing 'void loop (char *b){'
[...]

Here is a small workaround, though not an acceptable PR :)

diff --git a/src/result.rs b/src/result.rs
index 00997a4..77f0d8f 100644
--- a/src/result.rs
+++ b/src/result.rs
@@ -73,7 +73,10 @@ impl<'a, 'b> QueryResult {

         if self.captures.len() > 1 {
             // Ensure we don't overlap with the range of the next node.
-            header_end = cmp::min(header_end, self.captures[1].range.start - 1);
+            let next = self.captures[1].range.start - 1;
+            if next > self.function.start {
+                header_end = cmp::min(header_end, next);
+            }
         }

         result += &source[self.function.start..header_end];
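
The panic above comes from slicing a `&str` with a range whose start is greater than its end. As a minimal sketch of the guard the workaround is aiming for, the two helpers below clamp the computed end so it can never precede the start, and slice through `str::get` so an invalid range yields an empty string instead of a panic. Both `safe_header_end` and `header_slice` are hypothetical names introduced for illustration; they are not part of weggli and are not the fix pushed in 0b29e59. Note that this sketch clamps to the function start, whereas the diff above skips the adjustment entirely in that case.

```rust
use std::cmp;

/// Hypothetical helper (not part of weggli): choose the end of the header
/// slice so that it stops before the next capture but never ends before
/// `function_start`.
fn safe_header_end(function_start: usize, header_end: usize, next_capture_start: usize) -> usize {
    // saturating_sub avoids a usize underflow when the next capture starts at 0.
    let before_next = next_capture_start.saturating_sub(1);
    // Stay in front of the next capture, but keep the range non-inverted.
    cmp::max(function_start, cmp::min(header_end, before_next))
}

/// Hypothetical helper: slice without panicking. `str::get` returns `None`
/// for an inverted or out-of-bounds range, or an index that is not on a
/// char boundary; in all of those cases this falls back to an empty string.
fn header_slice(source: &str, start: usize, end: usize) -> &str {
    source.get(start..end).unwrap_or("")
}
```

With the values from the panic message, `safe_header_end(34, 40, 34)` yields 34, so the resulting slice is empty rather than inverted.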

felixwilhelm commented 2 years ago

Thanks for the detailed bug report! I've pushed a temporary fix in 0b29e59 to stop the panic from happening, but this is a symptom of some larger issues in the result display code. I plan to refactor the whole area over the next few weeks to improve printing and add support for enhancements like line numbers, so you might want to keep an eye on the repo.

Please note that I'm not really using or supporting the Python API right now, so it's not feature-complete. If it works for you, that's great, but I'm not putting a large focus on it at the moment.

sduverger commented 2 years ago

Thanks @felixwilhelm. It works for me.