weggli-rs / weggli

weggli is a fast and robust semantic search tool for C and C++ codebases. It is designed to help security researchers identify interesting functionality in large codebases.
Apache License 2.0

Can't search for for-statements that use a class member variable as the threshold #27

Closed: y0ny0ns0n closed this issue 2 years ago

y0ny0ns0n commented 2 years ago

I recently started using weggli on Hex-Rays decompiler C output and have found some useful cases.

Today, however, I ran into a problem and could use some help.

I am trying to find functions like the one below:

```cpp
struct StructName *__fastcall ClassName::MethodName(ClassName *this, int a2)
{
  __int64 i; // r8
  struct StructName *result; // rax

  for ( i = 0i64; i < this->dword_80; ++i )
  {
    result = *(this->qword_88 + 8 * i);
    if ( a2 == *(result + 112) )
      return result;
  }
..
}
```

I was able to find the function above with this query:

```sh
./target/release/weggli --cpp 'for(_; _ < this->dword_80; _) { _; }' ~/hexray_output.cpp
```

However, this query hard-codes the member variable name, so I can't extend it to find other variants.

I tried the following queries, but none of them work the way I want.

Queries with no output

```sh
./target/release/weggli --cpp '_ $func(_* $thisptr) { for(_; _ < $thisptr->_; _) { _; } }' ~/hexray_output.cpp
./target/release/weggli --cpp '_ $func(_* $thisptr) { for(_; _ < $thisptr->dword_80; _) { _; } }' ~/hexray_output.cpp
./target/release/weggli -R 'thisptr=this|a1' --cpp '_ $func(_) { for(_; _ < $thisptr->dword_80; _) { _; } }' ~/hexray_output.cpp
./target/release/weggli --cpp '_ $func(_* $thisptr) { for(_; _ < _($thisptr)->_; _) { _; } }' ~/hexray_output.cpp
```

Queries with output

```sh
./target/release/weggli --cpp '_ $func(_* $thisptr) { for(_; _ < _($thisptr); _) { _; } }' ~/hexray_output.cpp
```

```cpp
// ...
if ( v8 >= 0 )
{
  if ( ((this + 932) + 112i64) )
  {
    for ( i = 0; i < *(a2 + 10); ++i ) // <- here
    {
      // ...
```

```cpp
__int64 __fastcall Func2(struct Struct2 *a1, ....)
{
  __int64 v4; // rdi
  ...
  v4 = (a1 + 5);
  v6 = a3;
  v7 = a2;
  updated = 0;
  for ( i = 0; i < (*(a1 + 3) + 6i64); ++i ) // <- here
  {
    ...
  }
```

It only finds conditions like `i < (a1 + ...)`, never ones like `i < a1->dword_80`.

```sh
./target/release/weggli -R 'thisptr=this|a1' --cpp '_ $func(_) { for(_; _ < _($thisptr); _) { _; } }' ~/hexray_output.cpp
```

This one does find `i < a1->dword_80` and similar member variable references. But, as you can see, it only relies on a regex match on the variable name instead of using the `this` argument of the class method. What I want is to find data flow from the `this` argument into the for-loop condition.

If you have some free time, I would really appreciate it if you could let me know what I am doing wrong.

---

FYI, my host OS is macOS Big Sur.

```
$ sw_vers
ProductName:    macOS
ProductVersion: 11.4
BuildVersion:   20F71
```

felixwilhelm commented 2 years ago

Thanks for the detailed bug report. I'll take a look! :)

felixwilhelm commented 2 years ago

tree-sitter returns a special AST node for `this`, so variable matching did not work correctly. Commit 8e6d7a3 fixes the issue.

Thanks again for the report.
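To make the explanation about the special `this` node concrete, here is a toy C++ sketch of how a variable binder can miss it. This is not weggli's implementation (weggli is written in Rust, and the actual change in commit 8e6d7a3 is not reproduced here). `Node`, `Bindings`, `bind_variable` and the `accept_this` switch are hypothetical names. The assumption is a binder that only treats `identifier` nodes as bindable: it rejects a `this` node outright, so `$thisptr` never binds. Accepting the `this` node kind as well lets the variable bind and stay consistent across the parameter and the loop condition.

```cpp
#include <cassert>
#include <map>
#include <string>

// Toy stand-in for a parsed AST leaf (hypothetical, not a tree-sitter type).
struct Node {
  std::string kind;  // e.g. "identifier" or "this"
  std::string text;  // source text, e.g. "a1" or "this"
};

// Maps query variables such as "$thisptr" to the source text they matched.
using Bindings = std::map<std::string, std::string>;

// Try to bind query variable `var` to `node`.
// The first occurrence binds the variable; later occurrences must match the
// same source text. `accept_this` toggles whether a node of kind "this" is
// treated as bindable, mimicking the behaviour before and after a fix.
bool bind_variable(const std::string& var, const Node& node, Bindings& env,
                   bool accept_this) {
  assert(!var.empty() && var[0] == '$');  // query variables start with '$'
  bool bindable = node.kind == "identifier" ||
                  (accept_this && node.kind == "this");
  if (!bindable) return false;
  auto it = env.find(var);
  if (it == env.end()) {
    env.emplace(var, node.text);  // first occurrence: record the binding
    return true;
  }
  return it->second == node.text;  // later occurrence: must be consistent
}
```

With `accept_this` set to `false`, binding `$thisptr` to a `this` node fails immediately. With it set to `true`, the parameter and the loop-bound occurrence bind to the same text, while ordinary identifiers such as `a1` bind either way.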

y0ny0ns0n commented 2 years ago

Thanks!