joxeankoret / pigaios

A tool for matching and diffing source codes directly against binaries.
GNU General Public License v3.0

Inlined function leads to wrong matches with high confidence #30

Status: Open. fmagin opened this issue 5 years ago.

fmagin commented 5 years ago

Quickly dumping this here so it is documented; maybe I can do a PR for this at some point. First, though, thanks for this tool: it has been quite useful a few times already!

I am running into an issue where functions like SQLite's sqliteAuthBadReturnCode:

```c
static void sqliteAuthBadReturnCode(Parse *pParse){
    sqlite3ErrorMsg(pParse, "authorizer malfunction");
    pParse->rc = SQLITE_ERROR;
}
```

might get inlined, which confuses the matching algorithm. It reports a 1.0-confidence match with the calling function sqlite3AuthCheck because both contain the same rare string constant "authorizer malfunction":

```c
int sqlite3AuthCheck(...){
  /* [some code snipped] */
  if( rc==SQLITE_DENY ){
    sqlite3ErrorMsg(pParse, "not authorized");
    pParse->rc = SQLITE_AUTH;
  }else if( rc!=SQLITE_OK && rc!=SQLITE_IGNORE ){
    rc = SQLITE_DENY;
    sqliteAuthBadReturnCode(pParse);
  }
  return rc;
}
```

In this case the correct match is pretty obvious, because the compiled function also references the string "not authorized" and is significantly longer. If you need a test case to reproduce this, I might be able to create one; I can't share this specific binary publicly.

My suggestion would be a heuristic that compares function sizes and lowers the confidence when a very short source function is matched to a much longer binary function solely because of a single string match. It could also emit a warning that this might be inlining, which is still valuable information and could later feed into the heuristics that match based on callers/callees.
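A minimal sketch of that size heuristic, written in C to match the snippets above. Everything here is hypothetical: adjust_for_size, match_result and MIN_SIZE_RATIO are illustrative names, not part of pigaios, and "size" is assumed to be any metric both sides can measure consistently (e.g. basic blocks or statements). It scales the raw score down in proportion to how lopsided the sizes are and sets a "possible inline" flag instead of discarding the match.

```c
#include <assert.h>

/* Hypothetical threshold, not a pigaios option: if the source function is
   less than a quarter of the binary function's size, suspect inlining. */
#define MIN_SIZE_RATIO 0.25

/* Hypothetical result of comparing one source function to one binary function. */
typedef struct {
  double confidence;   /* adjusted score in [0, 1] */
  int possible_inline; /* 1 if the source function may be inlined into the binary one */
} match_result;

/* Scale a raw match score by how well the two function sizes agree.
   Only the "source much smaller than binary" direction is penalized,
   since that is the inlining case described in this issue. */
static match_result adjust_for_size(double raw_score, int src_size, int bin_size)
{
  match_result r = { raw_score, 0 };
  assert(src_size > 0 && bin_size > 0);

  double ratio = (double)src_size / (double)bin_size;
  if (ratio < MIN_SIZE_RATIO) {
    /* Tiny source function vs. much larger binary function:
       penalize proportionally and flag it as a candidate inlining. */
    r.confidence = raw_score * (ratio / MIN_SIZE_RATIO);
    r.possible_inline = 1;
  }
  return r;
}
```

With a 1-block source function against a 20-block binary function, a raw score of 1.0 drops to about 0.2 and is flagged, while similarly sized functions keep their score. The flag, rather than the penalty alone, is what could later be reused by the caller/callee-based heuristics.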

joxeankoret commented 5 years ago

Thank you very much for reporting it! I have some ideas for making it less error-prone in that respect, but I haven't coded them yet. This is a good reminder that I have to do so ASAP.

For now, there is one hidden option that "might" work for you: add inlines=1 to the [GENERAL] section of your sbd.project file, then re-export and try again. Please note that export time will increase considerably.