googleprojectzero / functionsimsearch

Some C++ example code to demonstrate how to perform code similarity searches using SimHashing.
Apache License 2.0
559 stars, 97 forks

Many of the tests are brittle when adding new features to the SimHash #10

Open. thomasdullien opened this issue 5 years ago

thomasdullien commented 5 years ago

This brittleness was initially desired, to make sure the underlying disassembly is good, but it makes tweaking, improving, or adding features difficult without breaking the tests.

Particular culprits (all currently broken) are:

- flowgraphwithinstructions.creategraph
- flowgraphwithinstructions.parsejson
- functionsimhash.zero_weight_for_mnemonics

It would be great if we could come up with a way that does not break quite so easily ...

thomasdullien commented 5 years ago

At least the first two may be fixable by switching to a string comparison on the JSON-serialized form?
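A rough sketch of what such a comparison could look like, not taken from the repository: `ToyFlowgraph`, `ToCanonicalJson`, and `MakeExampleGraph` are hypothetical stand-ins, and the real flowgraph class and its JSON layout will differ. The idea is to serialize in a canonical (sorted) order so that a test only has to compare one expected string, instead of asserting on node counts, edge counts, and individual fields.

```cpp
#include <cassert>
#include <cstdint>
#include <set>
#include <sstream>
#include <string>
#include <utility>

// Hypothetical stand-in for the project's flowgraph class (assumption:
// the real class has a different interface and also carries instructions).
struct ToyFlowgraph {
  std::set<uint64_t> nodes;
  std::set<std::pair<uint64_t, uint64_t>> edges;
};

// Serializes the graph in a canonical order. std::set iterates in sorted
// order, so two equal graphs always produce byte-identical strings,
// regardless of the order in which nodes and edges were inserted.
std::string ToCanonicalJson(const ToyFlowgraph& graph) {
  std::ostringstream out;
  out << "{\"nodes\":[";
  bool first = true;
  for (uint64_t node : graph.nodes) {
    out << (first ? "" : ",") << node;
    first = false;
  }
  out << "],\"edges\":[";
  first = true;
  for (const auto& edge : graph.edges) {
    out << (first ? "" : ",") << "[" << edge.first << "," << edge.second
        << "]";
    first = false;
  }
  out << "]}";
  return out.str();
}

// Builds a small graph, deliberately inserting elements out of order.
ToyFlowgraph MakeExampleGraph() {
  ToyFlowgraph graph;
  graph.nodes.insert(3);
  graph.nodes.insert(1);
  graph.nodes.insert(2);
  graph.edges.emplace(2, 3);
  graph.edges.emplace(1, 2);
  return graph;
}
```

In a test this reduces to a single string equality check against a stored expected value, so when the disassembly or serialization changes on purpose, only that one expected string needs to be regenerated.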