joxeankoret / diaphora

Diaphora, the most advanced Free and Open Source program diffing tool.
http://diaphora.re
GNU Affero General Public License v3.0

bytes_hash related fixes #250

Closed. Amit-Oha closed this 1 year ago

Amit-Oha commented 1 year ago

Hello, this pull request fixes a problem where certain library functions are not recognized as matches even though they are almost identical. I found that the logic deciding which of an instruction's bytes to preserve for the "Bytes Hash" heuristic was incorrect, mainly because of a misuse of the offb member of the op_t class: offb gives the offset of the operand within the instruction, not its size. In addition, I propose making the "Bytes Hash" heuristic more accurate by using another method to recognize instructions that require special handling: rather than relying solely on the operand's type, we can use the CodeRefsFrom and DataRefsFrom functions to recognize the relevant instructions. Finally, I suggest handling a sanity-check failure on the get_bytes call differently. Instead of skipping the entire iteration, only the "Bytes Hash" heuristic should be affected, because that is the only heuristic that uses the call and the iteration can continue for the other heuristics even if it fails.
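To illustrate the offset-versus-size confusion, here is a minimal, self-contained sketch. It is not the code from this pull request: the helper names `mask_operand` and `mask_operand_buggy` are hypothetical, zeroing the operand bytes is an assumed strategy, and treating `offb == 0` as "nothing to mask" is also an assumption. The `offb` and `size` parameters stand in for the operand offset and operand length that would come from the disassembler.

```python
def mask_operand(insn_bytes: bytes, offb: int, size: int) -> bytes:
    """Zero the operand bytes of one instruction and keep everything else.

    offb is the operand's offset inside the instruction; size is the
    operand's length in bytes. Both are supplied by the caller here.
    """
    if offb == 0:
        # Assumption: an offset of 0 means there is no operand to mask.
        return insn_bytes
    return insn_bytes[:offb] + b"\x00" * size + insn_bytes[offb + size:]


def mask_operand_buggy(insn_bytes: bytes, offb: int) -> bytes:
    """Same idea, but wrongly uses the operand offset as its size."""
    return mask_operand(insn_bytes, offb, offb)


# x86 `call rel32`: opcode E8 followed by a 4-byte displacement.
# The same call compiled at two different addresses gets different bytes.
call_a = bytes.fromhex("e810203040")
call_b = bytes.fromhex("e8aabbccdd")

# Correct masking (offset 1, size 4) makes both encodings identical.
same_when_correct = mask_operand(call_a, 1, 4) == mask_operand(call_b, 1, 4)

# Using the offset as the size masks only one byte, so they still differ.
same_when_buggy = mask_operand_buggy(call_a, 1) == mask_operand_buggy(call_b, 1)
```

With the correct size, both calls reduce to the same byte string, so a hash over them would match; with the offset used as the size, three displacement bytes survive and the hashes would differ.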

The changes made in the pull request are:

  1. Correcting the use of the offb member
  2. Improving accuracy of bytes preservation in "Bytes Hash"
  3. Keeping the non-"Bytes Hash" heuristics accurate even when the get_bytes call fails.
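The third change can be sketched as follows. This is only an illustration under stated assumptions, not Diaphora's actual loop: `summarize`, `read_bytes`, and the tuple layout are hypothetical, and invalidating the bytes hash (returning `None`) on a failed read is an assumed policy. The point is that a failed read does not stop the other per-instruction data from being collected.

```python
import hashlib


def summarize(instructions, read_bytes):
    """Collect per-function data from (address, size, mnemonic) tuples.

    read_bytes(address, size) is a caller-supplied reader that returns
    bytes, or None when the read fails.
    """
    mnemonics = []
    chunks = []
    bytes_ok = True
    for ea, size, mnem in instructions:
        data = read_bytes(ea, size)
        if data is None or len(data) != size:
            # Only the bytes hash is invalidated; do not skip the iteration.
            bytes_ok = False
        else:
            chunks.append(data)
        # Other heuristics still see every instruction.
        mnemonics.append(mnem)
    bytes_hash = hashlib.md5(b"".join(chunks)).hexdigest() if bytes_ok else None
    return mnemonics, bytes_hash


# Hypothetical memory image: the read at address 0x1001 fails.
memory = {0x1000: b"\x55", 0x1003: b"\xc3"}
insns = [(0x1000, 1, "push"), (0x1001, 2, "mov"), (0x1003, 1, "retn")]
mnems, digest = summarize(insns, lambda ea, size: memory.get(ea))
```

Here `mnems` still contains all three mnemonics, while `digest` is `None` because one read failed.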

joxeankoret commented 1 year ago

Hi! Thanks for the pull request. I will review it, run the test suite to see how it behaves across multiple architectures, and merge it if everything goes OK. Thank you!

Amit-Oha commented 1 year ago

Thank you, looking forward to your feedback!