Closed alvarofe closed 4 years ago
Diaphora refuses duplicate results; instead, it assumes the first one is the best one. Usually, this model works "good enough", as the heuristics are launched from best to worst. However, it's probably time to change this behaviour to choose the best results. I agree.
Also, please could you share some binaries to test this specific issue? It's extremely hard to reproduce otherwise.
The fix for this bug is going to be published "soon". The fix is to, simply, move "Same constant" to the "Best" category instead of "Partial". However, I believe it's more a workaround for you specific problem than a real fix.
Fixed in version 2.0.
The target that I usually RE updates regularly and I use Diaphora to import functions names and such or study difference between two different releases - basically like everyone. However, I always noticed that some functions were quite obvious they were the same but Diaphora missed somewhow and I lost quite a time importing everything. Basically, although the function might have suffered some changes they contained the same debug strings such as.
Therefore, I was expecting to see them on "Partial matches" or "Best matches" with the description "Same constants" - after all the function in main and diff contain exactly the same constants.
In contrast, what I was seeing was that the function was matching by "Some rare constant" against another function because this one contained this.
The same constant for both functions were
__FILE__
. But it did not take into account that there is another function that contains the same constants and even with better ratio 0.6 vs 0.34 in my context.The issue resides in
add_matches_from_query_ratio_max
- although it would apply to anyadd_matches_from_*
. The logic of these functions is to run SQL queries (in a thread) and calculate the ratio and save the result. To avoid repeat the same function over and over, adding multiple items, it uses the setsmatched1, matched2
. Therefore, the query "Some rare constant" might run before that "Same constant" adding a match but on the other hand it would forbid that way a better match from "Same constant".In conclusion on my target the function
blabla
was matched againstlala
with a ratio of 0.34 using "Same rare constant", but using "Same constant" it finds a match ofblabla
againstjijij
with a ratio of 0.6 but due toblabla
was found in matched1 it was discarded.Is this a known issue or accepted compromise? It would require to cache the results and pick up the best ratio but that will slow down everything. Likewise, it might help to prioritize threads and assign each heuristic to a queue that should be run before others since "Same constant" is better than "Same rare constant".