trendmicro / tlsh

Other
726 stars 135 forks source link

Bug Fix+Portability: Improve portability by integral division #136

Open a4lg opened 5 months ago

a4lg commented 5 months ago

This is a continuation after the commit 755f31e75a59 ("3.12.0") which removed the dependency to the float log.

Once we find quartile values in TLSH, we perform a float division to get q1/q3 and q2/q3 in percentage (and taken the mod 16) but lacks portability (and has a bug which the committer explains later).

Note: q1 <= q2 <= q3.

If we have a 32-bit divider, a 64-bit / 32-bit division is not that a big cost (it's theoretically one instruction in x86 because the DIV instruction performs a 64-bit / 32-bit division, not 32-bit / 32-bit).

It also fixes the problem of arithmetic overflow calculating unsigned 32-bit (q1*100) and (q2*100) (if they overflow, the ratio will be an unexpected value).

Note: (q1*100) and (q2*100) in u32 is clearly a bug (in a normal sense) but I'm not sure whether you consider this a "specification" (as a developer of ssdeep, a fuzzy hash implementation, I acknowledge that we should be very aware of the compatibility). If this is considered a bug, I consider this is a good chance to fix this issue along with improving the portability / reproducibility.

Condition (this change may slightly change the hash value): q2 >= 167773 (or: q2 > (1 << 24) / 100) (due to the rough "percentage" metric, it's unlikely to change the hash in practice but I found (q1 or q2, q3) pairs that would change the hash value)

Condition (the arithmetic overflow will occur): q2 >= 42949673 (or: q2 > 0xffffffff / 100)