radareorg / radare2

UNIX-like reverse engineering framework and command-line toolset
https://www.radare.org/
GNU Lesser General Public License v3.0
20.59k stars 2.99k forks source link

RHashTable doesn't handle collisons #5168

Closed crowell closed 7 years ago

crowell commented 8 years ago

https://github.com/radare/radare2-regressions/blob/master/unit/test_hashtable.c#L59

should inserts fail with a collision? how should they be handled, because the key is just hash, not an actual object, handling collisions in this way doesn't seem possible?

radare commented 8 years ago

Usually insert will never fail by definition unless the hashtable is full or cant allocate memory. The add operation, is the one that will fail in thatcase. This is how the sdb api and many other hashtable implementations work too. I think its a known thing and i try to check if the value exists before setting it.

But i agree that this behavious must be different to avoid confussions or common problems

On 18 Jun 2016, at 01:15, Jeffrey Crowell notifications@github.com wrote:

https://github.com/radare/radare2-regressions/blob/master/unit/test_hashtable.c#L59

should inserts fail with a collision? how should they be handled, because the key is just hash, not an actual object, handling collisions in this way doesn't seem possible?

— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub, or mute the thread.

crowell commented 8 years ago

well, the fact of the matter is that djb2 isn't a fantastic hashing function for strings.

printf ("%p\n",r_str_hash ("cb"));
printf ("%p\n",r_str_hash ("bC"));
0x540d64
0x540d64

provides a collision, it's super easy to find collisions. colliding keys will break sdb, and the more we rely on sdb, the worse it is.

Perhaps, key, hash, value should all be stored? This will take up more space but will allow for actual collision resolution.

radare commented 8 years ago

nice catch. but %p is the wrong format string to use here.

this algorithm comes from java hash string algorithm, we can switch to another one that have other collisions. but we will always find collisions

On 19 Jun 2016, at 17:08, Jeffrey Crowell notifications@github.com wrote:

well, the fact of the matter is that djb2 isn't a fantastic hashing function for strings.

printf("%p\n",r_str_hash ("cb")); printf("%p\n",r_str_hash ("bC")); provides a collision, it's super easy to find collisions. colliding keys will break sdb, and the more we rely on sdb, the worse it is.

Perhaps, key, hash, value should all be stored? This will take up more space but will allow for actual collision resolution.

— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/radare/radare2/issues/5168#issuecomment-227002237, or mute the thread https://github.com/notifications/unsubscribe/AA3-loZ9R3tXDg-fIgnXtMylObNDX-8zks5qNVuBgaJpZM4I4zSO.

crowell commented 8 years ago

related: https://github.com/radare/sdb/pull/102

radare commented 8 years ago

will be nice to have this fixed before 0.10.4. thanks for the effort :)

Maijin commented 7 years ago

Can be closed ?

alvarofe commented 7 years ago

afaik @radare is working on this in the sdb branch, cause the collision is not longer an issue but others have arised

radare commented 7 years ago

Nope dont close it

On 3 Nov 2016, at 00:03, Álvaro Felipe Melchor notifications@github.com wrote:

afaik @radare is working on this in the sdb branch, cause the collision is not longer an issue but others have arised

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub, or mute the thread.

radare commented 7 years ago

im afraid of merging this before having the update-sdb branch merged. so moving to 1.1

radare commented 7 years ago

Now that the sdb branch is merged this issue should be "gone" as soon as we replace the raw usage of util/ht.c to use the new one. but this may introduce new regressions.

radare commented 7 years ago

util/ht is still used in some places, we should use sdb's one

alvarofe commented 7 years ago

we have already issues for this