radareorg / sdb

Simple and fast string based key-value database with support for arrays and json
https://www.radare.org/
MIT License
218 stars 62 forks source link

Evaluate the use of warp's hash string algorithm (in D) #94

Open radare opened 8 years ago

radare commented 8 years ago

original code from https://github.com/facebookarchive/warp

    static size_t getHash(const(uchar)[] name)
    {
        size_t hash = 0;
        while (name.length >= size_t.sizeof)
        {
            hash = hash * 37 + *cast(size_t*)name.ptr;
            name = name[size_t.sizeof .. $];
        }
        foreach (c; name)
            hash = hash * 37 + c;
        hash ^= (hash >> 20) ^ (hash >> 12);
        return hash ^ (hash >> 7) ^ (hash >> 4);
    }

in C would be something like:

size_t getHash(const char *str) {
  size_t hash = 0, ptr = str;
  while(strlen(ptr) >= sizeof(size_t)) {
    hash = hash * 37 + *ptr;
    ptr++;
  }
  for (str = ptr; *str; str++) {
    hash = hash * 37 + *str;
  }
  hash ^= (hash >> 20) ^ (hash >> 12);
  return hash ^ (hash >> 7) ^ (hash>> 4);
}

the strlen can be optimized by checking if any of the Nth chars is null , where N is constant for sizeof (return_type), in theory this hash have less collisions, but right now we have no reproducible collisions in the current string hashing algorithm used.