jermp / pthash

Fast and compact minimal perfect hash functions in C++.
MIT License
190 stars 25 forks source link

Is there a map implementation of PTHash? #6

Closed songqing closed 1 year ago

songqing commented 1 year ago

PTHash has great performance and is headers only, so it's easy to use, however, it only includes the hash function now, is there a map implementation based on PTHash? For example, a map with put() and get(). Thank you.

roberto-trani commented 1 year ago

Hi @songqing , to build a static map on top of PTHash you just need to displace the values in an array based on the "positions" that PTHash associates to the keys. However, it seems you are looking for a dynamic data structure where you dynamically add entries (put), which is out of the scope of our work.

songqing commented 1 year ago

Hi, in fact, I are looking for a static map, which contains many key-value pairs instead of keys only. It may not have the put(), but it will have a get(), for example, value = get(key). It seems PTHash only provide a function f(), index = f(key), index is in [0..n], but can not get the value of the map's key.

jermp commented 1 year ago

Hi @songqing, as Roberto suggested: if you have a collection of n (key, value) pairs, you can use a PTHash mphf f plus an array of values V[1..n] to obtain a static map. Suppose you have a pair (key, value) and that PTHash gives you i = f(key). Then you store value at position i in the array V, i.e., V[i]=value.

Hope this helps.

songqing commented 1 year ago

OK, I see, I know the way to implement a map with an array just like you said, as it's a common need, do you have any plan to implement a map?

songqing commented 1 year ago

For example, a simplified implementation like this https://github.com/facebook/rocksdb/tree/main/table/cuckoo. I think more people use use it more easily if we have a map like that.

jermp commented 1 year ago

OK, I see, I know the way to implement a map with an array just like you said, as it's a common need, do you have any plan to implement a map?

It’s straight forward to implement it as we said: just displace the values in the order given by f.

So I don’t understand your question. Do you want us to implement it?

songqing commented 1 year ago

OK, I think a static map based on PTHash is a common problem, if we have a map like this, people will not need to implement it by themselves repetitively, and PTHash can be used more easily. But just as you said, it's easy to implement, I'll have a try, thank you.

jermp commented 1 year ago

Ok great! Feel free to submit a pull request then with a .cpp example using the map. For the map itself, I would define a class map as follows: template<typename MPHF, typename ValueType> map with two main methods, build(config, n, keys.begin(), values.begin()) and lookup(key), that build and return the value associated to a given key respectively. May this serve you as a guide.

Let us know! Best, -Giulio