Closed ozgrakkurt closed 10 months ago
Hello, and thank you for the kind words!
A Rust porting is something I have consider writing but I first have to learn Rust, so it would not appear soon unless someone else is going to do it.
We can work though on a C api, yes, assuming you only need the signature of the functions to be in pure C but the function itself will then call the "read" C++ function as written in PTHash.
Best, -Giulio
makes sense, trying to add c header file. Also want to write rust port
Thank you! There are several people (including me, of course) who are interested in porting PTHash to Rust. I can connect you with them if interested and work together.
Best, -Giulio
The construction algorithm is actually pretty simple. I have recently rewrote it from scratch for my (abandoned) project. Instead of C API it's better to just write a whole algorithm in Rust. I've made this for Apache Arrow C++ and found it is simple enough to fit into a few steps:
1) Hash key columns into 2 vectors of size N: hash1(k)
, hash2(k)
;
2) Compute a load factor for each bucket: count[skew_bucketer(hash1_i)]++;
;
3) Group buckets by descending count, i.e. make a nested list of indices [[b1,b2,b3], [b4], ... [bm]]
where count[b1]=count[b2]
and so on (you can encode a nested list as 2 flat vectors);
4) Reorder hash2
vector according to bucket bins: [[hash2 elements for b1], [hash2 elements for b2], ..., [hash2 elements for bm]]
. This is another List[List[UInt64]]
that could be encoded as 2 flat vectors;
5) Now iterate that bins and pick a pilot value for each of them. That's it.
@snizovtsev I’m glad you found PTHash useful and easy to implement! I would be grateful if you could point out for what application you used PTHash. Thanks! -Giulio
Hello!
Very nice project and research (I don't understand fully but seems very interesting), I wanted to port this to Rust but seems like it will be a lot of work.
I though might be easier if it had a c api so I can call it with ffi from rust.
Is this possible to do?
I can work on it if you give me some direction.