Closed progval closed 2 months ago
Also this one is not actually needed...
PS. Are you using PTHash for some of your projects? I'm curious.
I'm working with @vigna on porting https://docs.softwareheritage.org/devel/swh-graph/ to Rust and we're considering switching from GOV and GOV3 to PTHash. So right now I'm writing Rust bindings for PTHash here: https://gitlab.softwareheritage.org/swh/devel/pthash-rs
The reason I'm interested in build_from_hashes
is that I effectively can't call templated C++ functions from Rust, except through a function with a concrete type. This prevents me from giving users of the Rust crate the ability to pass me arbitrary key iterators I could pass directly to internal_memory_builder_partitioned_phf::build_from_keys
.
So instead I build an array of hashes (which is usually smaller than an array of keys, so it's okay-ish to keep it in RAM) and pass it to to a concrete-ized internal_memory_builder_partitioned_phf::build_from_hashes
, like this one: https://gitlab.softwareheritage.org/swh/devel/pthash-rs/-/blob/b28d91baf80a15998e8b10ecda41e1b7d716d26e/src/backends.rs#L57-62
I might add a few more concrete-izations later, eg. to store the hashes on disk, but I'm limited to only providing a handful of them because I have to explicitly list them in the binding
Oh that makes a lot of sense then. I'll merge the PR.
You might also be interested in PTHash-PHOBIC: https://github.com/jermp/pthash/tree/phobic. PHOBIC adds some optimizations to PTHash regarding space usage and construction under space-efficient configurations. The pre-print is forthcoming.
Thanks!
This mirrors internal_memory_builder_single_phf::build_from_hashes