Closed zacchiro closed 10 months ago
With the following oneliner change I could indeed start a PTHash search with larger values of c
for this dataset:
diff --git a/include/builders/util.hpp b/include/builders/util.hpp
index 7045492..c994253 100644
--- a/include/builders/util.hpp
+++ b/include/builders/util.hpp
@@ -8,7 +8,7 @@
namespace pthash {
-typedef uint32_t bucket_id_type;
+typedef uint64_t bucket_id_type;
typedef uint8_t bucket_size_type;
constexpr bucket_size_type MAX_BUCKET_SIZE = 100;
Would it be possible to make this compile-time configurable, rather than relying on users to patch the code?
Hi @zacchiro, thank you for your interest in PTHash and your kind words. My pleasure to assist you.
Using a different bucket_id_type
is indeed the way to go: that's why I used a typedef
there :)
Your suggestion of making it compile-configurable makes perfect sense. I expect this to be used by experienced users.
You might also want to try a partitioned construction, by specifying the parameter -p [num_partitions]
.
I would do it as follows:
$ ./build --minimal -c 5 -a 0.94 -e dictionary_dictionary -n 34121566250 -i graph.nodes.csv -o graph.nodes.mph -t 48 --external -m 128 -p 3500 --verbose --check --lookup
Done, as of e3a6b3fde2a70cc86e8015d7b86026d5b619cb81. Thanks!
Closing this. Feel free to drop me a line (giulioermanno.pibiri@unive.it) or open a new issue if needed.
Best, -Giulio
great, thanks!
Hello, I'm trying to use pthash on ~34*10^9 strings (~50 bytes each) and I'm hitting the runtime error in title when using the main executable like this:
It happens for
c >= 5
, but not withc = 4
.Does one need to recompile with the suggested type change to use higher
c
on such an input set?Thanks a lot for pthash, it's amazing :-)