Closed weaversa closed 5 years ago
You're right. We have working tests, so my guess is that serializing such a small map triggers some bug in the serialization process: if you build the map in the code, it works, and if you try, with, like, 200 strings it works, too. The problem is with serializing a very small number of keys. I suspect some off-by-one.
Thanks for the bug report. The workaround for the time being is just using more keys :).
Sorry, it's bullshit. Much easier. The RecSplit constructor which takes a file pointer (and which the dump tool uses) was implemented erroneously using the C getline(), which leaves the delimiter (e.g., \n) in the string. You are correctly reading the strings without the ending newline, hence the problem.
I'll fix this ASAP. In all our experiments we use dump128 and load128, so nobody ever noticed this.
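To illustrate the mismatch, here is a minimal sketch, not the actual sux code or the actual fix. It assumes a POSIX system (for getline()), and the helper names read_keys and demo are hypothetical. It shows that getline() stores the trailing \n in the buffer, so a key read this way differs from the same key read with the newline stripped:

```cpp
#include <cstdio>       // FILE, getline (POSIX), fputs, rewind, tmpfile, fclose
#include <cstdlib>      // free
#include <string>
#include <vector>
#include <sys/types.h>  // ssize_t

// Hypothetical helper (not part of sux): read one key per line from fp.
// With strip_newline == false, each key keeps the '\n' that getline() stores.
std::vector<std::string> read_keys(FILE *fp, bool strip_newline) {
    std::vector<std::string> keys;
    char *line = nullptr;  // getline() allocates and grows this buffer
    size_t cap = 0;
    ssize_t len;
    while ((len = getline(&line, &cap, fp)) != -1) {
        // Drop the delimiter only when asked to.
        if (strip_newline && len > 0 && line[len - 1] == '\n') --len;
        keys.emplace_back(line, static_cast<size_t>(len));
    }
    free(line);
    return keys;
}

// Hypothetical helper: write three lines to a temporary file and read them back.
std::vector<std::string> demo(bool strip_newline) {
    FILE *fp = tmpfile();
    if (!fp) return {};
    fputs("alpha\nbeta\ngamma\n", fp);
    rewind(fp);
    std::vector<std::string> keys = read_keys(fp, strip_newline);
    fclose(fp);
    return keys;
}
```

If a function is built over the keys from demo(false) but queried with the keys from demo(true), every query uses a string that was never in the key set, so the returned indices need not be distinct.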
Fixed in 0dc722298fe1809f9693cfb9b541dd9c02c8d762.
Thank you
I am trying to get the mapping from keys to unique indices out of a RecSplit MPHF. I created a file with 4 strings and passed it to recsplit_dump_8, creating an MPHF. I modified recsplit_load.cpp (shown below) to display the mapping. However, the mapping is not a bijection. I also tried with a million keys and could not get an MPHF. I've only been working with the tool for a few hours today, but I can't see how I'm using the interface incorrectly. Any help would be appreciated.
Here is the output of my run:
Here is my modification to recsplit_load.cpp to print the mapping.