jermp / pthash

Fast and compact minimal perfect hash functions in C++.
MIT License

Seed not working #13

Closed: khodor14 closed this issue 11 months ago

khodor14 commented 11 months ago

Hello @jermp @ByteHamster

I tried pthash on artificial data and it worked. However, when I moved to the hash values of k-mers, it raises an error related to the seed (see below):

    c = 6
    alpha = 0.94
    num_keys = 21817
    table_size = 23209
    num_buckets = 9083
    == map+sort took: 0.002 seconds
    == merged 0% pairs
    terminate called after throwing an instance of 'pthash::seed_runtime_error'
      what():  seed did not work

I tried different seed values, but none of them work.

What could be the problem?

roberto-trani commented 11 months ago

Hi @khodor14 , could it be that your input contains duplicate keys?

jermp commented 11 months ago

Yes, this is surely due to duplicate keys in the input. Beware that using 64-bit hashes with more than ~2B keys will cause hashes to collide with relatively high probability.

Best, -Giulio

khodor14 commented 11 months ago

Thanks a lot!