lemire / rollinghashcpp

Rolling Hash C++ Library
183 stars 32 forks source link

GeneralHash: the default polynomial isn't really irreducible #6

Closed vcunat closed 8 years ago

vcunat commented 8 years ago

Hello (again). I wondered where you took the irreducible polynomials from, and unable to see that I tried to check them which failed for the 19-degree one. The other one seems OK according to Wolfram. (EDITed the link not to drop the + characters.)

I verified by different means that the factorization indeed does hold, e.g. 158543 ^ (158543<<1) ^ (158543<<2) == 987373.

For speed testing there's most likely no impact, even though there are conditionals on some of the bits but those should still be uniform enough. And to be honest, most rolling-hash implementations use much worse functions, but still...

vcunat commented 8 years ago

Suggestion: this polynomial looks nice http://www.wolframalpha.com/input/?i=factor+x^0+x^1+x^2+x^5+x^6+x^7+x^9+x^11+x^14+x^19+x^20+x^21+x^22+x^24+x^27+x^30+x^32+mod+2

I think there's little advantage in using smaller than 32-bit state for this algorithm, even on 32-bit-only machines (which are rare enough nowadays). In decimal representation it's 1232620263 (if without the x^32 bit).

Apart from being irreducible, it's also primitive, which might have some advantage, but I haven't given much thought to that and I'm really not so good in number theory anyway. I just chose a random line from a public list https://users.ece.cmu.edu/~koopman/lfsr/index.html

lemire commented 8 years ago

@vcunat

Thank you. I don't know how we ended up with a reducible polynomial. It is slightly embarrassing. I fixed it with the last commit.

lemire commented 8 years ago

@vcunat

I have moved your other proposal to another issue : https://github.com/lemire/rollinghashcpp/issues/7

You are entirely correct, but I want to distinguish the issues.

Closing this particular one.

vcunat commented 8 years ago

Right, it is better to keep it separate.