Closed niklasf closed 8 years ago
The parameter estimation of https://github.com/scalanlp/breeze/blob/master/math/src/main/scala/breeze/util/BloomFilter.scala might be broken, as it is overloaded to early. Investigate.
Edit: Or the hashing function is too bad, as it relies on ##
.
why not just ask kyoto if the id exists?
We don't have the data in Kyoto (yet). The closest thing would be checking if the most specific (last stored) position already exists, but many high quality games do not branch enough to be unique at our current MAX_PLIES. Would be a shame to specifically penalize games that follow theory very deeply.
ornicar: have you considered http://docs.guava-libraries.googlecode.com/git/javadoc/com/google/common/hash/BloomFilter.html ornicar: see that too https://github.com/Baqend/Orestes-Bloomfilter
I like https://github.com/Baqend/Orestes-Bloomfilter in interface as well as implementation.
Compare:
Round 140000000: 2269445 false positives (0.016210321428571428) | breeze, best possible due too bad hashing
Round 140000000: 16839 false positives (1.2027857142857143E-4) | Orestes, tuned down to target probability 0.001
Great!
We could use something like a bloom filter on the game id to detect duplicates. Probably this can't happen anyway, though.