oadam / longest-word

A website which finds the longest word that can be built using a set of letters
http://le-mot-le-plus-long.herokuapp.com/
1 stars 0 forks source link

Switch to more clever algorithm #2

Open oadam opened 10 years ago

oadam commented 10 years ago

Words should be seen as multisets of runes. Iterate a first time on dictionary to build the multi sets. At the same time keep track off how many words contain 3 "a" and more and how many words contain 1 "e" and more. We should notice that containing 4 z is very rare. Then build an array from longest word to smallest word that we will fullscan at each query. This should be an array of small structs to avoid cache misses. In the small strict put the word , a pointer to the multi set and an array of the N rarest features of this word. The objective is to discard it as fast as possible without iterating through the whole multiset.

oadam commented 10 years ago

This puts unnecessary constraints on the Blum filter hash functions. Instead : Scan the whole dictionary Figure out good hash function candidates to create a 32 bit bloom filter Hash the input and iterate over the int32[wordCount] slice while xor-ing. When xor is happy perform the full check. Return the first 100 words

oadam commented 10 years ago

Filter should return true for a word and all its super multisets. The only kind of function that behave like this are function of the kind "w is a super multiset of k". Thus the problem is to find 32 nice multisets to build our hash functions

oadam commented 3 years ago

A good way would be to add these multiset 1 by 1. New multisets are found by a simulated annealing that maximizes the entropy of the hashes of the whole dictionary once the new filter is added. The current candidate is mutated by either a deletion or an addition (add more frequent letters more often) and mutated candidate is conserved or discarded based on its score and the current temperature of the annealing