timpalpant / LittleBoxes

A crossword solver
GNU General Public License v3.0

Add NGram set of words for fuzzy lookup of clues in the ClueDB #22

Closed timpalpant closed 8 years ago

timpalpant commented 8 years ago

When loading the ClueDB, construct an N-gram index of the clues. Then similar (but not necessarily exact) clues can be found in the database (#4).

Currently this changes the API of the ClueDB to have two methods:

  1. search(clue, threshold): Returns clues in the database whose similarity to the given clue meets the threshold.
  2. answers(clue): Returns answers for the given clue, which must exist in the database.
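
For illustration only, here is a minimal Python sketch of how an N-gram index with these two methods could work. It is not the code in this PR: the character trigrams, the Jaccard similarity score, the `add` method, and the internal field names are all assumptions chosen to keep the example small.

```python
from collections import defaultdict


def ngrams(text, n=3):
    """Return the set of character n-grams of a lowercased, space-padded string."""
    padded = " " * (n - 1) + text.lower().strip() + " " * (n - 1)
    return {padded[i:i + n] for i in range(len(padded) - n + 1)}


class ClueDB:
    """Hypothetical sketch: clue -> answers map plus an inverted n-gram index."""

    def __init__(self, n=3):
        self.n = n
        self._answers = defaultdict(set)  # clue -> set of known answers
        self._index = defaultdict(set)    # n-gram -> clues containing that n-gram

    def add(self, clue, answer):
        # Index a clue the first time it is seen, then record the answer.
        if clue not in self._answers:
            for gram in ngrams(clue, self.n):
                self._index[gram].add(clue)
        self._answers[clue].add(answer)

    def search(self, clue, threshold):
        """Return (clue, score) pairs whose Jaccard similarity is >= threshold."""
        query = ngrams(clue, self.n)
        # Only clues sharing at least one n-gram with the query can match.
        candidates = set()
        for gram in query:
            candidates |= self._index.get(gram, set())
        results = []
        for candidate in candidates:
            grams = ngrams(candidate, self.n)
            score = len(query & grams) / len(query | grams)
            if score >= threshold:
                results.append((candidate, score))
        # Best matches first; ties broken alphabetically for stable output.
        return sorted(results, key=lambda pair: (-pair[1], pair[0]))

    def answers(self, clue):
        """Return the answers for a clue that must already be in the database."""
        if clue not in self._answers:
            raise KeyError(clue)
        return sorted(self._answers[clue])


db = ClueDB()
db.add("Capital of France", "PARIS")
db.add("Capital of Norway", "OSLO")
# A misspelled clue still finds its near match, whose answers can then be looked up.
for match, score in db.search("Capitol of France", 0.5):
    print(match, round(score, 2), db.answers(match))
```

In this sketch, `search` re-computes the n-grams of every candidate on each query, which is simple but wasteful; caching each clue's n-gram set would be one obvious way to trade memory for speed.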

It also removes the previously unused map of answers of a given length, and changes the serialization to match. The test fixtures would need to be updated to match the new serialization format.

timpalpant commented 8 years ago

This still needs some work (it's slow), but merging for now as a preliminary implementation.