jeanbern / Augury

A collection of natural language processing tools in C#, Augury contains everything you need to add a text predictor/spell-checker to your project.
https://jeanbern.github.io/Augury/

Investigate Double-Array Dawg #17

Open jeanbern opened 8 years ago

jeanbern commented 8 years ago

Expected outcome is that it will take a bit less space but run slower.

Could just implement both and leave them available for injection based on user preference.

Trade-offs:

jeanbern commented 7 years ago

Just as a refresher on what this means: a DADAWG has two arrays, one recording the letter you came from (for verification) and another holding the states. To go from state x through a given letter, you look at states[x + letter] for the new state address, then double-check that verification[x + letter] == letter. This is faster in the case where you check whether a word exists, but a bit slower when doing near-neighbor detection. It requires a packing method to fit states next to each other efficiently, with no overlap of state + letter. But greedy is good enough, apparently [http://citeseerx.ist.psu.edu/viewdoc/citations;jsessionid=F9F10F8B3A18B7B4D8370A12C23B80F9?doi=10.1.1.56.5272]
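
To make the lookup and the greedy packing concrete, here is a minimal sketch. Everything in it is an assumption made for illustration, not Augury's code: the class name `DoubleArraySketch`, the extra `_terminal` array for end-of-word flags, the a-z-only alphabet, and the first-fit strategy. It also packs a plain trie rather than a minimized DAWG to stay short; shared nodes would be packed the same way, which is what the `_placed` map is for. Because the verification array stores the letter rather than the parent, the sketch gives every state a unique address.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical sketch, not Augury's API. Assumes words use only 'a'..'z'.
// The empty string is never reported as a word.
public sealed class DoubleArraySketch
{
    private sealed class Node
    {
        public readonly SortedDictionary<char, Node> Children = new SortedDictionary<char, Node>();
        public bool IsWord;
    }

    private const char Empty = '\0';
    private readonly List<int> _states = new List<int>();         // slot -> address of the target state
    private readonly List<char> _verification = new List<char>(); // slot -> letter that owns the slot
    private readonly List<bool> _terminal = new List<bool>();     // slot -> does a word end on this edge?
    private readonly HashSet<int> _usedAddresses = new HashSet<int>();
    private readonly Dictionary<Node, int> _placed = new Dictionary<Node, int>();
    private readonly int _root;

    public DoubleArraySketch(IEnumerable<string> words)
    {
        var root = new Node();
        foreach (var word in words)
        {
            var node = root;
            foreach (var letter in word)
            {
                if (letter < 'a' || letter > 'z') throw new ArgumentException("a-z only: " + word);
                if (!node.Children.TryGetValue(letter, out var next))
                {
                    next = new Node();
                    node.Children.Add(letter, next);
                }
                node = next;
            }
            node.IsWord = true;
        }
        _root = Place(root);
    }

    // states[x + letter] gives the next state; verification[x + letter] confirms it.
    public bool Contains(string word)
    {
        int x = _root;
        bool endsWord = false;
        foreach (var letter in word)
        {
            int slot = x + (letter - 'a');
            if (slot < 0 || slot >= _verification.Count || _verification[slot] != letter) return false;
            endsWord = _terminal[slot];
            x = _states[slot];
        }
        return endsWord;
    }

    // Greedy first-fit: take the lowest unused address whose child slots are all free.
    private int Place(Node node)
    {
        if (_placed.TryGetValue(node, out var known)) return known; // shared node already packed
        int address = 0;
        while (_usedAddresses.Contains(address) || !Fits(node, address)) address++;
        _usedAddresses.Add(address);
        _placed[node] = address;
        // Claim every slot before recursing so descendants cannot take them.
        foreach (var child in node.Children) Claim(address + (child.Key - 'a'), child.Key);
        foreach (var child in node.Children)
        {
            int slot = address + (child.Key - 'a');
            _terminal[slot] = child.Value.IsWord;
            _states[slot] = Place(child.Value);
        }
        return address;
    }

    private bool Fits(Node node, int address)
    {
        foreach (var letter in node.Children.Keys)
        {
            int slot = address + (letter - 'a');
            if (slot < _verification.Count && _verification[slot] != Empty) return false;
        }
        return true;
    }

    private void Claim(int slot, char letter)
    {
        while (_verification.Count <= slot)
        {
            _verification.Add(Empty);
            _states.Add(-1);
            _terminal.Add(false);
        }
        _verification[slot] = letter;
    }
}
```

An exact-match lookup is one array read and one comparison per letter, which fits the "faster when checking if a word exists" point. Near-neighbor detection has to enumerate a state's outgoing letters, and with this layout that means probing all 26 candidate slots, which is one plausible reason it would be slower there.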