[ ] Make edit distances a single shared structure, rather than owned individually by each word.
This might not actually reduce memory usage at all, because I still need to store the distance between every pair of words a and b, in both directions. Even with a single shared structure, there will still be on the order of n^2 keys (n = word count).
[ ] Ignore edit distances beyond a configurable threshold.
For the threshold, I can use the same formula as the configurable choice variance.
[ ] Begin calculating edit distances while parsing the source text.
Move the word edit distances structure into storage as temp files? Maybe this is overkill and better left as a future enhancement. It may be better to move other structures into storage first, like the lists of sentences and words.
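
A rough sketch of how the first three items could fit together, not a description of the existing code. It assumes the distance is plain Levenshtein, which is symmetric, so a single shared structure can key each pair once in sorted order and hold at most n(n-1)/2 entries instead of n^2. All names here (`levenshtein`, `EditDistances`, `build_store`, `max_dist`) are hypothetical, and `max_dist` is a plain integer standing in for whatever the choice variance formula would produce.

```python
from typing import Dict, Iterable, Optional, Set, Tuple


def levenshtein(a: str, b: str, max_dist: Optional[int] = None) -> Optional[int]:
    """Two-row Levenshtein distance; returns None if it exceeds max_dist."""
    if max_dist is not None and abs(len(a) - len(b)) > max_dist:
        return None  # the length gap alone is already over the threshold
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        if max_dist is not None and min(curr) > max_dist:
            return None  # the final distance cannot be lower than this row's minimum
        prev = curr
    dist = prev[-1]
    if max_dist is not None and dist > max_dist:
        return None
    return dist


class EditDistances:
    """Single shared store, keyed by unordered word pair (hypothetical design)."""

    def __init__(self, max_dist: int) -> None:
        self.max_dist = max_dist
        self._pairs: Dict[Tuple[str, str], int] = {}

    @staticmethod
    def _key(a: str, b: str) -> Tuple[str, str]:
        # Sorting the pair makes (a, b) and (b, a) share one entry.
        return (a, b) if a <= b else (b, a)

    def add_word(self, word: str, known_words: Iterable[str]) -> None:
        """Compare a newly seen word against every word seen so far."""
        for other in known_words:
            if other == word:
                continue
            dist = levenshtein(word, other, self.max_dist)
            if dist is not None:  # pairs beyond the threshold are not stored
                self._pairs[self._key(word, other)] = dist

    def get(self, a: str, b: str) -> Optional[int]:
        """Distance if within the threshold, otherwise None."""
        if a == b:
            return 0
        return self._pairs.get(self._key(a, b))

    def __len__(self) -> int:
        return len(self._pairs)


def build_store(words: Iterable[str], max_dist: int) -> EditDistances:
    """Fill the store incrementally, as words arrive from the parser."""
    store = EditDistances(max_dist)
    seen: Set[str] = set()
    for word in words:
        if word not in seen:
            store.add_word(word, seen)
            seen.add(word)
    return store
```

Calling `build_store(text.split(), max_dist=2)` during parsing would leave a store where `store.get(a, b)` and `store.get(b, a)` return the same value, and `None` means "further apart than the threshold". The threshold is what actually bounds memory here: the sorted key only halves the entry count, while dropping distant pairs can remove most of them for a typical vocabulary.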