sandsq opened 5 months ago
Currently, we divide the count of a given ngram by the total at every step of the computation, instead of precomputing the frequencies once at the start. Not going to worry about that for now.
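A minimal sketch of the two approaches, assuming ngram counts are kept in a `Counter` and that each ngram has some effort value. The function names, the example counts, and the use of `len` as a stand-in effort function are all hypothetical, for illustration only:

```python
from collections import Counter

def score_recomputing(counts, effort):
    # Current approach: the total is recomputed and every count is divided by it on each call.
    total = sum(counts.values())
    return sum(count / total * effort(ngram) for ngram, count in counts.items())

def precompute_frequencies(counts):
    # Done once at the start: divide every count by the total a single time.
    total = sum(counts.values())
    return {ngram: count / total for ngram, count in counts.items()}

def score_precomputed(freqs, effort):
    # Each later step just looks up the stored frequency; no division needed.
    return sum(freq * effort(ngram) for ngram, freq in freqs.items())

# Hypothetical counts; `len` is a toy effort function, not the project's real one.
counts = Counter({"th": 6, "he": 3, "in": 1})
freqs = precompute_frequencies(counts)
recomputed = score_recomputing(counts, len)
precomputed = score_precomputed(freqs, len)
```

Both give the same score; the precomputed version only pays for the division once, which matters if the score is evaluated many times during a search.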
Actually, scaling might not be so straightforward: longer ngrams mean longer sequences, which mean higher effort scores, even if we scale by frequency. Is this desirable?
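A small sketch of the length effect, assuming an ngram's effort is the sum of per-element costs. The uniform cost of 1.0 per letter and the shared frequency are hypothetical values chosen only to isolate the effect of length:

```python
def ngram_effort(ngram, element_cost):
    # Assumed model: effort is the sum of per-element costs, so it grows with ngram length.
    return sum(element_cost[ch] for ch in ngram)

# Hypothetical uniform cost of 1.0 per letter.
element_cost = {ch: 1.0 for ch in "abcdefghijklmnopqrstuvwxyz"}

freq = 0.01  # same (hypothetical) frequency for both ngrams
bigram_score = freq * ngram_effort("th", element_cost)
trigram_score = freq * ngram_effort("the", element_cost)
```

Even with identical frequencies, the trigram scores higher purely because it has more elements. If that turns out to be undesirable, dividing by `len(ngram)` would be one possible way to remove the length effect, though that is not something decided here.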