sandsq / alc-rs

MIT License

Within-dataset weighting #19

Open sandsq opened 5 months ago

sandsq commented 5 months ago

Currently, we divide the count of a given ngram by the total at each step of the computation instead of precomputing the frequencies once at the start. Not gonna worry about that for now.
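
For reference, a minimal sketch of what precomputing could look like. It assumes the counts live in a `HashMap<String, u64>`; the names `precompute_frequencies` and `total_weighted_effort` are hypothetical and not taken from alc-rs, and the effort function is passed in as a closure because the real scoring code is not shown here.

```rust
use std::collections::HashMap;

/// Convert raw ngram counts into relative frequencies in a single pass.
/// Hypothetical helper: the real alc-rs data structures may differ.
fn precompute_frequencies(counts: &HashMap<String, u64>) -> HashMap<String, f64> {
    let total: u64 = counts.values().sum();
    if total == 0 {
        // Avoid dividing by zero on an empty dataset.
        return HashMap::new();
    }
    counts
        .iter()
        .map(|(ngram, &count)| (ngram.clone(), count as f64 / total as f64))
        .collect()
}

/// Sum of frequency * effort over all ngrams, with no division in the loop.
/// `effort` stands in for whatever per-ngram scoring the crate uses.
fn total_weighted_effort(
    freqs: &HashMap<String, f64>,
    effort: impl Fn(&str) -> f64,
) -> f64 {
    freqs.iter().map(|(ngram, &f)| f * effort(ngram)).sum()
}
```

The division then happens once per ngram up front rather than once per ngram per scoring step, which is the saving described above.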

sandsq commented 5 months ago

Actually, scaling might not be so straightforward: longer ngrams mean longer sequences, which mean higher effort scores, even if we scale by frequency. Is this desirable?
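
A toy illustration of the concern, under a deliberately simplistic assumption that every key press costs the same (the real effort model is not shown in this thread, and all names here are hypothetical). Frequency weights sum to 1 within each dataset, so a trigram dataset still scores higher than a bigram dataset purely because its sequences are longer. Dividing by ngram length is shown only as one possible way to make the two comparable, not as a decision.

```rust
/// Hypothetical effort model: each key press costs 1.0.
fn toy_sequence_effort(ngram: &str) -> f64 {
    ngram.chars().count() as f64
}

/// Frequency-weighted effort over (ngram, count) pairs.
fn toy_weighted_effort(counts: &[(&str, u64)]) -> f64 {
    let total: u64 = counts.iter().map(|&(_, c)| c).sum();
    if total == 0 {
        return 0.0;
    }
    counts
        .iter()
        .map(|&(ngram, c)| (c as f64 / total as f64) * toy_sequence_effort(ngram))
        .sum()
}

/// Same as above, but each ngram's effort is divided by its length,
/// giving an average effort per key press.
fn toy_weighted_effort_per_key(counts: &[(&str, u64)]) -> f64 {
    let total: u64 = counts.iter().map(|&(_, c)| c).sum();
    if total == 0 {
        return 0.0;
    }
    counts
        .iter()
        .map(|&(ngram, c)| {
            let len = ngram.chars().count().max(1) as f64;
            (c as f64 / total as f64) * toy_sequence_effort(ngram) / len
        })
        .sum()
}
```

With the same counts, a bigram dataset and a trigram dataset get different scores from `toy_weighted_effort` but the same score from `toy_weighted_effort_per_key`. Whether that comparability is wanted is exactly the open question.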