miguelgondu closed this 2 years ago.
We will re-run the correction, this time saving which words get corrected and how many times. With this, we may be able to tell which of the other options is worth pursuing.
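A minimal sketch of what "saving which words are being corrected and how many times" could look like, assuming the corrector can be called as a plain function from a word to its corrected form. `run_correction` and the toy lookup table are hypothetical, for illustration only; they are not the project's actual code.

```python
from collections import Counter
from typing import Callable, Iterable


def run_correction(
    words: Iterable[str], correct: Callable[[str], str]
) -> tuple[list[str], Counter]:
    """Correct every word and count each (original, corrected) pair.

    `correct` stands in for whatever spell-correction function is used;
    it is an assumption of this sketch.
    """
    corrected_words = []
    corrections: Counter = Counter()
    for word in words:
        fixed = correct(word)
        if fixed != word:
            # Record the pair so we can later inspect what changed and how often.
            corrections[(word, fixed)] += 1
        corrected_words.append(fixed)
    return corrected_words, corrections


# Hypothetical toy corrector: a lookup table, for illustration only.
toy_table = {"qe": "que", "ke": "que"}
words = ["la", "qe", "casa", "qe", "ke"]
out, counts = run_correction(words, lambda w: toy_table.get(w, w))
print(counts.most_common())
```

Inspecting `counts.most_common()` after a full run would show the most frequent corrections first, which is where bad corrections are easiest to spot.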
We ran a correction and found that the number of words in the current RAE dictionary that get corrected, but that would be removed if we applied a frequency threshold to the dictionary, is small. We decided to raise the RAE dictionary's threshold from 5 to 100 occurrences (i.e., the dictionary will only keep words that appear more than 100 times). This still keeps `teoria`, but it helps reduce bad corrections anyway.
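The thresholding step and the check on affected corrections can be sketched as below. This assumes the dictionary is available as a word-to-frequency mapping and the corrections as a `Counter` of `(original, corrected)` pairs; it also assumes one possible reading of "corrected words that would get removed", namely correction events whose target word falls below the new threshold. The frequencies are made up and are not real RAE counts.

```python
from collections import Counter


def threshold_dictionary(frequencies: dict[str, int], min_count: int) -> set[str]:
    # Keep only words that appear strictly more than `min_count` times.
    return {w for w, c in frequencies.items() if c > min_count}


def affected_corrections(
    corrections: Counter, frequencies: dict[str, int], min_count: int
) -> Counter:
    # Assumed reading: correction events whose target word would drop out
    # of the thresholded dictionary.
    kept = threshold_dictionary(frequencies, min_count)
    return Counter({pair: n for pair, n in corrections.items() if pair[1] not in kept})


# Hypothetical frequencies and corrections, for illustration only.
freqs = {"que": 5000, "casa": 800, "perro": 150, "rarísimo": 40}
corrections = Counter({("qe", "que"): 2, ("rarisimo", "rarísimo"): 1})

print(sorted(threshold_dictionary(freqs, 100)))
print(affected_corrections(corrections, freqs, 100))
```

Comparing `sum(affected_corrections(...).values())` against `sum(corrections.values())` gives a quick sense of how much a given threshold would change the corrections.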
This may be due to some of the cleaning steps. Should we drop the tildes (accent marks)? Should we run a correction on the bag of words just before training?
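If we do drop the tildes, one common approach is Unicode decomposition followed by removing combining marks. The sketch below is an assumption about how it could be done, not the project's cleaning code; `drop_tildes` and its `keep_enye` flag are hypothetical names. It preserves `ñ` by default, since in Spanish it is a separate letter and stripping it would merge words such as "año" and "ano".

```python
import unicodedata


def drop_tildes(text: str, keep_enye: bool = True) -> str:
    """Remove accent marks (á -> a, ü -> u), optionally preserving ñ/Ñ."""
    # Compose first so a decomposed "n" + combining tilde becomes a single "ñ".
    text = unicodedata.normalize("NFC", text)
    out = []
    for ch in text:
        if keep_enye and ch in "ñÑ":
            out.append(ch)
            continue
        # NFD splits e.g. "á" into "a" + combining acute accent (U+0301).
        decomposed = unicodedata.normalize("NFD", ch)
        out.append("".join(c for c in decomposed if not unicodedata.combining(c)))
    return "".join(out)


print(drop_tildes("teoría, pingüino, niño"))
print(drop_tildes("niño", keep_enye=False))
```

Whatever normalization is chosen, it would need to be applied consistently to both the dictionary and the text being corrected, otherwise accented dictionary entries would never match.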