umbertocollodel opened 4 years ago
Possible solutions: 1) find the misspelling pattern and put it directly into the regex; 2) change the function used to read the files (the better solution: try the tesseract package).
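Option 1 could look roughly like the sketch below. Instead of correcting the text, it makes each lexicon term tolerant of common OCR character confusions, so the regex still matches a garbled word such as "depreclatlon". This is written in Python for illustration only. The confusion table (`i`/`l`/`1`, `o`/`0`) is an assumption based on errors like "depreclatlon", and `ocr_tolerant_pattern` and `build_lexicon_regex` are hypothetical helpers, not existing project code.

```python
import re

# Hypothetical table of characters that OCR tends to confuse (assumption,
# based on "i" being read as "l" in scanned documents).
OCR_CONFUSIONS = {
    "i": "[il1]",
    "l": "[il1]",
    "o": "[o0]",
}

def ocr_tolerant_pattern(term: str) -> str:
    """Turn a lexicon term into a regex that also matches common OCR misreadings."""
    return "".join(OCR_CONFUSIONS.get(ch, re.escape(ch)) for ch in term.lower())

def build_lexicon_regex(terms):
    """Compile one case-insensitive, word-bounded regex covering all lexicon terms."""
    alternatives = "|".join(ocr_tolerant_pattern(t) for t in terms)
    return re.compile(rf"\b(?:{alternatives})\b", re.IGNORECASE)

lexicon = ["depreciation", "devaluation"]
regex = build_lexicon_regex(lexicon)
text = "A sharp depreclatlon of the currency followed the devaluatlon."
print(regex.findall(text))  # ['depreclatlon', 'devaluatlon']
```

The trade-off is that the lexicon patterns become looser, so they could in principle match unrelated words; keeping the confusion table small limits that risk.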
Tesseract relies on a language dictionary.
Another, more computationally efficient possibility is to correct the misspelled words with the hunspell package.
I like this second approach. We need to test it, but it may be just one line of code.
It would save a lot of time; unfortunately, many of the misspelled terms are economics terms. For example, it replaces "depreclatlon" with "deprecation" instead of "depreciation".
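One way around the economics-vocabulary problem is to match garbled words against a domain word list before (or instead of) a general dictionary. The sketch below uses Python's standard `difflib.get_close_matches` as a stand-in for hunspell; it is a swapped-in technique, not how hunspell works internally. `ECON_TERMS`, `correct_word`, `correct_text` and the 0.8 cutoff are hypothetical choices for illustration.

```python
import difflib
import re

# Hypothetical economics word list; in practice it might be built from the
# project's own lexicon (assumption).
ECON_TERMS = ["depreciation", "devaluation", "currency", "reserves", "peg"]

def correct_word(word: str, domain_terms=ECON_TERMS, cutoff: float = 0.8) -> str:
    """Return the closest domain term if it is similar enough, else the word unchanged."""
    matches = difflib.get_close_matches(word.lower(), domain_terms, n=1, cutoff=cutoff)
    return matches[0] if matches else word

def correct_text(text: str) -> str:
    """Apply correct_word to every alphabetic token, leaving other characters untouched."""
    return re.sub(r"[A-Za-z]+", lambda m: correct_word(m.group(0)), text)

print(correct_text("A sharp depreclatlon of the currency"))
```

Checking every token against a short domain list can over-correct short or unusual words; a more conservative variant would apply it only to the words a general spell checker flags as unknown.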
This is corrected by now, right?
Determine whether the higher number of events detected by RR in the 1960s is our mistake (the lexicon is not good enough) or theirs (no actual currency crisis, just an adjustment).