manuelbetin / Phd_ComplexCrisisDatabase

Check early documents (1960) for currency crises #16

Open umbertocollodel opened 4 years ago

umbertocollodel commented 4 years ago

Understand whether the higher number of events detected by RR in the 1960s reflects our mistake (the lexicon is not good enough) or theirs (no actual currency crisis, just an adjustment).

umbertocollodel commented 4 years ago

Possible solutions: 1) find the pattern and put it directly into the regex, or 2) change the function for reading files (the better solution; try the tesseract package).
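A minimal sketch of option 1, assuming the misreadings follow a few recurring character confusions (such as "i", "l" and "1" being swapped). It is written in Python with the standard library only and is not the project's code; the confusion table and the name `ocr_tolerant_pattern` are hypothetical.

```python
import re

# Hypothetical confusion classes: characters assumed to be swapped when
# old scanned documents are read. Extend after inspecting real misreadings.
OCR_CLASSES = {
    "i": "[il1]",
    "l": "[il1]",
    "o": "[o0]",
}

def ocr_tolerant_pattern(term: str) -> re.Pattern:
    """Build a regex that matches `term` even if confusable characters were swapped."""
    parts = [OCR_CLASSES.get(ch, re.escape(ch)) for ch in term.lower()]
    return re.compile(r"\b" + "".join(parts) + r"\b", re.IGNORECASE)

pattern = ocr_tolerant_pattern("depreciation")
hit = pattern.search("a sharp depreclatlon of the currency")
miss = pattern.search("a deprecation notice")
```

Here `hit` is a match object and `miss` is `None`, because "deprecation" lacks the character between "c" and "a". The trade-off is that every lexicon term needs its own widened pattern, and wider classes raise the risk of false positives.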

umbertocollodel commented 4 years ago

Tesseract is based on a language dictionary.

umbertocollodel commented 4 years ago

Another possibility, more computationally efficient, is to correct misspelled words with the hunspell package.

manuelbetin commented 4 years ago

I like this second approach. We need to test it, but it could be just one line of code.

umbertocollodel commented 4 years ago

It would save a lot of time. Unfortunately, many of the misspelled terms are from economics. For example, it substitutes "depreclatlon" with "deprecation".
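One possible workaround, sketched under assumptions: consult a small list of economics terms before falling back to a general word list, so that domain terms win when both are close. The sketch uses Python's standard `difflib` as a stand-in for a spell checker; it is not hunspell, and `DOMAIN_TERMS`, `GENERAL_WORDS`, `correct` and the 0.8 cutoff are hypothetical choices.

```python
import difflib

# Hypothetical vocabularies: a short economics lexicon and a tiny stand-in
# for a general-purpose dictionary.
DOMAIN_TERMS = ["depreciation", "devaluation", "appreciation"]
GENERAL_WORDS = ["deprecation", "adjustment"]

def correct(word, domain=DOMAIN_TERMS, general=GENERAL_WORDS, cutoff=0.8):
    """Return the closest known word, preferring domain terms over general ones."""
    lowered = word.lower()
    if lowered in domain or lowered in general:
        return lowered  # already a known word, nothing to fix
    for vocabulary in (domain, general):
        matches = difflib.get_close_matches(lowered, vocabulary, n=1, cutoff=cutoff)
        if matches:
            return matches[0]
    return word  # no close candidate: leave the token untouched

fixed = correct("depreclatlon")
```

With these lists `fixed` is "depreciation", since the domain lexicon is searched first. The cutoff would need tuning on real documents so that a general word is not forced into a domain term.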

manuelbetin commented 4 years ago

This is corrected by now, right?