This allows the Document class to be initialized with a lang that is passed to tesseract.
(Giving tesseract a language can sometimes greatly improve text extraction quality.)
On Ubuntu, this requires installing the package tesseract-ocr-$lang$, where $lang$ is the 3-letter code for the language (for example, tesseract-ocr-fra for French). On other operating systems, language data for tesseract can be found at https://github.com/tesseract-ocr/langdata
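
As a rough illustration of the idea, here is a minimal sketch of how a lang argument might be forwarded to the tesseract command line. The `Document` class below is a hypothetical stand-in, not the class changed in this PR; the constructor signature and the `tesseract_command` and `extract_text` method names are assumptions made for the example. Only the tesseract CLI's `-l` option and the `stdout` output base are taken from tesseract itself.

```python
import subprocess
from typing import List, Optional


class Document:
    """Hypothetical stand-in for the Document class touched by this change."""

    def __init__(self, path: str, lang: Optional[str] = None) -> None:
        self.path = path
        # Language code such as "fra" or "deu"; None keeps tesseract's default.
        self.lang = lang

    def tesseract_command(self) -> List[str]:
        # Using "stdout" as the output base makes tesseract print the text
        # instead of writing it to a file.
        cmd = ["tesseract", self.path, "stdout"]
        if self.lang:
            # -l selects the language data tesseract should use.
            cmd += ["-l", self.lang]
        return cmd

    def extract_text(self) -> str:
        # Requires the tesseract binary and the matching language data
        # (e.g. the tesseract-ocr-fra package on Ubuntu) to be installed.
        result = subprocess.run(
            self.tesseract_command(),
            capture_output=True,
            text=True,
            check=True,
        )
        return result.stdout
```

With this sketch, `Document("scan.png", lang="fra").extract_text()` would run tesseract with French language data, while omitting lang leaves tesseract's default language in effect.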