This is an excerpt of my book (only one page, in Russian): Dolgan_ Язык норильских долган (Ubrjatova) 3.pdf
I put the PDF file in the 'tst' folder, and my code is:
Instead of Russian text I got something like this:
It seems that even though I specified the language parameter as Russian, LlamaParse still recognized it as English.
When I tried another Russian book with the same code, documents[0].text is empty; no text was extracted from the PDF file: Orok_ Язык ороков (ульта) (Petrova) 22.pdf
Does LlamaParse not yet support OCR for this kind of scanned foreign-language PDF? Or did I miss something?
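
To make the two symptoms above easier to report, here is a minimal sketch of a check that could be run on the extracted text. It only uses the Python standard library and does not call LlamaParse itself. The names script_report and diagnose, and the 0.5 threshold, are assumptions chosen for illustration, not part of any library.

```python
import unicodedata


def script_report(text: str) -> dict:
    """Count the letters in `text` by script: Cyrillic, Latin, or other."""
    counts = {"cyrillic": 0, "latin": 0, "other_letters": 0}
    for ch in text:
        if not ch.isalpha():
            continue  # skip digits, punctuation, whitespace
        name = unicodedata.name(ch, "")
        if name.startswith("CYRILLIC"):
            counts["cyrillic"] += 1
        elif name.startswith("LATIN"):
            counts["latin"] += 1
        else:
            counts["other_letters"] += 1
    return counts


def diagnose(text: str, threshold: float = 0.5) -> str:
    """Classify extracted text as empty, letterless, or by dominant script.

    `threshold` is an arbitrary cutoff (assumption): the share of Cyrillic
    letters needed to call the text "mostly-cyrillic".
    """
    if not text.strip():
        return "empty"
    counts = script_report(text)
    letters = sum(counts.values())
    if letters == 0:
        return "no-letters"
    if counts["cyrillic"] / letters >= threshold:
        return "mostly-cyrillic"
    return "mostly-non-cyrillic"
```

Calling diagnose(documents[0].text) on each parsed file would distinguish the second book's result ("empty") from the first one's, where text came back but presumably not in Cyrillic.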