Questioning_development_review.PDF
I wasn't intentionally testing OCR, but here we are. I won't share an example, but the output is missing spaces and newlines and puts numbers where they don't belong.
When I run the PDF through ocrmypdf with the following command:
ocrmypdf --clean --output-type pdf --redo-ocr
and then re-run it through llama-parse, I get a much better result.
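For anyone who wants to script that pre-processing step, here is a rough sketch in Python. It assumes ocrmypdf is installed and on PATH; the function names and the input/output file names are placeholders of mine, since the command above doesn't show which files were used. The llama-parse step is left out because I can't say how it was invoked here.

```python
import shutil
import subprocess
from pathlib import Path


def build_ocrmypdf_command(input_pdf, output_pdf):
    """Build the argument list for the ocrmypdf call quoted above.

    input_pdf and output_pdf are hypothetical placeholders; the original
    command does not show the file arguments.
    """
    return [
        "ocrmypdf",
        "--clean",                # clean pages before OCR (needs unpaper installed)
        "--output-type", "pdf",   # write a plain PDF rather than the default PDF/A
        "--redo-ocr",             # replace any existing OCR text layer
        str(input_pdf),
        str(output_pdf),
    ]


def redo_ocr(input_pdf, output_pdf):
    """Run ocrmypdf and return the output path; raises if the tool fails."""
    if shutil.which("ocrmypdf") is None:
        raise RuntimeError("ocrmypdf not found on PATH")
    subprocess.run(build_ocrmypdf_command(input_pdf, output_pdf), check=True)
    return Path(output_pdf)


# Example (placeholder file names):
# cleaned = redo_ocr("scanned.pdf", "scanned_redo.pdf")
# ...then feed `cleaned` to llama-parse as before.
```

Passing the arguments as a list avoids shell quoting problems with file names that contain spaces.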