Poor performance on scanned PDFs with improperly rotated content

Describe the bug: When parsing a scanned PDF whose content is improperly rotated, LlamaParse consistently gets certain details wrong, especially numbers. It swaps some digits for others, repeats digits, and so on.

The problem reproduces consistently with caching disabled, and rotating the PDF to the correct orientation noticeably improves the results.

I built a reproduction by generating 50 random numbers, putting them in a PDF, converting the PDF to an image, and then comparing the output for a correctly oriented version against a rotated one. With the correct orientation, every number in the output is correct. With the rotated version, the output contains the correct count of numbers, but some of them are wrong.
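A minimal sketch of the generate-and-compare half of the reproduction, in Python. It leaves out building the PDF, converting it to an image, rotating it, and calling LlamaParse; the parsed text is simply passed in as a string. The helper names `generate_numbers`, `extract_numbers`, and `compare`, the 6-digit length, and the fixed seed are all hypothetical choices (the issue does not say how the numbers were formatted). The regex also assumes the parsed output contains no digit runs other than the test numbers, such as page numbers or table indices.

```python
import random
import re


def generate_numbers(count: int = 50, seed: int = 0, digits: int = 6) -> list[str]:
    """Generate `count` random numbers of a fixed width (hypothetical defaults)."""
    rng = random.Random(seed)  # seeded so the same list can be regenerated later
    low, high = 10 ** (digits - 1), 10 ** digits - 1
    return [str(rng.randint(low, high)) for _ in range(count)]


def extract_numbers(parsed_text: str) -> list[str]:
    """Pull every run of digits out of the parser's text output, in order.

    Assumes the output contains only the test numbers (no page numbers etc.).
    """
    return re.findall(r"\d+", parsed_text)


def compare(original: list[str], parsed: list[str]) -> dict:
    """Compare original and parsed numbers position by position."""
    mismatches = [
        (i, want, got)
        for i, (want, got) in enumerate(zip(original, parsed))
        if want != got
    ]
    return {
        "count_matches": len(original) == len(parsed),  # same amount of numbers?
        "wrong": len(mismatches),
        "mismatches": mismatches,  # (index, expected, got)
    }


if __name__ == "__main__":
    # Illustrative parser output with one substituted digit in the second number.
    original = ["482913", "105577", "730041"]
    parsed_output = "482913\n105557\n730041"
    print(compare(original, extract_numbers(parsed_output)))
    # → {'count_matches': True, 'wrong': 1, 'mismatches': [(1, '105577', '105557')]}
```

Comparing position by position, rather than as a set, catches both substituted or repeated digits and numbers that come back out of order. `count_matches` captures the observation that the rotated file still yields the right number of entries even when some of them are wrong.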

Files: numbers_normal.pdf, numbers_rotated.pdf, the original numbers

Client:

Frontend (cloud.llamaindex.ai)
TypeScript library

Options: Multimodal with Claude 3.5 Sonnet

Additional context: I saw #32 before opening this issue, but I think my case is different enough not to count as a duplicate. I also have specific reproduction steps that seemed worth sharing.

run-llama / llama_parse #327