Closed · douglasqian closed this issue 1 month ago
Update: I figured this out. It was the configurations in the env file.
The key is to size your batch according to the max_output_tokens
config set in the LLM call. By default the repo sets it to 800, which was too low for me, so I raised it to 4000 along with these configs:
BATCH_SIZE=3 # Optional: Default is 1
MAX_CONCURRENT_OCR_REQUESTS=5 # Optional: Default is 5
MAX_CONCURRENT_PDF_CONVERSION=4 # Optional: Default is 4
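The reason these two knobs interact: if the combined Markdown for a batch of pages exceeds max_output_tokens, the model's response is truncated and the tail pages of that batch are silently lost. A rough back-of-the-envelope check (the tokens-per-page estimate, function name, and safety margin below are illustrative assumptions, not part of the repo's code):

```python
def max_safe_batch_size(max_output_tokens: int,
                        est_tokens_per_page: int = 1000,
                        safety_margin: float = 0.8) -> int:
    """Largest batch whose combined Markdown output should still fit
    under max_output_tokens, keeping headroom for dense pages.
    est_tokens_per_page is a guess; tune it for your documents."""
    budget = int(max_output_tokens * safety_margin)
    return max(1, budget // est_tokens_per_page)

# With the repo default of 800 output tokens, even one dense page
# may not fit, so truncation is likely:
print(max_safe_batch_size(800))    # 1
# With the limit raised to 4000, a batch of 3 pages has headroom:
print(max_safe_batch_size(4000))   # 3
```

In other words, BATCH_SIZE=3 only works because max_output_tokens was raised to 4000 first; with the 800 default, even BATCH_SIZE=1 can truncate.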
It would probably help to mention this in the README.
Great observation, will fix the max_output_token limit!
Thank you for letting me know about it! https://github.com/yigitkonur/swift-ocr-llm-powered-pdf-to-markdown/commit/98566d91f68cf7de42eb8710e35253a374376f5b
Context
Setup
I followed the README instructions to set it up, and after booting up the server I was able to successfully call the API locally from another terminal window. Here are the logs; everything seems to work well:
Results
But the actual results from SwiftOCR were quite far off. For reference here is the actual PDF: FundOpp_DE-FOA-0003294_Amd_000003.pdf
Here was the result from PyMuPDF4LLM: pymupdf4llm.md
Here was the result from SwiftOCR: swift_ocr.md
The last snippet that was merged in was from page 163 of 164, so the run did cover the full page range. However, a significant number of pages in the middle appear to have been dropped.
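One quick way to confirm which pages went missing is to diff the page numbers that made it into the merged output against the document's full range. This is just a sketch; in practice the page numbers would come from the per-snippet metadata the server logs, and the dropped pages below are made-up examples:

```python
def missing_pages(total_pages: int, pages_seen: set) -> list:
    """Return the 1-based page numbers absent from the merged output."""
    return sorted(set(range(1, total_pages + 1)) - pages_seen)

# E.g. a 164-page document where a few mid-document pages were dropped
# (page numbers here are hypothetical):
seen = set(range(1, 165)) - {40, 41, 42, 90}
print(missing_pages(164, seen))  # [40, 41, 42, 90]
```

If the missing pages cluster into contiguous runs the size of a batch, that points at whole batches being truncated rather than individual OCR failures.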
Am I missing something?