nlmatics / nlm-ingestor

This repo provides the server-side code that the llmsherpa API connects to. It includes parsers for various file formats.
https://www.nlmatics.com
Apache License 2.0

Suggestions for Fast Production Server #37

Open yashpatel21 opened 3 months ago

yashpatel21 commented 3 months ago

I have set up my own nlm-ingestor API service on a dedicated 8 GB Linode instance (for testing purposes) using the provided Docker container.

I have some questions about building a fast production server for parsing PDFs. My code is based on the provided getting-started example, and it sends this file:

https://raw.githubusercontent.com/run-llama/llama_index/main/docs/docs/examples/data/10q/uber_10q_march_2022.pdf

to the nlm-ingestor API to be parsed and to retrieve the chunks. For the above file alone, this takes ~30 seconds. That is indeed faster than some other options, but for my use case I need to bring that time down to ~10 seconds. Are there any guidelines or suggestions for improving the speed of the PDF parsing service?
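For reference, here is the gist of my client code: a minimal sketch based on the llmsherpa getting-started example, assuming the `LayoutPDFReader` client and a self-hosted ingestor on port 5010 (the localhost URL stands in for my actual instance):

```python
from llmsherpa.readers import LayoutPDFReader

# Self-hosted nlm-ingestor endpoint (port 5010 per the repo README; adjust to your instance)
llmsherpa_api_url = "http://localhost:5010/api/parseDocument?renderFormat=all"

pdf_url = (
    "https://raw.githubusercontent.com/run-llama/llama_index/main/"
    "docs/docs/examples/data/10q/uber_10q_march_2022.pdf"
)

reader = LayoutPDFReader(llmsherpa_api_url)
doc = reader.read_pdf(pdf_url)  # fetches the PDF and POSTs it to the ingestor

# Iterate over the layout-aware chunks returned by the parser
for chunk in doc.chunks():
    print(chunk.to_context_text())
```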

pashpashpash commented 3 months ago

Why does a simple file processing task take 30+ seconds? I am experiencing the same thing.

pashpashpash commented 3 months ago

cc @ansukla please advise. In its current state, this is unusable for a production use case. Chunking files should take 5 seconds max.

ansukla commented 3 months ago

It shouldn't take that long unless you are using OCR; OCR takes time. If you are not using OCR, try a server with a faster CPU and more memory. A 30-page document should parse in about 5 s.
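If it helps, whether OCR runs is controlled by the query string on the parse endpoint. A sketch of the two variants, with the parameter name taken from the repo README (verify it against the version you are running):

```python
# Parse endpoint with and without OCR (applyOcr parameter per the nlm-ingestor
# README; treat the exact name as an assumption to verify for your version).
FAST_URL = "http://localhost:5010/api/parseDocument?renderFormat=all"               # no OCR
OCR_URL = "http://localhost:5010/api/parseDocument?renderFormat=all&applyOcr=yes"   # OCR on, much slower
```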

pashpashpash commented 3 months ago

@ansukla what specs would you recommend? I'm currently running the container on a pod with 2 GiB of memory and 1 CPU.

[Screenshot: pod resource configuration, 2024-03-31]
pashpashpash commented 3 months ago

Update: It still takes ~30 s even with 4 CPUs for this file: https://raw.githubusercontent.com/run-llama/llama_index/main/docs/docs/examples/data/10q/uber_10q_march_2022.pdf
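For anyone else benchmarking this, a small timing sketch that POSTs an already-downloaded copy of the PDF straight to the endpoint, so the measurement excludes download time. The multipart field name "file" matches what the llmsherpa client sends, but treat it as an assumption to verify:

```python
import time

import requests

# Hypothetical benchmark: send a local copy of the PDF directly to the ingestor
# and time only the request/response round trip.
api_url = "http://localhost:5010/api/parseDocument?renderFormat=all"

with open("uber_10q_march_2022.pdf", "rb") as f:
    start = time.perf_counter()
    resp = requests.post(api_url, files={"file": f})
    elapsed = time.perf_counter() - start

resp.raise_for_status()
print(f"server round-trip: {elapsed:.1f} s")
```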