sdspieg opened this issue 2 weeks ago
Hello @sdspieg
Returning a 503 error is normal GROBID behavior: it means that the pool of threads is fully in use (the maximum number of parallel requests has been reached). As documented, the client then has to wait a bit before sending new requests, until a thread becomes available.
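A minimal client-side sketch of the "wait and retry" behavior described above, assuming the client retries only on HTTP 503 and backs off exponentially with random jitter. `call_with_retry` and its parameters are hypothetical names for illustration, not part of GROBID or its official clients.

```python
import random
import time
from typing import Callable, Protocol


class HasStatus(Protocol):
    status_code: int


def call_with_retry(
    send: Callable[[], HasStatus],
    max_attempts: int = 6,
    base_delay: float = 1.0,
    max_delay: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> HasStatus:
    """Call send() and retry while the server answers 503 (all threads busy).

    The wait doubles after each refused attempt (capped at max_delay) and is
    scaled by a random factor in [0.5, 1.0]. The last response is returned
    as-is, so the caller must still check it: it may be a 503 if every
    attempt was refused.
    """
    response = send()
    for attempt in range(1, max_attempts):
        if response.status_code != 503:
            return response
        delay = min(max_delay, base_delay * 2 ** (attempt - 1))
        sleep(delay * random.uniform(0.5, 1.0))
        response = send()
    return response
```

With `requests`, `send` could be `lambda: requests.post("http://localhost:8070/api/processFulltextDocument", files={"input": pdf_bytes}, timeout=120)`, assuming a local server on GROBID's default port 8070 and the PDF already read into `pdf_bytes` (reading it once avoids reopening the file on every retry). The jitter spreads out retries coming from parallel worker threads so they do not all hit the server again at the same moment.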
For a reference implementation of how to use the service in parallel, please look at the GROBID clients, in particular the Python client; clients in other languages are also available.
The documentation also explains how to change the size of the thread pool to match the server running the service.
Operating System and architecture (arm64, amd64, x86, etc.)
x64 / WSL
What is your Java version?
OpenJDK Runtime Environment (build 11.0.24+8-post-Ubuntu-1ubuntu324.04.1)
I’m encountering an issue where my script for batch-processing PDFs using GROBID’s processFulltextDocument API frequently generates output files containing the following error message:
Script Setup: I'm running a script that uses requests with ThreadPoolExecutor to submit multiple PDFs to the API in parallel. Here is a sample of my code:
Frequency of 503 Errors: This "503 Service Unavailable" error occurs frequently, so the script creates many output files that contain only the HTML error response instead of the expected TEI XML output.
Error in GROBID Logs: The GROBID logs show recurring entries indicating high load, but the server's capacity and rate limits are unclear.
Questions:
Rate Limits: Are there any known rate limits or maximum concurrent request limits for the GROBID processFulltextDocument endpoint?
Server Tuning: Are there specific server or configuration adjustments (e.g., thread limits, queue sizes) recommended for handling large-scale batch requests?
Best Practices for Batch Processing: Any tips on structuring requests (e.g., delays or reduced concurrency) to minimize the risk of overloading GROBID?
Any guidance on configuring GROBID or adjusting my script to avoid this 503 error would be greatly appreciated. Thank you!
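A minimal sketch of the "reduced concurrency" approach, assuming the client keeps the number of in-flight requests at or below the server's thread pool size (the `max_workers=4` default is an arbitrary placeholder, not a GROBID value). It also writes an output file only for HTTP 200, so a 503 HTML page never ends up in place of the TEI XML. `process_batch`, `save_if_ok`, the `fetch` callback, and the `.tei.xml` output naming are all hypothetical; this is not the issue author's script.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from pathlib import Path
from typing import Callable, Dict, Iterable, Tuple


def save_if_ok(status_code: int, body: str, out_path: Path) -> bool:
    """Write body to out_path only for an HTTP 200 response.

    Any non-200 response (such as 503) is skipped, so no output file holds
    an HTML error page; the caller can collect failures and retry later.
    """
    if status_code != 200:
        return False
    out_path.parent.mkdir(parents=True, exist_ok=True)
    out_path.write_text(body, encoding="utf-8")
    return True


def process_batch(
    pdf_paths: Iterable[Path],
    fetch: Callable[[Path], Tuple[int, str]],
    out_dir: Path,
    max_workers: int = 4,
) -> Dict[Path, bool]:
    """Process PDFs with at most max_workers requests in flight at once.

    fetch(path) is expected to send one PDF to the server (for example
    through a retry-on-503 wrapper) and return (status_code, response_text).
    Returns a map from input path to whether an output file was written.
    """
    def handle(path: Path) -> bool:
        status, body = fetch(path)
        return save_if_ok(status, body, out_dir / (path.stem + ".tei.xml"))

    results: Dict[Path, bool] = {}
    # The pool size bounds how many fetch() calls run concurrently.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(handle, p): p for p in pdf_paths}
        for future in as_completed(futures):
            # result() re-raises any exception raised inside handle().
            results[futures[future]] = future.result()
    return results
```

Paths mapped to `False` in the returned dictionary can be resubmitted in a later pass. Bounding concurrency on the client side keeps most requests from being refused in the first place, while the status check keeps the occasional refused request from producing a misleading output file.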