neuml / paperetl

📄 ⚙️ ETL processes for medical and scientific papers
Apache License 2.0

Added note on grobid concurrency configuration to README. #52

Closed: elshimone closed this 10 months ago

elshimone commented 10 months ago

Fixes #50

davidmezzetti commented 10 months ago

Sorry to nitpick here, but could we make this a bit more concise? In my experience, if a lot of these edge-case notes accumulate over time, the README gets hard to read.

Maybe something like this:

Note: Depending on the number of CPUs in your system, the GROBID engine pool may be exhausted when parsing PDFs, resulting in a 503 error. This can be fixed by increasing the concurrency and/or poolMaxWait settings in the GROBID configuration file.

If you're strapped for time, I can handle it myself. You've already done a bunch to help and it's been greatly appreciated.
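For readers who hit the 503 described in the note above and cannot change the GROBID configuration right away, a client-side retry is one possible complement. The following is a minimal sketch, not part of paperetl: `post_with_retry` and its `send` callable are hypothetical names, and `send` is assumed to wrap whatever HTTP call submits a PDF to GROBID and to return a `(status, body)` pair.

```python
import time
from typing import Callable, Tuple


def post_with_retry(
    send: Callable[[], Tuple[int, str]],
    retries: int = 5,
    delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[int, str]:
    """Call send() again while it returns HTTP 503, up to `retries` extra attempts.

    `send` is a hypothetical callable that performs the request and returns
    (status_code, response_body). The wait doubles after each 503.
    """
    status, body = send()
    for attempt in range(retries):
        if status != 503:
            break
        # Back off before retrying: delay, 2*delay, 4*delay, ...
        sleep(delay * (2 ** attempt))
        status, body = send()
    return status, body


if __name__ == "__main__":
    # Simulated server: busy twice, then succeeds.
    responses = iter([(503, ""), (503, ""), (200, "<TEI/>")])
    print(post_with_retry(lambda: next(responses), delay=0.01))
```

Retrying only hides short bursts of pool exhaustion; if 503s are frequent, raising the concurrency and/or poolMaxWait settings as the note suggests is still the actual fix.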

davidmezzetti commented 10 months ago

Never mind, I made the minor edit. Thank you for the additions to paperetl and paperai!