kermitt2 / grobid

A machine learning software for extracting information from scholarly documents
https://grobid.readthedocs.io
Apache License 2.0

How to set a suitable config for concurrency? #1114

Closed thejiangcj closed 3 weeks ago

thejiangcj commented 3 months ago

Hello, I am working with GROBID, and I tried several values for the server's concurrency parameter and for the GROBID Python client's "n" parameter.

I could not find a precise way to determine the most suitable parameters. The reference manual only seems to cover the setup with two machines (a GROBID server and a separate client sending requests), where the parameters 'concurrency' and 'n' are set according to the CPU limitations of the respective machine.

My environment is different: the GROBID server and the Python client run on the same server. How should I set the parameters 'concurrency' and 'n' in that case?

I tried several values and didn't see any obvious difference.

lfoppiano commented 3 months ago

Hi @thejiangcj did you see this documentation page? https://grobid.readthedocs.io/en/latest/Frequently-asked-questions/#could-we-have-some-guidance-for-server-configuration-in-production

AFAIK after you establish the concurrency parameter on the server, the client should follow using the same value.

thejiangcj commented 3 months ago

Yes, I saw it. However, I think it assumes that the server and the client are on different machines (not sure). I mean, if the server and the client run on the same machine, should the concurrency parameter equal the number of threads of that machine?

For example, if the machine has 16 threads, there are two choices:

  1. Set concurrency to 16 in GROBID's YAML config and n to 16 in the Python client, as described in https://grobid.readthedocs.io/en/latest/Frequently-asked-questions/#could-we-have-some-guidance-for-server-configuration-in-production.
  2. Set concurrency to 8 in GROBID's YAML config and n to 8 in the Python client. Since both run on one machine and every task needs one thread, the thread count is divided by 2.

lfoppiano commented 3 months ago

@thejiangcj AFAIK you don't need to divide by 2 when the client runs on the same machine as the server. The client's load on the CPU is lower than the server's.
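
To make the advice in this thread concrete, here is a minimal Python sketch, not an official recipe. It takes the number of CPU threads, uses it as the server's concurrency, and reuses the same value for the client's n without halving it. The helper names `suggest_settings` and `run_client` are hypothetical. The `GrobidClient` usage assumes the `grobid_client` package and its `process(..., n=...)` call. `config.json` and the input path are placeholders.

```python
import os


def suggest_settings(threads=None):
    """Hypothetical helper: derive 'concurrency' and 'n' from the thread count.

    Follows the rule of thumb from this thread: concurrency equals the
    number of threads, and the client uses the same value, even when
    client and server share one machine.
    """
    if threads is None:
        # os.cpu_count() may return None on some platforms; fall back to 1.
        threads = os.cpu_count() or 1
    if threads < 1:
        raise ValueError("threads must be at least 1")
    concurrency = threads  # value for 'concurrency' in the server's YAML config
    n = concurrency        # value for the Python client's 'n' parameter
    return {"concurrency": concurrency, "n": n}


def run_client(input_path, n):
    """Assumed usage of the GROBID Python client; requires grobid_client."""
    # Imported here so the rest of the sketch runs without the package.
    from grobid_client.grobid_client import GrobidClient

    client = GrobidClient(config_path="./config.json")  # placeholder path
    client.process("processFulltextDocument", input_path, n=n)


if __name__ == "__main__":
    settings = suggest_settings(16)
    print(settings)
    # run_client("/path/to/pdfs", settings["n"])  # placeholder input path
```

The server-side value still has to be written into the GROBID YAML config by hand; the sketch only computes it.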

thejiangcj commented 3 weeks ago

Thank you very much for your answer.