Leaking threads (and low performance) in docker image

kermitt2 / biblio-glutton

A high performance bibliographic information service: https://biblio-glutton.readthedocs.io


Leaking threads (and low performance) in docker image #91

Open karatekaneen opened 10 months ago

karatekaneen commented 10 months ago

Hi,

I'm running the latest (master) version in Docker. We've allocated a VM with 4 cores and 32 GB of RAM for Glutton.

When running a batch of lookups by bibliographic string (we started with about 200k), we can only process about 70-80 strings per minute.

The Elasticsearch cluster is almost idle, so I looked closer at the container and saw that the thread count keeps increasing. It looks like one thread is added for every citation processed and none is ever removed.

[Screenshot: htop output]

Edit:

Forgot to mention that this does not respect maxThreads in the config.yml. I've tried running it with the default value of 2048 as well as with higher and lower values. Performance is still low and the thread count keeps ticking up.

kermitt2 commented 10 months ago

Hello @karatekaneen! This is a problem with the Docker image: it requires an init process to reap zombies (e.g. by adding --init to the docker run command).
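For illustration, a minimal sketch of starting the container with Docker's built-in init. The image tag and port below are assumptions, not taken from the project; substitute whatever your own build uses.

```shell
#!/bin/sh
# Hypothetical image tag and port; adjust both to your own build and config.
IMAGE="${IMAGE:-biblio-glutton:local}"
PORT="${PORT:-8080}"

run_glutton() {
    # --init makes Docker start a small init process as PID 1 inside the
    # container; it forwards signals and reaps zombie processes.
    # Set DOCKER=echo to print the arguments instead of starting a container.
    ${DOCKER:-docker} run --init --rm -p "${PORT}:${PORT}" "$IMAGE"
}
```

Calling `run_glutton` starts the container; `DOCKER=echo run_glutton` is a dry run that only prints the arguments that would be passed.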

The project has a Dockerfile, but it is not maintained, supported, or documented (I never actually tested it!). I really recommend not using it for the moment and using a normal build instead, to avoid these issues and get good performance.

I will try to set up a working Docker build in the next version.

karatekaneen commented 9 months ago

@kermitt2 Thank you for your response. We've been running the Docker image in production for a couple of years without any big problems until now.

What changes are needed to the Dockerfile to get it up to date again? I'm happy to contribute but unfortunately I'm not a (good enough) Java dev to figure out what's out of sync. So with some guidance I'm probably able to figure it out

kermitt2 commented 9 months ago

> What changes are needed to the Dockerfile to get it up to date again? I'm happy to contribute but unfortunately I'm not a (good enough) Java dev to figure out what's out of sync. So with some guidance I'm probably able to figure it out

Well, I have not worked on this Docker image, but apparently no init (like tini) is included, which is necessary to terminate processes properly. So you can either start the container by passing --init as an argument to docker run, or include tini in the Dockerfile (see here). Then the gazillions of processes will be terminated properly.
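As a rough sketch of the second option, the fragment below follows the pattern from the tini README. It is not the project's actual Dockerfile: the pinned version is only an example, and the CMD line is a placeholder for however the image currently launches the service.

```dockerfile
# Download a static tini binary (version pinned here as an example only).
ENV TINI_VERSION v0.19.0
ADD https://github.com/krallin/tini/releases/download/${TINI_VERSION}/tini /tini
RUN chmod +x /tini

# Run tini as PID 1; it starts the real command, forwards signals to it,
# and reaps zombie processes.
ENTRYPOINT ["/tini", "--"]

# Placeholder: replace with the command the image already uses to start the service.
CMD ["/your/program", "-and", "-its", "arguments"]
```

If the existing Dockerfile already launches the service through ENTRYPOINT, that command would have to move to CMD (or be appended after "--") so that tini stays the first process.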

The good news: it's only about docker settings, no java dev. needed.

karatekaneen commented 9 months ago

@kermitt2 Added tini in #90, which is the version we are running in production at the moment. In my first tests (on local Wi-Fi, so not an exact measurement) looking up 20k DOIs took about 40% less time: about 664 µs per DOI with tini vs. about 1.1 ms per DOI without.

It still has 144 threads running so I'm not sure I did everything right but it's way better than the >2500 that the first version had running after a month uptime. So I'm going to check in on it in a couple of days to see how it looks.

karatekaneen commented 8 months ago

Unfortunately, the fix did not help. Just had a look at the service and it had 2000+ threads running with max set to 128 in the config.

kermitt2 commented 8 months ago

The thread setting in the config manages the server's parallel requests; the remaining threads are likely zombies.
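One way to tell the two cases apart is to look at /proc inside the container. The sketch below assumes a Linux container with a POSIX shell and awk available; the function names are made up for illustration. It reports the thread count of a single process and, separately, the number of zombie processes.

```shell
#!/bin/sh
# Print the number of threads of one process, read from the "Threads:" field
# of /proc/<pid>/status.
count_threads() {
    awk '/^Threads:/ {print $2}' "/proc/$1/status"
}

# Print the number of processes currently in state Z (zombie).
count_zombies() {
    n=0
    for f in /proc/[0-9]*/status; do
        # A process may exit between the glob and the read; ignore that error.
        state=$(awk '/^State:/ {print $2}' "$f" 2>/dev/null)
        if [ "$state" = "Z" ]; then
            n=$((n + 1))
        fi
    done
    echo "$n"
}
```

For example, `count_threads <pid of the JVM>` gives the live thread count of the service itself. If that number keeps growing while `count_zombies` stays near zero, the growth comes from live threads in the JVM rather than from unreaped processes.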

I would suggest again not to use the docker image at this point for biblio-glutton :)

If it helps, this is how we used tini in Grobid (before using the --init parameter):

https://github.com/kermitt2/grobid/blob/0.7.1/Dockerfile.crf#L72