Number of CPUs does not take into account

asfaltboy commented 1 year ago

My company deploys Prisma on kubernetes (managed k8s, aka GKE, in Google cloud), and have observed an issue with the default conenction_limit being set by Prisma to be higher than expected, and also that this seems dependent on our underlying kubernetes node type.

We've noticed that the current method of using num_cpus::get_physical() when computing the default connection limit does not consider cgroups. I was wondering whether it's not better to use the num_cpus::get() method to detect the actual CPU cores available in the execution environment, it seems to take care of parsing available cgroup v1 and v2 .

As in the docs:

This will also check cgroups, frequently used in containers to constrain CPU usage.

We did notice that Prisma mention only "physical cores" in the docs, and I wonder if this is intentional.

For what it's worth, I imagine that using (already opened) connections from a DB client is probably more of an I/O heavy operation, and may not introduce significant CPU load on the physical host, at least in most common cases (perhaps some benchmark profiling could prove me wrong). That said, having the option to automatically default to a smaller pool size when a container constrained, can be more in-line with the expected behavior.

Refs:

janpio commented 1 year ago

have observed an issue with the default conenction_limit being set by Prisma to be higher than expected

Is the "higher than expected" limit causing any problems? Or is it just "higher than could have anticipated when understanding what kind of cores these are"?

asfaltboy commented 1 year ago

Good point, I should have expanded on the impact, let me try:

In our k8s setup is that we run a bunch of small (/micro-) services in "pods" with limited resources on a smaller number of nodes (VMs). So for example an 8 core (8 logical, 4 physical btw) node can at any moment run 32 pods, each is allowed to use a small portion of CPU and memory as needed for the workload.

We start off by defaulting services to a relatively limited CPU / Memory resources (say 1 core at most), and we'd expect them to default to e.g 3 connections (however, they default to 9 = 4*2+1). We also start off with modest DB resources, such that the DB configuration sets max_connections to a small number. This misalignment causes us to need to adjust the pool/db configuration for many of the default workloads, as the pool size is too large for the default connection instance.

We can of course adjust the pool config in our default service, but I guess that's the point: the automatic value is meant to represent a sane default, which in our case will not be based on the workload but rather on the hosting node in the containerized environment.

I hope that makes sense, and I've not yammered your ears off :)

janpio commented 1 year ago

Ok, so you are basically saying that using logical instead of physical CPUs would be a more convenient default that will probably be more correct in more cases?

asfaltboy commented 1 year ago

Thanks for summarizing it so well :) yes, that's exactly right

This is my understanding, at least. I'm by no means an expert, so I'll be happy to learn of other reasons this should not use logical cores.

janpio commented 1 year ago

That was most probably an oversight.

But to be honest, the number of cores - both logical or physical - does actually say very little about how many connections an application should be able to open. It is based on a very old misunderstanding of an even older rule about how many connections should be open on bare metal servers. But until we figure that out, it is probably better to at least use the logical cores.

Are you up to doing a PR?

aqrln commented 1 year ago

FWIW you don't need an external crate for this in modern Rust, std::thread::available_parallelism is part of standard library now.

prisma / prisma-engines

Number of CPUs does not take into account #4341