I've done some basic testing of this and the actual impact seems to have been to increase latency and connection rate. I'm looking at the apiserver's `etcd_request_duration_seconds_bucket` metric, as well as `mysql_global_status_connections` from the database server. The change here is between git master, and master with just this kine change.
Oh, that’s unfortunate. Can you give me some indication of which parameters you used for the connection pooling? Or did you leave it as is and just use the default values from my proposed changes? Then I would take a look at how to improve the behavior. At least with the proposed changes I no longer saw the timeouts (which only seem to show up on small, resource-restricted DBs).
I had just pulled in the updated kine and let it use the default ConnectionPoolConfig without patching it into the server args, which meant that maxIdle was set to 0. I'll finish my hack job and test again with sane defaults. It might be worth preventing maxIdle from being set to less than the default, just in case someone else tries to use kine with the defaults.
After retesting with maxIdle set to 2, performance is exactly the same (as expected). It might be worth putting a lower bound of 2 on that setting, as performance with maxIdle 0 is abysmal.
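For illustration, that kind of lower bound could be enforced where the pool is configured. A minimal sketch against Go's `database/sql` (the function and constant names here are mine, not kine's actual code):

```go
package kinepool

import "database/sql"

// minIdleConns mirrors database/sql's own default of 2 idle connections.
const minIdleConns = 2

// setIdleConns is a sketch of enforcing a lower bound so that maxIdle=0
// can't silently disable connection reuse and force every request onto a
// fresh connection.
func setIdleConns(db *sql.DB, maxIdle int) {
	if maxIdle < minIdleConns {
		maxIdle = minIdleConns
	}
	db.SetMaxIdleConns(maxIdle)
}
```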
This PR should solve the constant error messages (as described here: https://github.com/rancher/k3s/issues/1459) when using a small DB instance, e.g. a db.t2.micro PostgreSQL instance on AWS, which can only handle up to ~80 connections. This especially happens when listing all pods/workloads of the System project, but I think it will occur on any page that lists a lot of k3s resources.
The error messages also surfaced in the k3s journal logs.
Solution
My PR makes connection pooling, which is currently not configured at all by kine, adjustable.
I will also mention this in the linked issue, so this can get picked up by k3s as well. Ideally, `k3s server` should expose command line flags for these settings, which would look something like this in production:

- `--datastore-max-idle-connections 2`
- `--datastore-max-open-connections 0` (unlimited)
- `--datastore-connection-max-lifetime 0s`
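For context, these three settings map directly onto the pool knobs that Go's `database/sql` already exposes. A minimal sketch of how a `ConnectionPoolConfig` could be applied (the field names are illustrative, not necessarily what the PR uses):

```go
package kinepool

import (
	"database/sql"
	"time"
)

// ConnectionPoolConfig groups the three pool settings behind the proposed
// flags. Field names are illustrative.
type ConnectionPoolConfig struct {
	MaxIdle     int           // --datastore-max-idle-connections (default 2)
	MaxOpen     int           // --datastore-max-open-connections (0 = unlimited)
	MaxLifetime time.Duration // --datastore-connection-max-lifetime (0s = reuse forever)
}

// Configure applies the settings to an already-opened *sql.DB.
func Configure(db *sql.DB, cfg ConnectionPoolConfig) {
	db.SetMaxIdleConns(cfg.MaxIdle)
	db.SetMaxOpenConns(cfg.MaxOpen)
	db.SetConnMaxLifetime(cfg.MaxLifetime)
}
```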
Note: The default values are chosen to reflect the current behavior. `sql.DB.maxIdle` in `database/sql` defaults to 2. `sql.DB.maxLifetime` is never set explicitly in `database/sql`, so it is 0 when the struct gets created; the same is true for `sql.DB.maxOpen`. This way, the new settings can be introduced without changing existing behavior.

I also imagine that using environment variables would be quite nice.
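A rough sketch of that idea, purely illustrative (the environment variable names are hypothetical, not something kine or k3s currently defines):

```go
package kinepool

import (
	"os"
	"strconv"
	"time"
)

// PoolSettingsFromEnv reads the pool settings from hypothetical environment
// variables, falling back to the defaults described above (maxIdle=2,
// maxOpen=0, maxLifetime=0s) so the current behavior is preserved.
func PoolSettingsFromEnv() (maxIdle, maxOpen int, maxLifetime time.Duration) {
	maxIdle = 2 // database/sql's own default
	if v, err := strconv.Atoi(os.Getenv("KINE_MAX_IDLE_CONNECTIONS")); err == nil {
		maxIdle = v
	}
	if v, err := strconv.Atoi(os.Getenv("KINE_MAX_OPEN_CONNECTIONS")); err == nil {
		maxOpen = v
	}
	if d, err := time.ParseDuration(os.Getenv("KINE_CONNECTION_MAX_LIFETIME")); err == nil {
		maxLifetime = d
	}
	return maxIdle, maxOpen, maxLifetime
}
```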
I compiled it into the k3s executable and tested it with various settings (the important part was to limit `--datastore-max-open-connections`, as with small DBs this is the bottleneck), and I could no longer reproduce any of the mentioned errors in the Rancher UI. Let me know if the PR is good or if it needs adjustments :)
I think in most cases this won't even be an issue if you have a DB that can handle lots of connections. Otherwise, reusing idle connections can really help keep memory usage on the DB lower, and also allows small-sized DBs to be used by k3s/kine.
Cheers! 🎉