varnish / hitch

A scalable TLS proxy by Varnish Software.
https://www.varnish-software.com/

Reloading hitch causes a big memory increase: are all certificates read into memory and never released? #374

Open iammeken opened 2 years ago

iammeken commented 2 years ago

Hello,

Ubuntu 22.04/20.04, hitch 1.7.3/1.7.1, 12 workers, 6500+ Let's Encrypt SSL certs

service hitch reload: the first reload always doubles the memory hitch uses (from 3.1 GB to 6.5 GB); each subsequent reload adds roughly 400 MB.

I have tried several adjustments in hitch: with and without session caching, with and without OCSP.

frontend = "[*]:443"
backend  = "[127.0.0.1]:6086"
pem-dir = "/root/.acme.sh/hitch/"
pem-dir-glob = "*.pem"
syslog-facility = "daemon"
daemon = on
user = "_hitch"
group = "_hitch"
ssl-engine = ""
ciphers  = "EECDH+AESGCM:EDH+AESGCM:AES256+EECDH:AES256+EDH"
prefer-server-ciphers = on 
session-cache=300
sni-nomatch-abort = on
tls-protos = TLSv1.2 TLSv1.3
tcp-fastopen = on
alpn-protos = "h2,http/1.1"
write-proxy-v2 = on 
ocsp-connect-tmo = 4
ocsp-resp-tmo = 4
ocsp-dir = "/var/lib/hitch/" 
ocsp-verify-staple = off
workers = 12 # number of CPU cores
syslog = off
log-level = 1
keepalive = 300
backlog = 1024

[Unit]
Description=Hitch TLS unwrapping daemon
After=network.target
Documentation=https://github.com/varnish/hitch/tree/master/docs man:hitch(8)
ConditionPathExists=/etc/hitch/hitch.conf

[Service]
PrivateDevices=true
PrivateTmp=true
#ProtectHome=read-only
ProtectHome=full
ProtectSystem=full
Type=simple
Restart=on-failure
ExecStart=/usr/local/sbin/hitch --daemon --pidfile=/run/hitch.pid --user _hitch --group _hitch --config=/etc/hitch/hitch.conf
PIDFile=/run/hitch.pid
ExecStop=/usr/local/sbin/hitch stop
ExecReload=/bin/kill -HUP $MAINPID

# Maximum number of open files (for ulimit -n)
LimitNOFILE=655360
# Locked shared memory - should suffice to lock the shared memory log
# (varnishd -l argument)
# Default log size is 80MB vsl + 1M vsm + header -> 82MB
# unit is bytes
LimitMEMLOCK=100000000
# Enable this to avoid "fork failed" on reload.
TasksMax=infinity
# Maximum size of the corefile.
LimitCORE=infinity

[Install]
WantedBy=multi-user.target
iammeken commented 1 year ago

It seems that reloading hitch somehow reads all certificates into memory and never releases them.

~60 KB per certificate × 6400 certificates ≈ 380 MB

So every reload adds about 380 MB to memory.

Will you check it?
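A quick way to sanity-check that estimate (paths taken from the pem-dir in the config above; the commands are just a sketch):

# Number of certificate bundles hitch loads, and their total size on disk.
ls /root/.acme.sh/hitch/*.pem | wc -l
du -sh /root/.acme.sh/hitch/

Keep in mind the parsed in-memory footprint per certificate (X509 chain, key, per-certificate SSL_CTX) is normally larger than the PEM file itself.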

daghf commented 1 year ago

Hi.

Do you have plots of memory consumption over time?

On a reload, Hitch will launch a new set of worker processes while draining the old generation. So it is expected that there will be a period with significantly increased memory consumption - this should however go back down after the previous gen worker processes are drained of traffic and retire.

Could I ask you to monitor the number of total hitch processes running when you see this, and also see if the usage drops after they are cleaned up?
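A minimal way to watch this across a reload (a sketch, assuming procps ps/pgrep and the systemd setup above; adjust the interval to taste):

# Count hitch processes and sum their RSS (in MB) every 10 seconds.
while true; do
    printf '%s  procs=%s  rss_mb=%s\n' \
        "$(date +%T)" \
        "$(pgrep -c hitch)" \
        "$(ps -C hitch -o rss= | awk '{sum+=$1} END {print int(sum/1024)}')"
    sleep 10
done

If the usage really does drop once the old generation retires, the rss_mb figure should fall back close to its pre-reload value within the drain period.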

iammeken commented 1 year ago

With htop, I can see only one hitch process, at nearly 99.9% CPU during the first few seconds; then it drops.

Memory also increases a lot in the first few seconds, then stabilizes at about a 0.3 GB increase.

And the first reload always doubles the memory; each subsequent reload adds another 0.3-0.4 GB.

I have seen this on many Ubuntu machines.

iammeken commented 1 year ago

Perhaps it has always existed; you can see it when you have thousands of certs.

I have put up with it for years. :)

iammeken commented 1 year ago

Same machine: with wildcard certs (acme.sh), the first reload doubles memory; with normal SSL certs (certbot, www and @), the first reload only increases it a little.

Interesting ...

iammeken commented 1 year ago

And the certbot SSL certs are bigger in file size than the wildcard ones.

iammeken commented 5 months ago

It is caused by automatic OCSP staple retrieval.

I had to switch off automatic OCSP staple retrieval with:

ocsp-dir = ""
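For anyone else hitting this, the workaround amounts to clearing ocsp-dir and reloading. A sketch, using the config path from the systemd unit above:

# Disable automatic OCSP staple retrieval by clearing ocsp-dir,
# then reload hitch to pick up the change.
sudo sed -i 's|^ocsp-dir = .*|ocsp-dir = ""|' /etc/hitch/hitch.conf
sudo systemctl reload hitch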

iammeken commented 1 month ago

It is caused by automatic OCSP staple retrieval.

I had to switch off automatic OCSP staple retrieval with:

ocsp-dir = ""

Will you release an updated hitch version that fixes this bug?