amorozkin opened 1 year ago
I tested it on my Raspberry Pi but did not encounter such a huge performance difference. What TLS ciphers were used in the graph above?
In both cases the same haproxy config was used with TLS options:
ssl-default-bind-options no-sslv3 no-tlsv10 no-tlsv11
ssl-default-bind-ciphers TLS13-AES-256-GCM-SHA384:TLS13-AES-128-GCM-SHA256:TLS13-CHACHA20-POLY1305-SHA256:EECDH+AESGCM:EECDH+CHACHA20
tune.ssl.default-dh-param 2048
Both haproxy itself and the upstream (a single one in the test above) use 4096-bit TLS certificates (the annotation haproxy.org/server-ssl: "true" is configured in the ingress).
K8s nodes: KVM VMs (Ubuntu 20.04.4 LTS, 5.4.0-109-generic, k8s version v1.23.4)
PODs:
resources:
  limits:
    cpu: "12"
    memory: 24Gi
  requests:
    cpu: "10"
    memory: 24Gi
....
securityContext:
  sysctls:
    - name: net.ipv4.ip_local_port_range
      value: 1024 65535
    - name: net.ipv4.tcp_rmem
      value: 8192 87380 33554432
    - name: net.ipv4.tcp_wmem
      value: 8192 65536 33554432
    - name: net.ipv4.tcp_max_syn_backlog
      value: "20000"
    - name: net.core.somaxconn
      value: "20000"
    - name: net.ipv4.tcp_tw_reuse
      value: "1"
    - name: net.ipv4.tcp_syncookies
      value: "0"
    - name: net.ipv4.tcp_slow_start_after_idle
      value: "0"
    - name: net.ipv4.tcp_fin_timeout
      value: "30"
    - name: net.ipv4.tcp_keepalive_time
      value: "30"
    - name: net.ipv4.tcp_keepalive_intvl
      value: "10"
    - name: net.ipv4.tcp_keepalive_probes
      value: "3"
    - name: net.ipv4.tcp_no_metrics_save
      value: "1"
Haproxy: nbthread: "8"
IMHO TLS handshakes should not matter much when keep-alive connections are used on both ends: client<->haproxy AND haproxy<->upstream.
@amorozkin I am reasonably sure this is not related to Alpine MUSL at all, but related to OpenSSL 3.0/3.1 mutex contention issues. I suspect your Glibc-based distribution is using OpenSSL 1.1.1, isn't it?
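For reference, `haproxy -vv` prints both the OpenSSL version haproxy was built with and the one it is running on, which makes it easy to confirm whether 1.1.1 or 3.x is in play on each distribution. The snippet below is only an illustrative sketch (not from this issue) that prints the compile-time and run-time versions of whatever libcrypto is installed; it assumes OpenSSL 1.1.0+ headers for `OpenSSL_version()`.

```c
/* Illustrative sketch (not from the issue): print the OpenSSL version this
 * program was compiled against and the one actually loaded at run time.
 * Build with:  cc check_openssl.c -lcrypto */
#include <stdio.h>
#include <openssl/opensslv.h>
#include <openssl/crypto.h>

int main(void)
{
    printf("compile-time: %s\n", OPENSSL_VERSION_TEXT);
    printf("run-time    : %s\n", OpenSSL_version(OPENSSL_VERSION));
    return 0;
}
```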
Could you please consider adding an option to use non-alpine based haproxy ingress images?
Alpine's pthread implementation has a drastic CPU overhead (internals/details can be found here: https://stackoverflow.com/questions/73807754/how-one-pthread-waits-for-another-to-finish-via-futex-in-linux/73813907#73813907).
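To reproduce the effect the linked answer describes, a tiny contended-mutex program like the hypothetical sketch below (not taken from haproxy) can be built on a glibc system and on an Alpine/musl system and run under `strace -cf`; the share of futex() calls, FUTEX_WAKE in particular, can then be compared between the two libcs.

```c
/* Hypothetical micro-benchmark: two threads hammer a single pthread mutex.
 * Build with:  cc -O2 -pthread mutex_contention.c
 * Run with:    strace -cf ./a.out   (compare futex counts on glibc vs musl) */
#include <pthread.h>
#include <stdio.h>

#define ITERATIONS 1000000L

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static long counter;

static void *worker(void *arg)
{
    (void)arg;
    for (long i = 0; i < ITERATIONS; i++) {
        pthread_mutex_lock(&lock);   /* contended path ends up in futex() */
        counter++;
        pthread_mutex_unlock(&lock); /* may issue FUTEX_WAKE for the waiter */
    }
    return NULL;
}

int main(void)
{
    pthread_t t1, t2;
    pthread_create(&t1, NULL, worker, NULL);
    pthread_create(&t2, NULL, worker, NULL);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    printf("counter = %ld\n", counter);
    return 0;
}
```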
Here are two strace statistics samples for the same load profile (25K RPS via 3 haproxy ingress pods) over the same period of time (about 1 minute):
1. GLIBC-based haproxy:
2. MUSL-based haproxy:
As you can see, the latter (the MUSL-based one) spends 60+% of its time in futex (FUTEX_WAKE_PRIVATE, to be exact) system calls. As a result, CPU utilisation is more than twice as high for the same load profile, accompanied by spikes in the upstream's session count: