haproxy / haproxy

HAProxy Load Balancer's development branch (mirror of git.haproxy.org)
https://git.haproxy.org/
Other
4.94k stars 795 forks source link

Thread X is about to kill the process #1990

Closed Suskizauras closed 1 year ago

Suskizauras commented 1 year ago

Detailed Description of the Problem

Hello,

We have an issue where a certain user is crashing our haproxy service, server is not under high load or anything - just so it seems that only certain requests manage to make one thread kill the process.

We're using Lua for user & config management

I've went through multiple haproxy versions (2.0, 2.2, 2.4) - nothing seems to help

Perhaps you have any tips on how to troubleshoot/provide more information for this behaviour?

Expected Behavior

No threads killing the process

Steps to Reproduce the Behavior

We have yet to understand

Do you have any idea what may have caused this?

simple requests being sent, such as:

Jan 6 11:00:52 haproxy-server [32415]: ip_address:1325 [06/Jan/2023:11:00:52.268] front back/s27127 0/0/387/236/626 200 92247 - - --NN 10119/10119/9185/0/0 0/0 user "GET https://api.website.tools/tokens/v1?tokens=token:3821&tokens=token_id:3829&tokens=token_id:3801&tokens=token_id:3863&tokens=token_id:3932&tokens=token_id:3934&tokens=token_id:3942&tokens=token_id:3968&tokens=token_id:3970&tokens=token_id:3976&tokens=token_id:3977&tokens=token_id:3978&tokens=token_id:3984&tokens=token_id:3985&tokens=token_id:3999&tokens=token_id:4024&tokens=token_id:4029&tokens=token_id"

as well as simple CONNECT requests: Jan 6 12:56:02 haproxy-server [5964]: ip_address:40446 [06/Jan/2023:12:56:01.105] front back/s25754 0/0/3/5/1198 200 7042 - - --NN 3697/3696/2723/0/0 0/0 user "CONNECT api.website.tools:443 HTTP/1.1"

Do you have an idea how to solve the issue?

No response

What is your configuration?

global
    log /dev/log len 65500   local0
    log /dev/log len 65500   local1 notice
    chroot /var/lib/haproxy
    stats socket "${STATS_SOCKET}" mode 660 level admin expose-fd listeners
    stats timeout 30s
    user haproxy
    group haproxy
    daemon

    # Default ciphers to use on SSL-enabled listening sockets.
    # For more information, see ciphers(1SSL). This list is from:
    # https://hynek.me/articles/hardening-your-web-servers-ssl-ciphers/
    ssl-default-bind-ciphers ECDH+AESGCM:DH+AESGCM:ECDH+AES256:DH+AES256:ECDH+AES128:DH+AES:ECDH+3DES:DH+3DES:RSA+AESGCM:RSA+AES:RSA+3DES:!aNULL:!MD5:!DSS
    ssl-default-bind-options no-sslv3

    spread-checks 10
    maxconn 201000
    nbproc 1
    nbthread 8
    cpu-map auto:1/all 0-
    server-state-file "${STATE_FILE}"
    lua-load "${LUA_FILE}"

defaults
    log    global
    mode    http
    option    httplog
    option    dontlognull
    timeout connect 5s
    timeout client  50s
    timeout server  50s
    errorfile 400 /etc/haproxy/errors/400.http
    errorfile 403 /etc/haproxy/errors/403.http
    errorfile 408 /etc/haproxy/errors/408.http
    errorfile 500 /etc/haproxy/errors/500.http
    errorfile 502 /etc/haproxy/errors/502.http
    errorfile 503 /etc/haproxy/errors/503.http
    errorfile 504 /etc/haproxy/errors/504.http

    log-format "%ci:%cp [%tr] %ft %b/%s %TR/%Tw/%Tc/%Tr/%Ta %ST %B %CC %CS %tsc %ac/%fc/%bc/%sc/%rc %sq/%bq %hr %hs %[var(txn.username)] %{+Q}r"
    load-server-state-from-file global
    timeout check 5s
    default-server source "${LISTEN_IP}" disabled inter 5m downinter 1m fastinter 10s fall 3 rise 2 maxconn 1000 check

frontend "${FRONTEND_NAME}"
    bind "${LISTEN_IP}":"${START_PORT}"-"${END_PORT}"
    maxconn 200000
    option http-use-proxy-header
    stick-table type string len 64 size 10k store bytes_out_cnt,bytes_in_cnt,http_req_cnt
    http-request set-var(txn.user) req.hdr(Proxy-Authorization),field(2,' '),b64dec if { req.hdr(Proxy-Authorization) -m found }
    http-request del-header Proxy-Authorization if { req.hdr(Proxy-Authorization) -m found }
    http-request set-var(txn.username) str('_no_proxy_auth') unless { var(txn.user) -m found }
    http-request lua.auth_and_set_vars if { var(txn.user) -m found }
    http-request set-var(txn.username) str('_failed_auth') if { var(txn.user) -m found } !{ var(txn.auth_ok) -m bool }
    http-request track-sc0 var(txn.username)
    http-request allow if { var(txn.auth_ok) -m bool }
    default_backend "${BACKEND_NAME}"

backend "${BACKEND_NAME}"
    balance random
    # Force persistence on sticky sessions only
    force-persist if { var(txn.cookie) -m found }
    dynamic-cookie-key "${SECRET}"
    http-request add-header Cookie SRVHASH=%[var(txn.cookie)] if { var(txn.cookie) -m found }
    cookie SRVHASH insert indirect postonly dynamic
    server-template s "${SERVERS}" localhost:1024

listen stats
    bind localhost:9001
    mode http
    stats enable
    stats hide-version
    stats realm Haproxy\ Statistics
    stats uri /

listen ping
    bind :80
    mode http
    monitor-uri /

Output of haproxy -vv

Linux haproxy-hostname 5.10.0-0.bpo.8-amd64 #1 SMP Debian 5.10.46-4~bpo10+1 (2021-08-07) x86_64 GNU/Linux

HA-Proxy version 2.0.23-3~bpo10+1 2021/08/14 - https://haproxy.org/
Build options :
  TARGET  = linux-glibc
  CPU     = generic
  CC      = gcc
  CFLAGS  = -O2 -g -O2 -fdebug-prefix-map=/build/haproxy-2.0.23=. -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -fno-strict-aliasing -Wdeclaration-after-statement -fwrapv -Wno-unused-label -Wno-sign-compare -Wno-unused-parameter -Wno-old-style-declaration -Wno-ignored-qualifiers -Wno-clobbered -Wno-missing-field-initializers -Wno-implicit-fallthrough -Wno-stringop-overflow -Wno-cast-function-type -Wtype-limits -Wshift-negative-value -Wshift-overflow=2 -Wduplicated-cond -Wnull-dereference
  OPTIONS = USE_PCRE2=1 USE_PCRE2_JIT=1 USE_REGPARM=1 USE_OPENSSL=1 USE_LUA=1 USE_ZLIB=1 USE_SYSTEMD=1

Feature list : +EPOLL -KQUEUE -MY_EPOLL -MY_SPLICE +NETFILTER -PCRE -PCRE_JIT +PCRE2 +PCRE2_JIT +POLL -PRIVATE_CACHE +THREAD -PTHREAD_PSHARED +REGPARM -STATIC_PCRE -STATIC_PCRE2 +TPROXY +LINUX_TPROXY +LINUX_SPLICE +LIBCRYPT +CRYPT_H -VSYSCALL +GETADDRINFO +OPENSSL +LUA +FUTEX +ACCEPT4 -CLOSEFROM -MY_ACCEPT4 +ZLIB -SLZ +CPU_AFFINITY +TFO +NS +DL +RT -DEVICEATLAS -51DEGREES -WURFL +SYSTEMD -OBSOLETE_LINKER +PRCTL +THREAD_DUMP -EVPORTS

Default settings :
  bufsize = 16384, maxrewrite = 1024, maxpollevents = 200

Built with multi-threading support (MAX_THREADS=64, default=8).
Built with OpenSSL version : OpenSSL 1.1.1d  10 Sep 2019
Running on OpenSSL version : OpenSSL 1.1.1n  15 Mar 2022
OpenSSL library supports TLS extensions : yes
OpenSSL library supports SNI : yes
OpenSSL library supports : TLSv1.0 TLSv1.1 TLSv1.2 TLSv1.3
Built with Lua version : Lua 5.3.3
Built with network namespace support.
Built with transparent proxy support using: IP_TRANSPARENT IPV6_TRANSPARENT IP_FREEBIND
Built with zlib version : 1.2.11
Running on zlib version : 1.2.11
Compression algorithms supported : identity("identity"), deflate("deflate"), raw-deflate("deflate"), gzip("gzip")
Built with PCRE2 version : 10.32 2018-09-10
PCRE2 library supports JIT : yes
Encrypted password support via crypt(3): yes
Built with the Prometheus exporter as a service

Available polling systems :
      epoll : pref=300,  test result OK
       poll : pref=200,  test result OK
     select : pref=150,  test result OK
Total: 3 (3 usable), will use epoll.

Available multiplexer protocols :
(protocols marked as <default> cannot be specified using 'proto' keyword)
              h2 : mode=HTX        side=FE|BE     mux=H2
              h2 : mode=HTTP       side=FE        mux=H2
       <default> : mode=HTX        side=FE|BE     mux=H1
       <default> : mode=TCP|HTTP   side=FE|BE     mux=PASS

Available services :
    prometheus-exporter

Available filters :
    [SPOE] spoe
    [COMP] compression
    [CACHE] cache
    [TRACE] trace

Last Outputs and Backtraces

....
Jan 04 06:01:48 haproxy-server haproxy[9228]: Thread 2 is about to kill the process.
Jan 04 06:01:48 haproxy-server haproxy[9228]:   Thread 1 : id=0x7ffaca78b1c0 act=0 glob=0 wq=1 rq=0 tl=0 tlsz=0 rqsz=0
Jan 04 06:01:48 haproxy-server haproxy[9228]:              stuck=0 fdcache=0 prof=0 harmless=1 wantrdv=0
Jan 04 06:01:48 haproxy-server haproxy[9228]:              cpu_ns: poll=53437457766 now=53437464588 diff=6822
Jan 04 06:01:48 haproxy-server haproxy[9228]:              curr_task=0
Jan 04 06:01:48 haproxy-server haproxy[9228]: *>Thread 2 : id=0x7ffaca767700 act=1 glob=1 wq=1 rq=0 tl=0 tlsz=0 rqsz=0
Jan 04 06:01:48 haproxy-server haproxy[9228]:              stuck=1 fdcache=1 prof=0 harmless=0 wantrdv=0
Jan 04 06:01:48 haproxy-server haproxy[9228]:              cpu_ns: poll=51001683846 now=53221151584 diff=2219467738
Jan 04 06:01:48 haproxy-server haproxy[9228]:              curr_task=0x55583491b260 (task) calls=2 last=0
Jan 04 06:01:48 haproxy-server haproxy[9228]:                fct=0x5557df1bc190=main+413152 (process_stream) ctx=0x555834f26f00
Jan 04 06:01:48 haproxy-server haproxy[9228]:              strm=0x555834f26f00 src=user_ip fe=front be=front dst=unknown
Jan 04 06:01:48 haproxy-server haproxy[9228]:              rqf=40808002 rqa=30 rpf=80000000 rpa=0 sif=EST,200028 sib=INI,30
Jan 04 06:01:48 haproxy-server haproxy[9228]:              af=(nil),0 csf=0x7ffab8833b50,4000
Jan 04 06:01:48 haproxy-server haproxy[9228]:              ab=(nil),0 csb=(nil),0
Jan 04 06:01:48 haproxy-server haproxy[9228]:              cof=0x7ffab468df90,80203300:H1(0x7ffab83276b0)/RAW((nil))/tcpv4(41406)
Jan 04 06:01:48 haproxy-server haproxy[9228]:              cob=(nil),0:NONE((nil))/NONE((nil))/NONE(0)
Jan 04 06:01:48 haproxy-server haproxy[9228]:              Current executing Lua from a stream analyser -- stack traceback:
Jan 04 06:01:48 haproxy-server haproxy[9228]:   Thread 3 : id=0x7ffac2b5e700 act=0 glob=0 wq=1 rq=0 tl=0 tlsz=0 rqsz=0
Jan 04 06:01:48 haproxy-server haproxy[9228]:              stuck=0 fdcache=0 prof=0 harmless=1 wantrdv=0
Jan 04 06:01:48 haproxy-server haproxy[9228]:              cpu_ns: poll=49644193450 now=49644218527 diff=25077
Jan 04 06:01:48 haproxy-server haproxy[9228]:              curr_task=0
Jan 04 06:01:48 haproxy-server haproxy[9228]:   Thread 4 : id=0x7ffac218e700 act=0 glob=0 wq=1 rq=0 tl=0 tlsz=0 rqsz=0
Jan 04 06:01:48 haproxy-server haproxy[9228]:              stuck=0 fdcache=0 prof=0 harmless=1 wantrdv=0
...

Additional Information

Not every request breaks haproxy service - just a few at random intervals.

capflam commented 1 year ago

First, the 2.0.23 is outdated. You should upgrade to 2.0.30. Then your process seems to be stuck in your lua script. You must review it to be sure there is no blocking call.

capflam commented 1 year ago

I'm closing now, feel free to reopen with more info if the issue is still there.