haproxy / haproxy

HAProxy Load Balancer's development branch (mirror of git.haproxy.org)
https://git.haproxy.org/

HA-Proxy version 2.0.27 A bogus STREAM #1556

Closed xpiotr87x closed 2 years ago

xpiotr87x commented 2 years ago

Detailed Description of the Problem

After migrating from CentOS 7 with HA-Proxy version 2.0.25-6986403 (OpenSSL 1.1.1l) to Ubuntu 20.04 with HA-Proxy version 2.0.27-1ppa1~focal (OpenSSL 1.1.1f), the proxy dies with exit code 134 during periods of heavier traffic to our SVN server:

haproxy[24498]: [ALERT] 047/155501 (24498) : A bogus STREAM [0x7fe5c00a5420] is spinning at 100001 calls per second and refuses to die, aborting now! Please report this error to developers [strm=0x7fe5c00a5420,44e src=xx.xx.xx.xx fe=secure be=svn dst=svn1 txn=0x7fe5c0082130,18203000 txn.req=MSG_DONE,c txn.rsp=MSG_DATA,d rqf=68848002 rqa=48000 rpf=a0070000 rpa=24000000 sif=EST,40008 sib=EST,41118 af=(nil),0 csf=0x7fe5c0081d60,8600 ab=(nil),0 csb=0x7fe5c001d480,8600 cof=0x7fe5c00287f0,80201360:PASS(0x7fe5c0028a30)/SSL(0x7fe5c002cfc0)/tcpv4(31) cob=0x7fe5c00247f0,203300:PASS(0x7fe5c007f040)/SSL(0x7fe5c001d700)/tcpv4(35) filters={0x7fe5c00a4700="compression filter"}]
haproxy[24498]: A bogus STREAM [0x7fe5c00a5420] is spinning at 100001 calls per second and refuses to die, aborting now! Please report this error to developers [strm=0x7fe5c00a5420,44e src=xx.xx.xx.xx fe=secure be=svn dst=svn1 txn=0x7fe5c0082130,18203000 txn.req=MSG_DONE,c txn.rsp=MSG_DATA,d rqf=68848002 rqa=48000 rpf=a0070000 rpa=24000000 sif=EST,40008 sib=EST,41118 af=(nil),0 csf=0x7fe5c0081d60,8600 ab=(nil),0 csb=0x7fe5c001d480,8600 cof=0x7fe5c00287f0,80201360:PASS(0x7fe5c0028a30)/SSL(0x7fe5c002cfc0)/tcpv4(31) cob=0x7fe5c00247f0,203300:PASS(0x7fe5c007f040)/SSL(0x7fe5c001d700)/tcpv4(35) filters={0x7fe5c00a4700="compression filter"}]
haproxy[24538]: [ALERT] 047/155522 (24538) : A bogus STREAM [0x7f341802d360] is spinning at 120358 calls per second and refuses to die, aborting now! Please report this error to developers [strm=0x7f341802d360,44e src=xx.xx.xx.xx fe=secure be=svn dst=svn1 txn=0x7f3418028d40,18203000 txn.req=MSG_DONE,c txn.rsp=MSG_DATA,d rqf=68848002 rqa=48000 rpf=a0070000 rpa=24000000 sif=EST,40008 sib=EST,41118 af=(nil),0 csf=0x7f341804bcf0,8600 ab=(nil),0 csb=0x7f341804deb0,8600 cof=0x7f341802b060,80201360:PASS(0x7f34180b2860)/SSL(0x7f341804fc30)/tcpv4(42) cob=0x7f341802ebe0,203300:PASS(0x7f341801d680)/SSL(0x7f341801d700)/tcpv4(44) filters={0x7f341802ba10="compression filter"}]
haproxy[24538]: A bogus STREAM [0x7f341802d360] is spinning at 120358 calls per second and refuses to die, aborting now! Please report this error to developers [strm=0x7f341802d360,44e src=xx.xx.xx.xx fe=secure be=svn dst=svn1 txn=0x7f3418028d40,18203000 txn.req=MSG_DONE,c txn.rsp=MSG_DATA,d rqf=68848002 rqa=48000 rpf=a0070000 rpa=24000000 sif=EST,40008 sib=EST,41118 af=(nil),0 csf=0x7f341804bcf0,8600 ab=(nil),0 csb=0x7f341804deb0,8600 cof=0x7f341802b060,80201360:PASS(0x7f34180b2860)/SSL(0x7f341804fc30)/tcpv4(42) cob=0x7f341802ebe0,203300:PASS(0x7f341801d680)/SSL(0x7f341801d700)/tcpv4(44) filters={0x7f341802ba10="compression filter"}]
haproxy[24538]: A bogus STREAM [0x7f341802d360] is spinning at 120358 calls per second and refuses to die, aborting now! Please report this error to developers [strm=0x7f341802d360,44e src=xx.xx.xx.xx fe=secure be=svn dst=svn1 txn=0x7f3418028d40,18203000 txn.req=MSG_DONE,c txn.rsp=MSG_DATA,d rqf=68848002 rqa=48000 rpf=a0070000 rpa=24000000 sif=EST,40008 sib=EST,41118 af=(nil),0 csf=0x7f341804bcf0,8600 ab=(nil),0 csb=0x7f341804deb0,8600 cof=0x7f341802b060,80201360:PASS(0x7f34180b2860)/SSL(0x7f341804fc30)/tcpv4(42) cob=0x7f341802ebe0,203300:PASS(0x7f341801d680)/SSL(0x7f341801d700)/tcpv4(44) filters={0x7f341802ba10="compression filter"}]

Expected Behavior

I expect the process to run without restarting.

Steps to Reproduce the Behavior

A large number of requests to the SVN server via the proxy.
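A rough way to generate that kind of load, purely as a sketch (the repository URL, target directory and concurrency below are placeholders, not the actual setup):

    # hypothetical reproduction sketch: run many SVN exports in parallel through the proxy
    REPO_URL="https://proxy.example.com/svn/repo/trunk"    # placeholder URL behind the proxy
    for i in $(seq 1 50); do
        svn export --force -q "$REPO_URL" "/tmp/svn-load-$i" &
    done
    wait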

Do you have any idea what may have caused this?

No response

Do you have an idea how to solve the issue?

No response

What is your configuration?

global
    log /dev/log len 8096 local0
    log /dev/log len 8096 local1 notice
    master-worker
    chroot /var/lib/haproxy
    stats socket /run/haproxy/admin.sock mode 660 level admin expose-fd listeners
    stats timeout 30s
    user haproxy
    group haproxy
    daemon
    spread-checks 5

    ssl-default-bind-ciphers ECDHE-ECDSA-AES128-GCM-SHA256:ECDHE-RSA-AES128-GCM-SHA256:ECDHE-ECDSA-AES256-GCM-SHA384:ECDHE-RSA-AES256-GCM-SHA384:ECDHE-ECDSA-CHACHA20-POLY1305:ECDHE-RSA-CHACHA20-POLY1305:DHE-RSA-AES128-GCM-SHA256:DHE-RSA-AES256-GCM-SHA384
    ssl-default-bind-ciphersuites TLS_AES_128_GCM_SHA256:TLS_AES_256_GCM_SHA384:TLS_CHACHA20_POLY1305_SHA256
    ssl-default-bind-options prefer-client-ciphers no-sslv3 no-tlsv10 no-tlsv11 no-tls-tickets

    tune.ssl.cachesize 1000000
    tune.ssl.default-dh-param 2048

defaults
    log global
    mode http
    option dontlognull
    option redispatch
    option abortonclose
    option http-server-close
    no option http-use-htx
    retries 3
    maxconn 50000
    timeout http-request 10s
    timeout queue 1m
    timeout connect 4s
    timeout client 20s
    timeout server 30s
    timeout http-keep-alive 4s
    timeout check 5s
    errorfile 400 /etc/haproxy/errors/400.http
    errorfile 403 /etc/haproxy/errors/403.http
    errorfile 408 /etc/haproxy/errors/408.http
    errorfile 500 /etc/haproxy/errors/500.http
    errorfile 502 /etc/haproxy/errors/502.http
    errorfile 503 /etc/haproxy/errors/503.http
    errorfile 504 /etc/haproxy/errors/504.http

frontend secure
    bind xx.xx.xx.xx:443 ssl crt /etc/haproxy/ssl/
    maxconn 50000
    http-request set-header X-Forwarded-Proto https
    http-response set-header Server XXX
    http-response set-header Strict-Transport-Security max-age=15768000
    http-response set-header X-XSS-Protection "1; mode=block"
    option http-buffer-request
    compression algo gzip
    compression type text/plain text/css application/json application/x-javascript text/xml application/xml application/xml+rss text/javascript
    capture request header Host len 48
    capture request header User-Agent len 200

backend svn
    no option httpclose
    no option accept-invalid-http-response
    timeout server 60s
    option forwardfor
    server svn1 xx.xx.xx.xx:443 ssl verify none check

Output of haproxy -vv

HA-Proxy version 2.0.27-1ppa1~focal 2022/01/26 - https://haproxy.org/
Build options :
  TARGET  = linux-glibc
  CPU     = generic
  CC      = gcc
  CFLAGS  = -O2 -g -O2 -fdebug-prefix-map=/build/haproxy-cWASBl/haproxy-2.0.27=. -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -fno-strict-aliasing -Wdeclaration-after-statement -fwrapv -Wno-address-of-packed-member -Wno-unused-label -Wno-sign-compare -Wno-unused-parameter -Wno-old-style-declaration -Wno-ignored-qualifiers -Wno-clobbered -Wno-missing-field-initializers -Wno-implicit-fallthrough -Wno-stringop-overflow -Wno-cast-function-type -Wtype-limits -Wshift-negative-value -Wshift-overflow=2 -Wduplicated-cond -Wnull-dereference
  OPTIONS = USE_PCRE2=1 USE_PCRE2_JIT=1 USE_REGPARM=1 USE_OPENSSL=1 USE_LUA=1 USE_ZLIB=1 USE_SYSTEMD=1

Feature list : +EPOLL -KQUEUE -MY_EPOLL -MY_SPLICE +NETFILTER -PCRE -PCRE_JIT +PCRE2 +PCRE2_JIT +POLL -PRIVATE_CACHE +THREAD -PTHREAD_PSHARED +REGPARM -STATIC_PCRE -STATIC_PCRE2 +TPROXY +LINUX_TPROXY +LINUX_SPLICE +LIBCRYPT +CRYPT_H -VSYSCALL +GETADDRINFO +OPENSSL +LUA +FUTEX +ACCEPT4 -CLOSEFROM -MY_ACCEPT4 +ZLIB -SLZ +CPU_AFFINITY +TFO +NS +DL +RT -DEVICEATLAS -51DEGREES -WURFL +SYSTEMD -OBSOLETE_LINKER +PRCTL +THREAD_DUMP -EVPORTS

Default settings :
  bufsize = 16384, maxrewrite = 1024, maxpollevents = 200

Built with multi-threading support (MAX_THREADS=64, default=2).
Built with OpenSSL version : OpenSSL 1.1.1f  31 Mar 2020
Running on OpenSSL version : OpenSSL 1.1.1f  31 Mar 2020
OpenSSL library supports TLS extensions : yes
OpenSSL library supports SNI : yes
OpenSSL library supports : TLSv1.0 TLSv1.1 TLSv1.2 TLSv1.3
Built with Lua version : Lua 5.3.3
Built with network namespace support.
Built with transparent proxy support using: IP_TRANSPARENT IPV6_TRANSPARENT IP_FREEBIND
Built with zlib version : 1.2.11
Running on zlib version : 1.2.11
Compression algorithms supported : identity("identity"), deflate("deflate"), raw-deflate("deflate"), gzip("gzip")
Built with PCRE2 version : 10.34 2019-11-21
PCRE2 library supports JIT : yes
Encrypted password support via crypt(3): yes
Built with the Prometheus exporter as a service

Available polling systems :
      epoll : pref=300,  test result OK
       poll : pref=200,  test result OK
     select : pref=150,  test result OK
Total: 3 (3 usable), will use epoll.

Available multiplexer protocols :
(protocols marked as <default> cannot be specified using 'proto' keyword)
              h2 : mode=HTX        side=FE|BE     mux=H2
              h2 : mode=HTTP       side=FE        mux=H2
       <default> : mode=HTX        side=FE|BE     mux=H1
       <default> : mode=TCP|HTTP   side=FE|BE     mux=PASS

Available services :
        prometheus-exporter

Available filters :
        [SPOE] spoe
        [COMP] compression
        [CACHE] cache
        [TRACE] trace

Last Outputs and Backtraces

systemd[1]: haproxy.service: Main process exited, code=exited, status=134/n/a
systemd[1]: haproxy.service: Failed with result 'exit-code'.
systemd[1]: haproxy.service: Scheduled restart job, restart counter is at 184.
systemd[1]: Stopped HAProxy Load Balancer.
systemd[1]: Starting HAProxy Load Balancer...
systemd[1]: Started HAProxy Load Balancer.
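Status 134 corresponds to SIGABRT (128 + 6), i.e. the abort() triggered by the spinning-stream check above. A possible way to capture a backtrace on the next occurrence, assuming systemd-coredump and gdb are installed (sketch only, the override content may need adjusting):

    # allow haproxy to dump core, then inspect the next crash
    sudo systemctl edit haproxy        # add under [Service]: LimitCORE=infinity
    sudo systemctl restart haproxy
    coredumpctl list haproxy           # after the next abort
    coredumpctl gdb haproxy            # open the most recent core in gdb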

Additional Information

No response

wtarreau commented 2 years ago

Interesting, this is the first time I'm seeing it on something that looks like valid traffic. We triggered it for the first time 3 weeks ago while stressing the master CLI. I guess that sometimes the traffic from/to SVN is bursty and causes many short wakeups. There were a few recent fixes at the channel level that may have increased the likelihood of triggering it.

The attached patch, which is planned for backporting, should fix it by relaxing the condition to only care about wakeups without any transfer. Could you please try to rebuild with it to confirm? Note that the chunk that changes the comment in include/haproxy/stream-t.h will fail to apply on 2.0, but you can safely ignore it, as it's just a comment. Do not hesitate to tell us if you're not at ease with applying patches; we'll guide you.

Thanks for your report!

TimWolla commented 2 years ago

The attached patch

@wtarreau There's no attachment on your comment.

Also, I guess that rebuilding is non-trivial for the reporter, because they use Vincent's Ubuntu PPA.

wtarreau commented 2 years ago

Sorry, as usual... I noticed the reporter was using Vincent's PPA, but some know how to rebuild from it and others don't, which is why I still preferred to ask ;-)

0001-BUG-MINOR-stream-make-the-call_rate-only-count-the-n.patch.txt

xpiotr87x commented 2 years ago

Thanks for the quick reply. I prepared two servers behind a VIP so I can switch between them (the working CentOS one with 2.0.25 and the problematic Ubuntu one with 2.0.27). Unfortunately I don't know how to do the rebuild. The source from Vincent's PPA contains an additional "debian" directory with extra files/patches, and I don't know how to include them in the build. I need some tips on how to properly apply your patch.
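For reference, the usual Ubuntu flow for rebuilding the packaged haproxy with an extra patch looks roughly like the following. This is only a sketch: it assumes the PPA's deb-src line is enabled, the source package version may differ, the patch file is saved next to the unpacked source tree, and the debian/ directory's own patches are applied automatically by the packaging tools.

    # sketch: rebuild the PPA package with the proposed patch applied
    sudo apt-get update
    sudo apt-get build-dep haproxy        # install the build dependencies
    apt-get source haproxy                # fetch the packaged source, including debian/
    cd haproxy-2.0.27/
    patch -p1 < ../0001-BUG-MINOR-stream-make-the-call_rate-only-count-the-n.patch.txt
    # (the hunk touching include/haproxy/stream-t.h may be rejected on 2.0; as noted
    #  above, it only changes a comment and can be ignored)
    dpkg-buildpackage -us -uc -b          # build unsigned binary packages
    sudo dpkg -i ../haproxy_2.0.27-*.deb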

wtarreau commented 2 years ago

OK, if you don't know how to rebuild, I fear we would be guiding you through a process where you will feel uncomfortable, and it may require multiple round trips to help you install the required dependencies. Also, I'm not a fan of installing dev tools on servers, so this would complicate the process a bit further, as I'm assuming you don't even have a dedicated build machine.

If the issue happens at a bearable frequency, the best I can propose is to wait for the next release, ideally next week, maybe the one after. The patch was marked for "backport only if anyone reports this issue". That condition is already met, and since we planned to emit a new release soon anyway, the fix will come with it, and you'll get the other fixes plus this one from the official repo.

Does this suit your expectations?

xpiotr87x commented 2 years ago

Yes, thank you, I will wait for the next release and stick to the older stable, working version in the meantime. If the newer version does not help, I will prepare a dedicated test environment to allow for better analysis and let you know.

xpiotr87x commented 2 years ago

The new version solved the problem. Thanks for the support.

wtarreau commented 2 years ago

Perfect, thank you for the feedback!