haproxy / haproxy

HAProxy Load Balancer's development branch (mirror of git.haproxy.org)
https://git.haproxy.org/

QUIC / HTTP sessions TCP socket leak #1801

Closed Tristan971 closed 1 year ago

Tristan971 commented 2 years ago

Detailed Description of the Problem

A look at the TCP stack metrics of QUIC-serving nodes suggests that something is preventing a subset of HTTP sessions from being released, causing a steady, memleak-style increase in allocated TCP sockets over time.

Expected Behavior

Allocated TCP sockets are expected to roughly match (i.e. be of the same order of magnitude as) in-use TCP sockets.
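For reference, one way to observe the allocated-vs-in-use gap on a node is the kernel's own socket accounting (standard Linux tooling, not something from the original report):

```
# "alloc" counts TCP sockets the kernel has allocated, "inuse" the ones bound
# to live connections; a steadily widening gap is the leak pattern described here
cat /proc/net/sockstat
# a similar summary:
ss -s
```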

Steps to Reproduce the Behavior

Not entirely clear, but it does require serving QUIC requests at least.

Do you have any idea what may have caused this?

No response

Do you have an idea how to solve the issue?

No response

What is your configuration?

global
    daemon
    user haproxy
    group haproxy
    chroot  /var/lib/haproxy

    log /dev/log local0
    log /dev/log local1 notice

    maxconn 10000
    cluster-secret $REDACTED

    stats socket  /run/haproxy/admin.sock mode 660 level admin expose-fd listeners
    stats timeout 30s

    ssl-default-bind-ciphers ECDHE-ECDSA-AES128-GCM-SHA256:ECDHE-RSA-AES128-GCM-SHA256:ECDHE-ECDSA-AES256-GCM-SHA384:ECDHE-RSA-AES256-GCM-SHA384:ECDHE-ECDSA-CHACHA20-POLY1305:ECDHE-RSA-CHACHA20-POLY1305:DHE-RSA-AES128-GCM-SHA256:DHE-RSA-AES256-GCM-SHA384
    ssl-default-bind-ciphersuites TLS_AES_128_GCM_SHA256:TLS_AES_256_GCM_SHA384:TLS_CHACHA20_POLY1305_SHA256
    ssl-default-bind-options prefer-client-ciphers no-sslv3 no-tlsv10 no-tlsv11 no-tls-tickets
    ssl-dh-param-file /etc/haproxy/certs/dhparam

    ssl-default-server-ciphersuites TLS_AES_128_GCM_SHA256:TLS_AES_256_GCM_SHA384:TLS_CHACHA20_POLY1305_SHA256
    ssl-default-server-options no-sslv3 no-tlsv10 no-tlsv11 no-tlsv12 no-tls-tickets

defaults defaults-base
    maxconn 5000
    log global

    balance leastconn
    option abortonclose
    option log-separate-errors

    retries 2
    option redispatch
    retry-on conn-failure empty-response response-timeout 0rtt-rejected

    mode http
    log-format "%ci:%cp - %ID - %ft %b/%s - status=%ST time=%Tt size=%B - %hr - \"%r\""

    unique-id-format %[uuid()]
    unique-id-header X-Request-ID

    timeout connect         4s
    timeout check           5s
    timeout http-request    10s
    timeout http-keep-alive 1s
    timeout client          32s
    timeout client-fin      1s
    timeout server          32s
    timeout server-fin      1s
    timeout queue           5s
    timeout tunnel          1h

defaults defaults-fe-public from defaults-base
    # honestly nothing really interesting; storing some headers in vars, and flagging/tracking requests

# some other irrelevant frontends

frontend https_$redacted from defaults-fe-public
    bind       blabla1:443 tfo allow-0rtt ssl strict-sni crt /path/to/cert.pem alpn h2,http/1.1
    bind quic4@blabla1:443 tfo allow-0rtt ssl strict-sni crt /path/to/cert.pem alpn h3
    bind       blabla2:443 tfo allow-0rtt ssl strict-sni crt /path/to/cert.pem alpn h2,http/1.1
    bind quic4@blabla2:443 tfo allow-0rtt ssl strict-sni crt /path/to/cert.pem alpn h3

    http-after-response set-header Alt-Svc 'h3=":443"; ma=60' if { ssl_fc }

    # some more request tracking and ACLs for routing...

    # the specifically affected backend
    use_backend uploads_covers if domain_uploads path_covers

resolvers internal-dns
    nameserver internal-keepalive $REDACTED:53

backend uploads_covers from defaults-base
    # some more tracking, ACLs, ...

    balance leastconn
    server local_nginx 127.0.0.1:$REDACTED check
    server remote-1 remote-1.$redacted:80 check resolvers internal-dns init-addr none
    server remote-2 remote-2.$redacted:80 check resolvers internal-dns init-addr none

Output of haproxy -vv

HAProxy version 2.7-dev2-53bfac8+mangadex-50d2515 2022-07-22T18:00+00:00 - https://haproxy.org/
Status: development branch - not safe for use in production.
Known bugs: https://github.com/haproxy/haproxy/issues?q=is:issue+is:open
Running on: Linux 5.4.143-1-pve #1 SMP PVE 5.4.143-1 (Tue, 28 Sep 2021 09:10:37 +0200) x86_64
Build options :
  TARGET  = linux-glibc
  CPU     = generic
  CC      = cc
  CFLAGS  = -O2 -g -Wall -Wextra -Wundef -Wdeclaration-after-statement -Wfatal-errors -Wtype-limits -Wshift-negative-value -Wshift-overflow=2 -Wduplicated-cond -Wnull-dereference -fwrapv -Wno-address-of-packed-member -Wno-unused-label -Wno-sign-compare -Wno-unused-parameter -Wno-clobbered -Wno-missing-field-initializers -Wno-cast-function-type -Wno-string-plus-int -Wno-atomic-alignment -DMAX_SESS_STKCTR=5
  OPTIONS = USE_PCRE2=1 USE_PCRE2_JIT=1 USE_STATIC_PCRE2=1 USE_LIBCRYPT=1 USE_OPENSSL=1 USE_LUA=1 USE_SLZ=1 USE_TFO=1 USE_NS=1 USE_SYSTEMD=1 USE_QUIC=1 USE_PROMEX=1
  DEBUG   = -DDEBUG_MEMORY_POOLS -DDEBUG_STRICT

Feature list : +EPOLL -KQUEUE +NETFILTER -PCRE -PCRE_JIT +PCRE2 +PCRE2_JIT +POLL +THREAD +BACKTRACE -STATIC_PCRE +STATIC_PCRE2 +TPROXY +LINUX_TPROXY +LINUX_SPLICE +LIBCRYPT +CRYPT_H -ENGINE +GETADDRINFO +OPENSSL +LUA +ACCEPT4 -CLOSEFROM -ZLIB +SLZ +CPU_AFFINITY +TFO +NS +DL +RT -DEVICEATLAS -51DEGREES -WURFL +SYSTEMD -OBSOLETE_LINKER +PRCTL -PROCCTL +THREAD_DUMP -EVPORTS -OT +QUIC +PROMEX -MEMORY_PROFILING

Default settings :
  bufsize = 16384, maxrewrite = 1024, maxpollevents = 200

Built with multi-threading support (MAX_THREADS=64, default=8).
Built with OpenSSL version : OpenSSL 3.0.5+quic-mangadex-50d2515 22 Jul 2022
Running on OpenSSL version : OpenSSL 3.0.5+quic-mangadex-50d2515 22 Jul 2022
OpenSSL library supports TLS extensions : yes
OpenSSL library supports SNI : yes
OpenSSL library supports : TLSv1.0 TLSv1.1 TLSv1.2 TLSv1.3
OpenSSL providers loaded : default
Built with Lua version : Lua 5.3.6
Built with the Prometheus exporter as a service
Built with network namespace support.
Support for malloc_trim() is enabled.
Built with libslz for stateless compression.
Compression algorithms supported : identity("identity"), deflate("deflate"), raw-deflate("deflate"), gzip("gzip")
Built with transparent proxy support using: IP_TRANSPARENT IPV6_TRANSPARENT IP_FREEBIND
Built with PCRE2 version : 10.40 2022-04-14
PCRE2 library supports JIT : yes
Encrypted password support via crypt(3): yes
Built with gcc compiler version 8.3.0

Available polling systems :
      epoll : pref=300,  test result OK
       poll : pref=200,  test result OK
     select : pref=150,  test result OK
Total: 3 (3 usable), will use epoll.

Available multiplexer protocols :
(protocols marked as <default> cannot be specified using 'proto' keyword)
       quic : mode=HTTP  side=FE     mux=QUIC  flags=HTX|NO_UPG|FRAMED
         h2 : mode=HTTP  side=FE|BE  mux=H2    flags=HTX|HOL_RISK|NO_UPG
       fcgi : mode=HTTP  side=BE     mux=FCGI  flags=HTX|HOL_RISK|NO_UPG
  <default> : mode=HTTP  side=FE|BE  mux=H1    flags=HTX
         h1 : mode=HTTP  side=FE|BE  mux=H1    flags=HTX|NO_UPG
  <default> : mode=TCP   side=FE|BE  mux=PASS  flags=
       none : mode=TCP   side=FE|BE  mux=PASS  flags=NO_UPG

Available services : prometheus-exporter
Available filters :
        [BWLIM] bwlim-in
        [BWLIM] bwlim-out
        [CACHE] cache
        [COMP] compression
        [FCGI] fcgi-app
        [SPOE] spoe
        [TRACE] trace

Last Outputs and Backtraces

No response

Additional Information

I had noticed some oddities in the metrics when serving QUIC traffic before (see https://github.com/haproxy/haproxy/issues/1759#issuecomment-1168786465), but this time, since #1759 was fixed, I let it run full-blast for a while on part of our edge.

Then, looking at the metrics, I noticed the very high and still-growing number of sessions: [screenshot: 2022-07-24 10_53_17]

An interesting aspect is that both frontend and backend sessions keep growing. They also grow nearly perfectly uniformly per backend, despite there being two very different backends involved (one local nginx and two remote Apache Traffic Server instances): [screenshot: 2022-07-24 10_53_25]

For comparison, this is what it looks like when QUIC isn't advertised (same edge node and backends, same traffic): [screenshots: 2022-07-24 10_55_23, 2022-07-24 10_55_59]

Just in case the metrics themselves were wrong (I've suspected for a while that something is off between them and QUIC), I checked node-exporter's metrics for those nodes, and there is indeed a TCP socket leak over the same period (this screenshot is for 1 of the 2 test edge nodes): [screenshot: 2022-07-24 10_33_42]
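(For anyone wanting to reproduce this kind of graph: node-exporter exposes the same /proc/net/sockstat counters, so grepping its metrics endpoint, or graphing them in Prometheus, shows the divergence. The host and port below are placeholders.)

```
# node_sockstat_TCP_alloc and node_sockstat_TCP_inuse should stay in the same
# ballpark; while QUIC is advertised, alloc keeps climbing away from inuse
curl -s http://edge-node:9100/metrics | grep -E 'node_sockstat_TCP_(alloc|inuse)'
```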

And again, how it looks when QUIC isn't advertised, all other things being equal: [screenshot: 2022-07-24 10_34_52]

I tried running a show sess command on the admin socket and didn't get a matching number of sessions back, so the leaking ones seem unaccounted for. I didn't note down the specific numbers at the time, however, and will try again and record the precise values.
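(For the record, a quick way to compare the two numbers via the admin socket configured above: each stream printed by "show sess" starts with its pointer, so counting those lines gives a session total to hold against the connection counters.)

```
# streams haproxy can account for
socat - /run/haproxy/admin.sock <<< "show sess" | grep -c '^0x'

# connection counters as haproxy sees them
socat - /run/haproxy/admin.sock <<< "show info" | grep -E '^(CurrConns|CurrSslConns|Maxconn):'
```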

Overall I don't have much more insight into how to start looking into this. Here are global packet statistics on the node; I can probably provide a bunch of other system metrics if they would help.

[screenshots: 2022-07-24 10_43_21, 2022-07-24 10_45_38]

chipitsine commented 2 years ago

Where did you get OpenSSL 3.0.5+quic-mangadex-50d2515? Just curious.

Tristan971 commented 2 years ago

Where did you get OpenSSL 3.0.5+quic-mangadex-50d2515? Just curious.

I build those here https://gitlab.com/mangadex-pub/haproxy/-/jobs/2759190154

Tristan971 commented 2 years ago

Something that may or may not be relevant: we run on an anycasted network, and some users picking up the QUIC advertisement from the test LBs (i.e. those running 2.7-dev) were leaking over to "stable" ones running 2.6.2 (and thus without the fix for #1759), where QUIC is bound but not enabled. This means they would occasionally cause worker crashes on the other LBs as per that issue (in large part due to some consumer ISPs having routing policies one could only qualify as "rather exotic").

The rate of such occurrences (leaking across edge locations + triggering the bug from that other issue) looks way too small to explain this issue on its own, however.

Tristan971 commented 2 years ago

Alright, now that I have the issue live, here are a few more pieces of internal haproxy state; hope it helps.

show info ``` Name: HAProxy Version: 2.7-dev2-53bfac8+mangadex-50d2515 Release_date: 2022-07-22T18:00+00:00 Nbthread: 8 Nbproc: 1 Process_num: 1 Pid: 32100 Uptime: 0d 2h15m52s Uptime_sec: 8152 Memmax_MB: 0 PoolAlloc_MB: 352 PoolUsed_MB: 352 PoolFailed: 0 Ulimit-n: 20112 Maxsock: 20112 Maxconn: 10000 Hard_maxconn: 10000 CurrConns: 804 CumConns: 336199 CumReq: 1439504 MaxSslConns: 0 CurrSslConns: 77 CumSslConns: 209495 Maxpipes: 0 PipesUsed: 0 PipesFree: 0 ConnRate: 42 ConnRateLimit: 0 MaxConnRate: 220 SessRate: 42 SessRateLimit: 0 MaxSessRate: 220 SslRate: 42 SslRateLimit: 0 MaxSslRate: 124 SslFrontendKeyRate: 22 SslFrontendMaxKeyRate: 69 SslFrontendSessionReuse_pct: 46 SslBackendKeyRate: 0 SslBackendMaxKeyRate: 0 SslCacheLookups: 99644 SslCacheMisses: 32419 CompressBpsIn: 0 CompressBpsOut: 0 CompressBpsRateLim: 0 Tasks: 4655 Run_queue: 1 Idle_pct: 94 node: lon-eu-ext-lb-1 Stopping: 0 Jobs: 837 Unstoppable Jobs: 1 Listeners: 32 ActivePeers: 0 ConnectedPeers: 0 DroppedLogs: 24 BusyPolling: 0 FailedResolutions: 0 TotalBytesOut: 290458262999 TotalSplicdedBytesOut: 0 BytesOutRate: 32112544 DebugCommandsIssued: 0 CumRecvLogs: 0 Build info: 2.7-dev2-53bfac8+mangadex-50d2515 Memmax_bytes: 0 PoolAlloc_bytes: 369335648 PoolUsed_bytes: 369335648 Start_time_sec: 1658656668 Tainted: 0 ```
show activity ``` thread_id: 3 (1..8) date_now: 1658665115.948396 ctxsw: 155417710 [ 20167027 19026955 19515891 18954731 19343159 19968906 18692160 19748881 ] tasksw: 5284115 [ 708160 660887 636679 644181 659844 645099 643223 686042 ] empty_rq: 33849621 [ 7465866 1743723 1690113 3917455 5899480 2435583 2473963 8223438 ] long_rq: 50392 [ 7072 6412 6383 6196 5154 6180 6670 6325 ] loops: 66456070 [ 11678329 5730717 5766240 7890961 10034861 6558778 6468282 12327902 ] wake_tasks: 442792 [ 58264 54378 54371 55192 56318 54168 54438 55663 ] wake_signal: 0 [ 0 0 0 0 0 0 0 0 ] poll_io: 64692862 [ 11512242 5461704 5496364 7680430 9857246 6296994 6221069 12166813 ] poll_exp: 8377 [ 1171 1065 970 738 1025 835 1227 1346 ] poll_drop_fd: 170 [ 31 23 16 19 24 19 18 20 ] poll_skip_fd: 481843 [ 61193 61344 60189 59259 59277 60608 59698 60275 ] conn_dead: 0 [ 0 0 0 0 0 0 0 0 ] stream_calls: 4107968 [ 526046 531112 511813 498387 493381 520521 508405 518303 ] pool_fail: 0 [ 0 0 0 0 0 0 0 0 ] buf_wait: 0 [ 0 0 0 0 0 0 0 0 ] cpust_ms_tot: 5695 [ 789 592 659 759 758 638 690 810 ] cpust_ms_1s: 0 [ 0 0 0 0 0 0 0 0 ] cpust_ms_15s: 1 [ 0 0 0 0 0 1 0 0 ] avg_loop_us: 44 [ 40 93 27 25 20 35 70 43 ] accepted: 256084 [ 61196 8597 15984 27434 42993 6961 11372 81547 ] accq_pushed: 219266 [ 27188 27486 27563 27365 27806 27665 27239 26954 ] accq_full: 0 [ 0 0 0 0 0 0 0 0 ] accq_ring: 0 [ 0 0 0 0 0 0 0 0 ] fd_takeover: 488120 [ 62708 62150 61005 59877 59602 60677 60663 61438 ] ```
show sess / show fd (sess count and fd count nearly matching 1:1)

show sess excerpt:
```
0x7f9a5ca27160: proto=quic4 src=A.B.C.D:58025 fe=https_redacted be=uploads_covers srv=redacted-2 ts=00 epoch=0 age=2h17m calls=4 rate=0 cpu=0 lat=0 rq[f=8c4a064h,i=0,an=00h,rx=,wx=,ax=] rp[f=80048000h,i=16384,an=00h,rx=,wx=,ax=] scf=[8,0h,fd=-1] scb=[8,211h,fd=302] exp= rc=0 c_exp=
0x7f9a5c8d5b40: proto=quic4 src=A.B.C.D:58025 fe=https_redacted be=uploads_covers srv=redacted-1 ts=00 epoch=0 age=2h17m calls=4 rate=0 cpu=0 lat=0 rq[f=8c4a064h,i=0,an=00h,rx=,wx=,ax=] rp[f=80048000h,i=16384,an=00h,rx=,wx=,ax=] scf=[8,0h,fd=-1] scb=[8,211h,fd=192] exp= rc=0 c_exp=
0x7f9a5cbb00e0: proto=quic4 src=A.B.C.D:58025 fe=https_redacted be=uploads_covers srv=local_nginx ts=00 epoch=0 age=2h17m calls=4 rate=0 cpu=0 lat=0 rq[f=8c4a064h,i=0,an=00h,rx=,wx=,ax=] rp[f=80048000h,i=16384,an=00h,rx=,wx=,ax=] scf=[8,0h,fd=-1] scb=[8,211h,fd=237] exp= rc=0 c_exp=
```

show fd excerpt (matching the above sessions):
```
302 : st=0x010122(cL heopI W:sRa R:sRa) ref=0 gid=1 tmask=0x20 umask=0x0 owner=0x7f9a58450c60 iocb=0x56183e8f5bd0(sock_conn_iocb) back=1 cflg=0x00080300 fam=ipv4 lport=51412 sv=uploads_covers/redacted-2 mux=H1 ctx=0x7f9a58929310 h1c.flg=0x80003220 .sub=0 .ibuf=16320@0x7f9a5c9ec390+16145/16384 .obuf=0@(nil)+0/0 h1s=0x7f9a5c38c260 h1s.flg=0x4094 .sd.flg=0x9c0401 .req.state=MSG_DONE .res.state=MSG_DATA .meth=GET status=200 .sd.flg=0x009c0401 .sc.flg=0x00000211 .sc.app=0x7f9a5ca272de .subs=(nil) xprt=RAW
192 : st=0x010122(cL heopI W:sRa R:sRa) ref=0 gid=1 tmask=0x20 umask=0x0 owner=0x7f9a58a19fd0 iocb=0x56183e8f5bd0(sock_conn_iocb) back=1 cflg=0x00080300 fam=ipv4 lport=48752 sv=uploads_covers/redacted-1 mux=H1 ctx=0x7f9a5831f900 h1c.flg=0x80003220 .sub=0 .ibuf=16320@0x7f9a5c57d520+3113/16384 .obuf=0@(nil)+0/0 h1s=0x7f9a5c7aa490 h1s.flg=0x4094 .sd.flg=0x9c0401 .req.state=MSG_DONE .res.state=MSG_DATA .meth=GET status=200 .sd.flg=0x009c0401 .sc.flg=0x00000211 .sc.app=0x7f9a5c8d5cbe .subs=(nil) xprt=RAW
237 : st=0x010122(cL heopI W:sRa R:sRa) ref=0 gid=1 tmask=0x20 umask=0x0 owner=0x7f9a6c6d96a0 iocb=0x56183e8f5bd0(sock_conn_iocb) back=1 cflg=0x00080300 fam=ipv4 lport=37982 sv=uploads_covers/local_nginx mux=H1 ctx=0x7f9a6c8e9c50 h1c.flg=0x80003220 .sub=0 .ibuf=16320@0x7f9a5c437e30+15962/16384 .obuf=0@(nil)+0/0 h1s=0x7f9a5c69e670 h1s.flg=0x4094 .sd.flg=0x9c0401 .req.state=MSG_DONE .res.state=MSG_DATA .meth=GET status=200 .sd.flg=0x009c0401 .sc.flg=0x00000211 .sc.app=0x7f9a5cbb025e .subs=(nil) xprt=RAW
```
show pools
```
Dumping pools usage. Use SIGQUIT to flush them.
- Pool quic_tls_iv (24 bytes) : 13687 allocated (328488 bytes), 13687 used (~83 by thread caches), needed_avg 13684, 0 failures, 10 users, @0x5618400c4440 [SHARED]
- Pool quic_arng (56 bytes) : 35685 allocated (1998360 bytes), 35685 used (~212 by thread caches), needed_avg 35682, 0 failures, 35 users, @0x5618400bacc0 [SHARED]
- Pool quic_rx_cry (88 bytes) : 9430 allocated (829840 bytes), 9430 used (~199 by thread caches), needed_avg 9426, 0 failures, 12 users, @0x5618400bdf40 [SHARED]
- Pool h3c (120 bytes) : 922 allocated (110640 bytes), 922 used (~19 by thread caches), needed_avg 919, 0 failures, 3 users, @0x5618400c65c0 [SHARED]
- Pool ssl_sock_ct (152 bytes) : 70010 allocated (10641520 bytes), 70010 used (~243 by thread caches), needed_avg 70006, 0 failures, 13 users, @0x5618400b8b40 [SHARED]
- Pool quic_frame (184 bytes) : 33566 allocated (6176144 bytes), 33566 used (~217 by thread caches), needed_avg 33562, 0 failures, 12 users, @0x5618400bbdc0 [SHARED]
- Pool quic_dgram (216 bytes) : 1222 allocated (263952 bytes), 1222 used (~403 by thread caches), needed_avg 1218, 0 failures, 5 users, @0x5618400c0100 [SHARED]
- Pool session (248 bytes) : 928 allocated (230144 bytes), 928 used (~12 by thread caches), needed_avg 928, 0 failures, 1 users, @0x5618400ccac0 [SHARED]
- Pool ssl_sock_cl (280 bytes) : 5546 allocated (1552880 bytes), 5546 used (~20 by thread caches), needed_avg 5548, 0 failures, 6 users, @0x5618400b7a80 [SHARED]
- Pool spoe_ctx (312 bytes) : 0 allocated (0 bytes), 0 used (~0 by thread caches), needed_avg 0, 0 failures, 1 users, @0x5618400ca900 [SHARED]
- Pool quic_rx_pac (408 bytes) : 4271 allocated (1742568 bytes), 4271 used (~47 by thread caches), needed_avg 4273, 0 failures, 1 users, @0x5618400bf000 [SHARED]
- Pool qcc (472 bytes) : 851 allocated (401672 bytes), 851 used (~7 by thread caches), needed_avg 850, 0 failures, 2 users, @0x5618400c5500 [SHARED]
- Pool resolv_answ (600 bytes) : 14 allocated (8400 bytes), 14 used (~5 by thread caches), needed_avg 13, 0 failures, 1 users, @0x5618400cb9c0 [SHARED]
- Pool stream (1016 bytes) : 677 allocated (687832 bytes), 677 used (~20 by thread caches), needed_avg 677, 0 failures, 1 users, @0x5618400c9840 [SHARED]
- Pool quic_crypto (1048 bytes) : 6266 allocated (6566768 bytes), 6266 used (~36 by thread caches), needed_avg 6267, 0 failures, 2 users, @0x5618400bce80 [SHARED]
- Pool fcgi_conn (1240 bytes) : 0 allocated (0 bytes), 0 used (~0 by thread caches), needed_avg 0, 0 failures, 1 users, @0x5618400c8780 [SHARED]
- Pool h2c (1304 bytes) : 65 allocated (84760 bytes), 65 used (~9 by thread caches), needed_avg 63, 0 failures, 1 users, @0x5618400c7680 [SHARED]
- Pool quic_conn (3864 bytes) : 871 allocated (3365544 bytes), 871 used (~6 by thread caches), needed_avg 868, 0 failures, 1 users, @0x5618400c11c0 [SHARED]
- Pool hpack_tbl (4096 bytes) : 65 allocated (266240 bytes), 65 used (~9 by thread caches), needed_avg 63, 0 failures, 1 users, @0x5618403a7dc0 [SHARED]
- Pool quic_tx_rin (4120 bytes) : 56 allocated (230720 bytes), 56 used (~0 by thread caches), needed_avg 57, 0 failures, 2 users, @0x5618400c3340 [SHARED]
- Pool buffer (16384 bytes) : 19475 allocated (319078400 bytes), 19475 used (~116 by thread caches), needed_avg 19471, 0 failures, 1 users, @0x56184039e540 [SHARED]
- Pool trash (16416 bytes) : 8 allocated (131328 bytes), 8 used (~7 by thread caches), needed_avg 6, 0 failures, 1 users, @0x5618400cdb80
- Pool quic_conn_r (65560 bytes) : 871 allocated (57102760 bytes), 871 used (~6 by thread caches), needed_avg 868, 0 failures, 2 users, @0x5618400c2280 [SHARED]
- Pool quic_rxbuf (262168 bytes) : 56 allocated (14681408 bytes), 56 used (~0 by thread caches), needed_avg 52, 0 failures, 1 users, @0x5618400b9c00 [SHARED]
Total: 24 pools, 426480368 bytes allocated, 426480368 used (~2770272 by thread caches).
```
show servers conn, uploads_covers (most affected backend)
```
# bkname/svname bkid/svid addr port - purge_delay used_cur used_max need_est unsafe_nb safe_nb idle_lim idle_cur idle_per_thr[8]
uploads_covers/local_nginx 29/1 127.0.0.1 8081 - 5000 193 216 216 9 14 -1 23 0 0 0 0 2 1 20 0
uploads_covers/redacted-1 29/2 redacted 80 - 5000 195 197 214 4 15 -1 19 0 1 0 0 1 2 15 0
uploads_covers/redacted-2 29/3 redacted 80 - 5000 191 212 212 4 17 -1 21 0 0 0 0 3 3 15 0
```
wtarreau commented 2 years ago

Hi Tristan,

While I'm not sure, I suspect that the bug fixed by commit c1640f79f ("BUG/MEDIUM: fd/threads: fix incorrect thread selection in wakeup broadcast") could cause stalls on idle backend connections when takeover happens at high rates. That wouldn't explain why the stream's timeout doesn't clear them up, though.

In your case, we're seeing that the 3 connections you picked had data to be read and were lacking storage room, so I'm not 100% convinced this can be related.

a-denoyelle commented 2 years ago

I recently found a bug which prevents sessions from being released. It happens when transferring H3 DATA frames from the client (typically POST requests), but it requires specific memory conditions to be triggered. I will soon provide a fix; please tell me if it at least partially resolves what you are observing.

Tristan971 commented 2 years ago

I'll give it a try with 72a78e8290a1f34c08225f63e7233d211195387f, though the affected backend is purely OPTIONS/GET traffic (it's the one associated with our image CDN, so no POSTs there) πŸ˜…

It's quite interesting that it'd be this specific backend (uploaded covers) and not the other CDN backend (uploaded chapter pages), as the only difference between them is that:

Then again, it could be so many things that it's hard to say... Feel free to let me know if some combination of flags or dumps could help.

Tristan971 commented 2 years ago

@a-denoyelle It does look much better with your patch.

There is still an increase in open sessions compared to H2, but that could just be due to #1808 currently forcing me to run with -O0, i.e. the slower build itself might account for it.

Once that one is fixed I should be able to get better long-term data

Tristan971 commented 2 years ago

From the latest developments in #1808 I've now gotten a QUIC-enabled instance to stay up for a little bit longer. Unfortunately, that showed that the issue is fixed (or mostly mitigated) on the frontend side, but not on the backend side (i.e. FE conns definitely don't grow uncontrollably anymore, but BE conns still do).

Are there metrics/commands/etc. that I could provide to help? (That said, we could also just wait for #1808 to be fully dealt with.)

Something that could be related is that the backends for which the connections accumulate are specifically those serving images, and they are afaik the only ones that currently use HTTP 206 in our stack (mainline nginx and prerelease ATS).
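(If it helps reproduce the 206 angle, a ranged request over HTTP/3 is easy to generate; this is just a sketch, assuming a curl build with HTTP/3 support and a placeholder URL.)

```
# force HTTP/3 and ask for the first KiB only, which the origin should answer
# with a 206 Partial Content if it honors Range
curl --http3 -s -o /dev/null -D - -r 0-1023 https://uploads.example.test/covers/some-image.jpg
```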

Tristan971 commented 2 years ago

Here's a little bit more data after the recent other QUIC-related fixes:

[screenshot: 2022-08-19 17_12_41]

We can see 2 interesting things here. First, the frontend still leaks, but very slowly; let's ignore that for now.

More importantly, the backend leaks, but not at a consistent rate at all. If you look closely, it "jumps" every now and then rather than growing continuously. Zooming in shows it more clearly:

[screenshot: 2022-08-19 17_19_21]

I'm not familiar with HAProxy's backend session management, but that looks a lot like what happens to a server thread pool that is never allowed to shrink: any temporary slowdown triggers new threads to be allocated, but they're never reclaimed.

That should eventually plateau, however (once no slowdown is big enough to trigger a scale-up), which doesn't look to be the case here, so presumably those sessions are also never reused later on.

The unfortunate thing about this theory is that... there's no obvious correlation with backend response times over the same (zoomed-in) time range...

[screenshot: 2022-08-19 17_21_36]

Either way, maybe/hopefully this gives some idea of where to start looking

haproxyFred commented 2 years ago

Note that I have pushed a bug fix for a memory leak in the stream part of our QUIC implementation, which could explain the frontend leak described here: https://github.com/haproxytech/quic-dev/commit/ea4a5cbbdfa71cd287d453dffbdf643846754bba

Tristan971 commented 2 years ago

Note that I have pushed a bug fix for a memory leak in the stream part of our QUIC implementation, which could explain the frontend leak described here: haproxytech/quic-dev@ea4a5cb

Nice, I'll give it a go

Tristan971 commented 2 years ago

[screenshot: 2022-08-21 23_07_32]

Unfortunately it doesn't seem like that was the main contributor this time around πŸ˜…

haproxyFred commented 2 years ago

Ok, thank you for your response. I have found another memory leak, on the TX side. Here is a fix: https://github.com/haproxytech/quic-dev/commit/da9c441886dbfa02840a93e367f65fd6d312c835

wtarreau commented 2 years ago

Now merged into master, thanks Fred!

Tristan971 commented 2 years ago

I shall give it a spin πŸ‘

Tristan971 commented 1 year ago

Just mentioning some improvement to this that I observed in the last deployment

Before is eba9088a7c8ea4e1b0de6a9db38c6a39d5d4b204 and after is e35463c767d693d24fc26b6575749e8594e56204. Blue annotations (on the right-side panel) are 12 hours into the deployment.

[screenshot: 2022-08-27 at 22 42 14]

It's not so obvious on the image, but this is 1.2k sessions vs 800 sessions open against that same backend after 12 hours, so an actual 50% improvement.

Now... the diff between those two versions (on the QUIC side anyway) is only e35463c767d693d24fc26b6575749e8594e56204, which makes no sense to me. Maybe it's sheer luck, maybe it tells you something; I don't know, but I thought it might be interesting nonetheless.

wtarreau commented 1 year ago

Thanks Tristan. I continue to think that this is unrelated to QUIC. However, what might be happening is that some QUIC slowdowns towards the client could indeed increase concurrency and affect the calculated average number of concurrent connections to the server, which is used as a target to grow/shrink the pool size. It should shrink, since that average decays following an exponential curve, and that's the strangest part here. I'm pretty sure that it indicates an issue somewhere else that might be triggered by an unexpected behavior in QUIC, but I have no idea what.
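(A toy illustration of the kind of estimator described above, not HAProxy's actual code: a target following an exponentially weighted average of observed concurrency rises quickly during a spike and then decays back down, which is why the idle pool would normally be expected to shrink again.)

```
# keep 7/8 of the history and blend in each new concurrency sample
est=0
for cur in 300 300 300 20 20 20 20 20; do
    est=$(( (est * 7 + cur) / 8 ))
    echo "concurrency=$cur estimate=$est"
done
```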

The next time the amount of connections is very high, it could be useful to send "show servers conn" and "show fd" in case we can spot a pattern there.

Tristan971 commented 1 year ago

I continue to think that this is unrelated to QUIC. […] I'm pretty sure that it indicates an issue somewhere else that might be triggered by an unexpected behavior in QUIC

That would make sense indeed

it could be useful to send "show servers conn" and "show fd" in case we can spot a pattern there

Noted, I’ll try and get that from time to time then

wtarreau commented 1 year ago

I'm wondering to what extent this could be related to this one: #1842. If some streams are not properly reported as failures, maybe some connections never finish on the other side.

Tristan971 commented 1 year ago

It’s interesting that you mention that; going by https://github.com/haproxy/haproxy/issues/1842#issuecomment-1229416839, it may be relevant that the disproportionately affected backend is a (somewhat) 206-heavy one.

I’ll pull the data you suggested shortly

Tristan971 commented 1 year ago

Alright, here's some data

show servers conn ```plain $ socat - /run/haproxy/admin.sock <<< "show servers conn" # bkname/svname bkid/svid addr port - purge_delay used_cur used_max need_est unsafe_nb safe_nb idle_lim idle_cur idle_per_thr[8] api_varnish/local_varnish 25/1 127.0.0.1 8082 - 5000 4 11 17 0 13 -1 13 3 1 2 0 3 2 1 1 api_varnish/redacted_1 25/2 redacted.186 80 - 5000 0 1 1 0 0 -1 0 0 0 0 0 0 0 0 0 api_varnish/redacted_2 25/3 redacted.8 80 - 5000 0 1 1 0 0 -1 0 0 0 0 0 0 0 0 0 api_varnish_internal/local_varnish 26/1 127.0.0.1 8082 - 5000 0 1 1 0 0 -1 0 0 0 0 0 0 0 0 0 web_varnish/local_varnish 27/1 127.0.0.1 8082 - 5000 19 37 37 4 14 -1 18 0 17 0 1 0 0 0 0 opengraph_varnish/local_varnish 28/1 127.0.0.1 8082 - 5000 0 1 2 0 0 -1 0 0 0 0 0 0 0 0 0 uploads_covers/local_nginx 29/1 127.0.0.1 8081 - 5000 318 329 336 0 16 -1 16 9 4 0 0 0 1 0 2 uploads_covers/mtc-eu-par-f1 29/2 redacted.154 80 - 5000 317 327 333 0 16 -1 16 6 4 0 4 0 0 0 2 uploads_covers/mtc-eu-par-f2 29/3 redacted.155 80 - 5000 318 327 337 0 19 -1 19 8 7 0 0 0 2 0 2 uploads_chapters/mtc-eu-par-f1 30/1 redacted.154 80 - 5000 12 16 16 0 4 -1 4 0 0 3 1 0 0 0 0 uploads_chapters/mtc-eu-par-f2 30/2 redacted.155 80 - 5000 13 16 16 0 3 -1 3 0 0 2 0 0 0 0 1 uploads_chapters/image-origin-lb-1 30/3 redacted.54 80 - 5000 15 15 16 0 1 -1 1 0 0 0 0 0 0 0 1 uploads_chapters/image-origin-lb-2 30/4 redacted.55 80 - 5000 14 14 16 0 2 -1 2 0 0 1 0 0 0 0 1 uploads_chapters/image-origin-lb-3 30/5 redacted.53 80 - 5000 14 16 16 1 1 -1 2 0 1 0 0 0 0 1 0 uploads_chapters/image-origin-lb-4 30/6 redacted.56 80 - 5000 17 17 17 0 0 -1 0 0 0 0 0 0 0 0 0 uploads_chapters/image-origin-lb-5 30/7 redacted.52 80 - 5000 15 16 16 0 1 -1 1 0 0 0 0 1 0 0 0 uploads_chapters/image-origin-lb-6 30/8 redacted.51 80 - 5000 17 17 17 0 0 -1 0 0 0 0 0 0 0 0 0 uploads_chapters/image-origin-lb-7 30/9 - 80 - 5000 0 0 0 0 0 -1 0 0 0 0 0 0 0 0 0 uploads_chapters/image-origin-lb-8 30/10 - 80 - 5000 0 0 0 0 0 -1 0 0 0 0 0 0 0 0 0 uploads_generic/mtc-eu-par-f1 31/1 redacted.154 80 - 5000 0 1 1 0 0 -1 0 0 0 0 0 0 0 0 0 uploads_generic/mtc-eu-par-f2 31/2 redacted.155 80 - 5000 0 1 1 0 0 -1 0 0 0 0 0 0 0 0 0 uploads_generic/image-origin-lb-1 31/3 redacted.54 80 - 5000 0 1 1 0 0 -1 0 0 0 0 0 0 0 0 0 uploads_generic/image-origin-lb-2 31/4 redacted.55 80 - 5000 0 1 1 0 0 -1 0 0 0 0 0 0 0 0 0 uploads_generic/image-origin-lb-3 31/5 redacted.53 80 - 5000 0 1 1 0 0 -1 0 0 0 0 0 0 0 0 0 uploads_generic/image-origin-lb-4 31/6 redacted.56 80 - 5000 0 1 1 0 0 -1 0 0 0 0 0 0 0 0 0 uploads_generic/image-origin-lb-5 31/7 redacted.52 80 - 5000 0 1 2 0 0 -1 0 0 0 0 0 0 0 0 0 uploads_generic/image-origin-lb-6 31/8 redacted.51 80 - 5000 0 1 1 0 0 -1 0 0 0 0 0 0 0 0 0 uploads_generic/image-origin-lb-7 31/9 - 80 - 5000 0 0 0 0 0 -1 0 0 0 0 0 0 0 0 0 uploads_generic/image-origin-lb-8 31/10 - 80 - 5000 0 0 0 0 0 -1 0 0 0 0 0 0 0 0 0 plausible/redacted_1 32/1 redacted.134 80 - 5000 1 2 2 0 1 -1 1 0 1 0 0 0 0 0 0 plausible/redacted_2 32/2 redacted.135 80 - 5000 0 2 2 0 3 -1 3 0 1 0 0 0 1 1 0 sentry/redacted_1 33/1 redacted.134 80 - 5000 0 1 1 0 0 -1 0 0 0 0 0 0 0 0 0 sentry/redacted_2 33/2 redacted.135 80 - 5000 0 1 1 0 0 -1 0 0 0 0 0 0 0 0 0 mtech_docker/core-wrk-lb-1 34/1 redacted.31 80 - 5000 0 1 1 0 0 -1 0 0 0 0 0 0 0 0 0 mtech_docker/core-wrk-lb-2 34/2 redacted.32 80 - 5000 0 1 1 0 0 -1 0 0 0 0 0 0 0 0 0 mtech_docker/core-wrk-lb-3 34/3 - 80 - 5000 0 0 0 0 0 -1 0 0 0 0 0 0 0 0 0 mtech_nxrm/core-wrk-lb-1 35/1 redacted.31 80 - 5000 0 1 1 0 0 -1 0 0 0 0 0 0 0 0 0 mtech_nxrm/core-wrk-lb-2 35/2 redacted.32 80 - 5000 0 1 1 0 0 -1 0 
0 0 0 0 0 0 0 0 mtech_nxrm/core-wrk-lb-3 35/3 - 80 - 5000 0 0 0 0 0 -1 0 0 0 0 0 0 0 0 0 proxy_danke_moe/local_varnish 36/1 127.0.0.1 8082 - 5000 0 1 2 0 0 -1 0 0 0 0 0 0 0 0 0 synth_gzip_bomb/local_nginx 51/1 127.0.0.1 8081 - 5000 0 1 1 0 0 -1 0 0 0 0 0 0 0 0 0 ```

show fd

wtarreau commented 1 year ago

Thanks for the dump. All of these FDs are connections to servers that have a full response buffer. This means that we got blocked processing the response, very likely while transferring it to the client. What I don't understand is why these don't time out. Maybe we're seeing a silly condition such as (for example) a retransmit refreshing a timeout so that it never expires, or something in that vein. A "show sess all" filtered on them would help. Be careful: such a dump may contain a lot of private info (IPs etc.), so it may require some post-processing. If that's annoying, it could be useful to spot entries in the dump with "age=" followed by 3-4 digits (if these exist at all) and only provide 3-4 of them, so that we can try to see a pattern emerge.
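(A rough way to pull out only the long-lived entries, assuming the age format seen in the earlier excerpts, e.g. "age=2h17m". "show sess all" prints several lines per stream, so this only locates the header lines of old sessions; the full blocks still need to be copied out and scrubbed by hand.)

```
socat - /run/haproxy/admin.sock <<< "show sess all" > sess-all.txt
grep -E 'age=[0-9]+h' sess-all.txt | head -n 5
```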

Tristan971 commented 1 year ago

Got it, that sounds like a probable cause indeed

dump some entries with "age=" followed by 3-4 digits (if these exist at all)

Well I have some sessions in there with 20+ hours of age :-)

Since one backend is overwhelmingly affected (uploads_covers) I've filtered for that one only, and for show sess all I've further kept only its sessions that were at least 10 hours old. I haven't done much analysis on my side yet, but here are the files. The sessions have almost everything in common, so there is a pattern, though I couldn't say which part of it is relevant.

show fd

show servers conn

show sess all

(For reference, the instance I pulled this from currently has a total of 11'129 sessions, of which 10'729 are this specific backend)

wtarreau commented 1 year ago

Thanks Tristan, once again thanks to your captures we have a clue now. It seems there's a disagreement between the QUIC layer and the stream layer about the status of an aborted or timed out request. We're still investigating the exact impacts and how to best fix it (it may require quite a bit more thinking) but there's definitely something not clean there. Stay tuned!

Tristan971 commented 1 year ago

I saw some commits related to this issue, so I've just deployed ecb40b2 and will monitor the results (takes at least 24h to really see the impact for sure)

wtarreau commented 1 year ago

It's probably not completely fixed yet: we spent the day analyzing this whole area and saw a few issues that we don't yet know how to address without risking breaking more than we fix, so we'll have to wait for Amaury's return in two weeks. Some parts could already be slightly better, though. In the meantime we're improving the traces and dumps so that it's easier to debug, because the QUIC part cannot be inspected via "show fd" and that doesn't help.

Tristan971 commented 1 year ago

Sounds about right given the preliminary data I see on my end (the test LBs are in Europe, so traffic is indeed going down as it's the middle of the night here, but the magnitudes are too big to be explained by traffic patterns alone).

Just checked and there are however some very real improvements already, so that's nice πŸ‘

[screenshot: 2022-09-02 04_13_51]

We shall wait for Amaury's return either way for now πŸ™‚

wtarreau commented 1 year ago

Excellent! Probably with the improved traces he'll have everything needed to figure out what to do on his return ;-)

a-denoyelle commented 1 year ago

Hi Tristan,

We found several issues which may explain the behavior you see, but it's not easy to determine which one is the root cause. Can you try the attached patch and give us your feedback, please? (.txt extension so the file can be attached here.) 0001-BUG-MEDIUM-mux-quic-reset-htx-buffer-in-snd_buf-on-t.txt

Tristan971 commented 1 year ago

I will give it a try and report back.

Will take at least a day to see the impact for sure however

Tristan971 commented 1 year ago

Unfortunately the patch causes a very quick segfault:

backtrace ```plain [Thread debugging using libthread_db enabled] Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1". Core was generated by `/usr/sbin/haproxy -sf 3127 -x sockpair@3 -Ws -f /etc/haproxy/haproxy.cfg -p /ru'. Program terminated with signal SIGSEGV, Segmentation fault. #0 __eb64_insert (root=root@entry=0xf10, new=0x55e1dde3e540) at include/import/eb64tree.h:227 227 include/import/eb64tree.h: No such file or directory. [Current thread is 1 (Thread 0x7f69d319f040 (LWP 3168))] (gdb) t a a bt full Thread 8 (Thread 0x7f69d09af700 (LWP 3171)): #0 process_stream (t=t@entry=0x7f69c4513300, context=0x7f69c41b33d0, state=) at src/stream.c:1996 max_loops = 199 ana_back = 52 ana_list = 52 flags = 1078984706 s = 0x7f69c41b33d0 sess = 0x7f69c41f8ca0 req = 0x7f69c41b33f0 res = 0x7f69c41b3450 scf = 0x7f69c40e60c0 scb = 0x7f69c417e690 rqf_last = rpf_last = 2147483648 rq_prod_last = 8 rq_cons_last = rp_prod_last = 0 rp_cons_last = 8 srv = req_ana_back = rate = #1 0x000055e1db981e01 in run_tasks_from_lists (budgets=budgets@entry=0x7f69d098c510) at src/task.c:626 _ = { func = 0x55e1dbd2fc93 "run_tasks_from_lists", file = 0x55e1dbd2fbfe "src/task.c", line = 652, what = 5 '\005', arg8 = 0 '\000', arg32 = 0 } tl_queues = 0x7f69d098c4f8 budget_mask = 15 '\017' profile_entry = 0x0 done = 0 queue = 1 t = 0x7f69c4513300 process = ctx = state = 1048584 #2 0x000055e1db98293a in process_runnable_tasks () at src/task.c:855 max = {0, 89, 0, 0} tt = 0x55e1dc16fa00 heavy_queued = 1 default_weights = {64, 48, 16, 1} max_processed = 90 queue = 4 budget = 90 max_total = grq = lrq = gpicked = lpicked = t = tmp_list = #3 0x000055e1db94ef10 in run_poll_loop () at src/haproxy.c:2807 _ = { func = 0x55e1dbd259be "run_poll_loop", file = 0x55e1dbd259cc "src/haproxy.c", line = 2838, what = 1 '\001', arg8 = 0 '\000', arg32 = 0 } wake = next = #4 0x000055e1db952fcf in run_thread_poll_loop (data=) at src/haproxy.c:2996 init_left = 0 init_mutex = pthread_mutex_t = { Type = Normal, Status = Not acquired, Robust = No, Shared = No, Protocol = None } init_cond = pthread_cond_t = { Threads known to still execute a wait function = 0, Clock ID = CLOCK_REALTIME, Shared = No } warn_fail = 0 warn_fail = 0 ptaf = ptif = ptdf = ptff = #5 0x00007f69d3728609 in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0 No symbol table info available. #6 0x00007f69d344d133 in clone () from /lib/x86_64-linux-gnu/libc.so.6 No symbol table info available. Thread 7 (Thread 0x7f69d11b0700 (LWP 3170)): #0 0x00007f69d344d46e in epoll_wait () from /lib/x86_64-linux-gnu/libc.so.6 No symbol table info available. #1 0x000055e1db7d9769 in _do_poll (p=, exp=989870211, wake=0) at src/ev_epoll.c:232 timeout = 145 updt_idx = fd = old_fd = wait_time = 145 status = count = #2 0x000055e1db94eede in run_poll_loop () at src/haproxy.c:2878 _ = { func = 0x55e1dbd259be "run_poll_loop", file = 0x55e1dbd259cc "src/haproxy.c", line = 2838, what = 1 '\001', arg8 = 0 '\000', arg32 = 0 } wake = next = -1140353680 #3 0x000055e1db952fcf in run_thread_poll_loop (data=) at src/haproxy.c:2996 init_left = 0 init_mutex = pthread_mutex_t = { Type = Normal, Status = Not acquired, Robust = No, Shared = No, Protocol = None } init_cond = pthread_cond_t = { Threads known to still execute a wait function = 0, Clock ID = CLOCK_REALTIME, Shared = No } warn_fail = 0 warn_fail = 0 ptaf = ptif = ptdf = ptff = #4 0x00007f69d3728609 in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0 No symbol table info available. 
#5 0x00007f69d344d133 in clone () from /lib/x86_64-linux-gnu/libc.so.6 No symbol table info available. Thread 6 (Thread 0x7f69ca7fc700 (LWP 3175)): #0 0x00007f69d344d46e in epoll_wait () from /lib/x86_64-linux-gnu/libc.so.6 No symbol table info available. #1 0x000055e1db7d9769 in _do_poll (p=, exp=989870211, wake=0) at src/ev_epoll.c:232 timeout = 140 updt_idx = fd = old_fd = wait_time = 140 status = count = #2 0x000055e1db94eede in run_poll_loop () at src/haproxy.c:2878 _ = { func = 0x55e1dbd259be "run_poll_loop", file = 0x55e1dbd259cc "src/haproxy.c", line = 2838, what = 1 '\001', arg8 = 0 '\000', arg32 = 0 } wake = next = -1274571408 #3 0x000055e1db952fcf in run_thread_poll_loop (data=) at src/haproxy.c:2996 init_left = 0 init_mutex = pthread_mutex_t = { Type = Normal, Status = Not acquired, Robust = No, Shared = No, Protocol = None } init_cond = pthread_cond_t = { Threads known to still execute a wait function = 0, Clock ID = CLOCK_REALTIME, Shared = No } warn_fail = 0 warn_fail = 0 ptaf = ptif = ptdf = ptff = #4 0x00007f69d3728609 in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0 No symbol table info available. #5 0x00007f69d344d133 in clone () from /lib/x86_64-linux-gnu/libc.so.6 No symbol table info available. Thread 5 (Thread 0x7f69cb7fe700 (LWP 3173)): #0 0x00007f69d344d46e in epoll_wait () from /lib/x86_64-linux-gnu/libc.so.6 No symbol table info available. #1 0x000055e1db7d9769 in _do_poll (p=, exp=989870094, wake=0) at src/ev_epoll.c:232 timeout = 13 updt_idx = fd = old_fd = wait_time = 13 status = count = #2 0x000055e1db94eede in run_poll_loop () at src/haproxy.c:2878 _ = { func = 0x55e1dbd259be "run_poll_loop", file = 0x55e1dbd259cc "src/haproxy.c", line = 2838, what = 1 '\001', arg8 = 0 '\000', arg32 = 0 } wake = next = -2012768912 #3 0x000055e1db952fcf in run_thread_poll_loop (data=) at src/haproxy.c:2996 init_left = 0 init_mutex = pthread_mutex_t = { Type = Normal, Status = Not acquired, Robust = No, Shared = No, Protocol = None } init_cond = pthread_cond_t = { Threads known to still execute a wait function = 0, Clock ID = CLOCK_REALTIME, Shared = No } warn_fail = 0 warn_fail = 0 ptaf = ptif = ptdf = ptff = #4 0x00007f69d3728609 in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0 No symbol table info available. #5 0x00007f69d344d133 in clone () from /lib/x86_64-linux-gnu/libc.so.6 No symbol table info available. 
Thread 4 (Thread 0x7f69caffd700 (LWP 3174)): #0 pool_evict_last_items (pool=pool@entry=0x55e1dd66ff40, ph=ph@entry=0x55e1dd670140, count=8) at src/pool.c:452 __ret = head = 0x0 released = 3 cluster = 0 to_free_max = 0 item = 0x55e1de027f90 pi = #1 0x000055e1db99875b in pool_evict_from_local_cache (pool=0x55e1dd66ff40, full=full@entry=0) at src/pool.c:499 ph = 0x55e1dd670140 #2 0x000055e1db998d38 in pool_put_to_cache (pool=0x0, ptr=, caller=) at src/pool.c:555 item = ph = #3 0x000055e1db8004e6 in qc_release_frm (qc=qc@entry=0x7f69b87a2fd0, frm=frm@entry=0x7f69b84c1c70) at src/xprt_quic.c:1545 __ptr = 0x7f69b84c1c70 origin = f = pn = tmp = #4 0x000055e1db80369d in quic_stream_try_to_consume (stream=, qc=) at src/xprt_quic.c:1618 strm = offset = len = frm = 0x7f69b84c1c70 ret = 1 frm_node = 0x7f69b87a5448 __x = __x = strm = offset = len = frm = __x = __x = __x = __x = #5 qc_treat_acked_tx_frm (frm=, qc=) at src/xprt_quic.c:1684 strm_frm = node = stream = len = offset = stream_acked = 1 __x = __x = strm_frm = node = stream = offset = len = __x = __x = __x = __x = __x = __x = __x = __x = __x = __x = qcc = __x = __x = #6 qc_ackrng_pkts (pkts=0x7f69b87a3960, newly_acked_pkts=0x7f69cafd9fa8, largest_node=, largest=, smallest=361, qc=, pkt_flags=) at src/xprt_quic.c:1731 frm = frmbak = 0x7f69b873ed90 node = 0x7f69b873ed60 pkt = 0x7f69b873ed40 __x = __x = frm = frmbak = __x = __x = __x = __x = #7 qc_parse_ack_frm (frm=0x7f69cafda080, qel=0x7f69b87a36b8, pos=0x7f69cafd9f20, end=0x7f69b81e36c6 "J\314B\212jF\032\254\324\266\020\233.\266b}#\024\270\236\031\301\002A@\001\001%\003'ss\356\313\324Tj\356\203\342\322%\270\023\226\254", qc=, rtt_sample=) at src/xprt_quic.c:2096 gap = ack_range = newly_acked_pkts = { n = 0x7f69b873ed40, p = 0x7f69b84c3db0 } lost_pkts = { n = 0x7f69cafda168, p = 0x7f69cafda168 } ret = 0 largest = smallest = 361 pkts = 0x7f69b87a3960 pkt_flags = 0 largest_node = time_sent = ack = __x = __x = __x = __x = __x = __x = __x = __x = __x = __x = gap = ack_range = __x = __x = __x = __x = __x = __x = __x = __x = __x = __x = __x = __x = #8 qc_parse_pkt_frms (pkt=0x7f69b878a940, ctx=0x7f69b87972b0, qel=0x7f69b87a36b8) at src/xprt_quic.c:2597 rtt_sample = 0 frm = { list = { n = 0x7f69cafda110, p = 0x7f69cafda1b8 }, pkt = 0x55e1dbb41d90 , type = 2 '\002', { padding = { len = 369 }, ack = { largest_ack = 369, ack_delay = 1, ack_range_num = 3, first_ack_range = 8 }, tx_ack = { ack_delay = 369, arngs = 0x1 }, crypto = { offset = 369, len = 1, qel = 0x3, data = 0x8 }, reset_stream = { id = 369, app_error_code = 1, final_size = 3 }, stop_sending = { id = 369, app_error_code = 1 }, new_token = { len = 369, data = 0x1 }, stream = { id = 369, stream = 0x1, buf = 0x3, offset = { node = { branches = { b = {0x8, 0x5} }, node_p = 0x55e1dbb41df7 , leaf_p = 0x0, bit = -24136, pfx = 51965 }, {}, key = 5 }, len = 140092353913272, fin = -889347756, data = 0x55e1dba7d59e "\205\300t\200H\213L$\020H\367\301" }, max_data = { max_data = 369 }, max_stream_data = { id = 369, max_stream_data = 1 }, max_streams_bidi = { max_streams = 369 }, max_streams_uni = { max_streams = 369 }, data_blocked = { limit = 369 }, stream_data_blocked = { id = 369, limit = 1 }, streams_blocked_bidi = { limit = 369 }, streams_blocked_uni = { limit = 369 }, new_connection_id = { seq_num = 369, retire_prior_to = 1, cid = { len = 3 '\003', data = 0x8 }, stateless_reset_token = 0x5 }, retire_connection_id = { seq_num = 369 }, path_challenge = { data = "q\001\000\000\000\000\000" }, path_challenge_response = { data = 
"q\001\000\000\000\000\000" }, connection_close = { error_code = 369, frame_type = 1, reason_phrase_len = 3, reason_phrase = "\b\000\000\000\000\000\000\000\005\000\000\000\000\000\000\000\367\035\264\333\341U\000\000\000\000\000\000\000\000\000\000\270\241\375\312i\177\000\000\005\000\000\000\000\000\000\000\270\241\375\312i\177\000\000T\241\375\312i\177\000" }, connection_close_app = { error_code = 369, reason_phrase_len = 1, reason_phrase = "\003\000\000\000\000\000\000\000\b\000\000\000\000\000\000\000\005\000\000\000\000\000\000\000\367\035\264\333\341U\000\000\000\000\000\000\000\000\000\000\270\241\375\312i\177\000\000\005\000\000\000\000\000\000\000\270\241\375\312i\177\000" } }, origin = 0x7f69b87960f0, reflist = { n = 0x55e1dbb41eb9 , p = 0x0 }, ref = { n = 0x55e1dba7dc0a , p = 0x0 }, flags = 3097973296 } qc = fast_retrans = 0 ret = 0 pos = 0x7f69b81e36c0 "\004\005\002?\003'J\314B\212jF\032\254\324\266\020\233.\266b}#\024\270\236\031\301\002A@\001\001%\003'ss\356\313\324Tj\356\203\342\322%\270\023\226\254" end = 0x7f69b81e36c6 "J\314B\212jF\032\254\324\266\020\233.\266b}#\024\270\236\031\301\002A@\001\001%\003'ss\356\313\324Tj\356\203\342\322%\270\023\226\254" __x = __x = __x = __x = stream = nb_streams = __x = __x = __x = __x = __x = __x = __x = __x = data = stop_sending = __x = __x = __x = __x = cf = cfdebug = __x = __x = cfdebug = diff = __x = __x = cfdebug = __x = __x = cfdebug = __x = __x = data = __x = __x = rtt_sample = ack_delay = _a = _b = __x = __x = __x = __x = __x = __x = __x = __x = iqel = hqel = __x = __x = #9 qc_treat_rx_pkts (cur_el=cur_el@entry=0x7f69b87a36b8, next_el=next_el@entry=0x0, ctx=ctx@entry=0x7f69b87972b0, force_ack=force_ack@entry=0) at src/xprt_quic.c:3829 pkt = 0x7f69b878a940 ret = 0 largest_pn = -1 largest_pn_time_received = 0 qc = 0x7f69b87a2fd0 qel = 0x7f69b87a36b8 node = 0x7f69b878aa08 #10 0x000055e1db807a16 in quic_conn_app_io_cb (t=0x7f69b8797410, context=0x7f69b87972b0, state=) at src/xprt_quic.c:4202 ctx = 0x7f69b87972b0 qc = 0x7f69b87a2fd0 qel = 0x7f69b87a36b8 #11 0x000055e1db981fbf in run_tasks_from_lists (budgets=budgets@entry=0x7f69cafda510) at src/task.c:590 _ = { func = 0x55e1dbd2fc93 "run_tasks_from_lists", file = 0x55e1dbd2fbfe "src/task.c", line = 652, what = 5 '\005', arg8 = 0 '\000', arg32 = 0 } tl_queues = 0x7f69cafda4f8 budget_mask = 15 '\017' profile_entry = 0x0 done = 1 queue = 0 t = 0x7f69b8797410 process = ctx = state = 3714515264 #12 0x000055e1db98293a in process_runnable_tasks () at src/task.c:855 max = {91, 0, 0, 0} tt = 0x55e1dc16fe80 heavy_queued = 1 default_weights = {64, 48, 16, 1} max_processed = 93 queue = 4 budget = 0 max_total = grq = lrq = gpicked = lpicked = t = tmp_list = #13 0x000055e1db94ef10 in run_poll_loop () at src/haproxy.c:2807 _ = { func = 0x55e1dbd259be "run_poll_loop", file = 0x55e1dbd259cc "src/haproxy.c", line = 2838, what = 1 '\001', arg8 = 0 '\000', arg32 = 0 } wake = next = #14 0x000055e1db952fcf in run_thread_poll_loop (data=) at src/haproxy.c:2996 init_left = 0 init_mutex = pthread_mutex_t = { Type = Normal, Status = Not acquired, Robust = No, Shared = No, Protocol = None } init_cond = pthread_cond_t = { Threads known to still execute a wait function = 0, Clock ID = CLOCK_REALTIME, Shared = No } warn_fail = 0 warn_fail = 0 ptaf = ptif = ptdf = ptff = #15 0x00007f69d3728609 in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0 No symbol table info available. #16 0x00007f69d344d133 in clone () from /lib/x86_64-linux-gnu/libc.so.6 No symbol table info available. 
Thread 3 (Thread 0x7f69cbfff700 (LWP 3172)): #0 0x00007f69d344d46e in epoll_wait () from /lib/x86_64-linux-gnu/libc.so.6 No symbol table info available. #1 0x000055e1db7d9769 in _do_poll (p=, exp=989870211, wake=0) at src/ev_epoll.c:232 timeout = 129 updt_idx = fd = old_fd = wait_time = 129 status = count = #2 0x000055e1db94eede in run_poll_loop () at src/haproxy.c:2878 _ = { func = 0x55e1dbd259be "run_poll_loop", file = 0x55e1dbd259cc "src/haproxy.c", line = 2838, what = 1 '\001', arg8 = 0 '\000', arg32 = 0 } wake = next = -1073244816 #3 0x000055e1db952fcf in run_thread_poll_loop (data=) at src/haproxy.c:2996 init_left = 0 init_mutex = pthread_mutex_t = { Type = Normal, Status = Not acquired, Robust = No, Shared = No, Protocol = None } init_cond = pthread_cond_t = { Threads known to still execute a wait function = 0, Clock ID = CLOCK_REALTIME, Shared = No } warn_fail = 0 warn_fail = 0 ptaf = ptif = ptdf = ptff = #4 0x00007f69d3728609 in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0 No symbol table info available. #5 0x00007f69d344d133 in clone () from /lib/x86_64-linux-gnu/libc.so.6 No symbol table info available. Thread 2 (Thread 0x7f69d317b700 (LWP 3169)): #0 0x00007f69d344d46e in epoll_wait () from /lib/x86_64-linux-gnu/libc.so.6 No symbol table info available. #1 0x000055e1db7d9769 in _do_poll (p=, exp=989870211, wake=0) at src/ev_epoll.c:232 timeout = 129 updt_idx = fd = old_fd = wait_time = 129 status = count = #2 0x000055e1db94eede in run_poll_loop () at src/haproxy.c:2878 _ = { func = 0x55e1dbd259be "run_poll_loop", file = 0x55e1dbd259cc "src/haproxy.c", line = 2838, what = 1 '\001', arg8 = 0 '\000', arg32 = 0 } wake = next = -871916448 #3 0x000055e1db952fcf in run_thread_poll_loop (data=) at src/haproxy.c:2996 init_left = 0 init_mutex = pthread_mutex_t = { Type = Normal, Status = Not acquired, Robust = No, Shared = No, Protocol = None } init_cond = pthread_cond_t = { Threads known to still execute a wait function = 0, Clock ID = CLOCK_REALTIME, Shared = No } warn_fail = 0 warn_fail = 0 ptaf = ptif = ptdf = ptff = #4 0x00007f69d3728609 in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0 No symbol table info available. #5 0x00007f69d344d133 in clone () from /lib/x86_64-linux-gnu/libc.so.6 No symbol table info available. Thread 1 (Thread 0x7f69d319f040 (LWP 3168)): #0 __eb64_insert (root=root@entry=0xf10, new=0x55e1dde3e540) at include/import/eb64tree.h:227 side = 0 troot = root_right = newkey = old = old_node_bit = new_left = new_rght = new_leaf = old_leaf = new_left = new_rght = new_leaf = old_node = ret = #1 eb64_insert (root=root@entry=0xf10, new=0x55e1dde3e540) at src/eb64tree.c:27 No locals. 
#2 0x000055e1db8343d2 in qc_stream_desc_new (id=id@entry=94428570667136, type=type@entry=QCS_CLT_UNI, ctx=ctx@entry=0x55e1de0ae590, qc=0x0) at src/quic_stream.c:33 stream = 0x55e1dde3e540 #3 0x000055e1db823c71 in qcs_new (qcc=qcc@entry=0x55e1dd694700, id=94428570667136, type=QCS_CLT_UNI) at src/mux_quic.c:153 qc = 0x98 qcs = 0x55e1de0ae590 #4 0x000055e1db8239db in qcc_init_stream_local (qcc=0x55e1dd694700, bidi=bidi@entry=0) at src/mux_quic.c:563 next = 0x55e1dd694888 type = 3725256080 qcs = #5 0x000055e1db82f63c in h3_finalize (ctx=0x55e1dddf2b30) at src/h3.c:1135 h3c = 0x55e1dddf2b30 qcs = #6 0x000055e1db827e74 in qc_send (qcc=qcc@entry=0x55e1dde74170) at src/mux_quic.c:1614 frms = { n = 0x7ffda3848818, p = 0x7ffda3848818 } total = 0 tmp_total = 0 node = qcs = qcs_tmp = #7 0x000055e1db828784 in qc_release (qcc=qcc@entry=0x55e1dde74170) at src/mux_quic.c:1808 conn = 0x55e1dddf5170 node = #8 0x000055e1db82bc5c in qc_timeout_task (t=0x55e1dde3e540, ctx=0x55e1dde74170, state=) at src/mux_quic.c:1932 expired = qcc = #9 0x000055e1db982076 in run_tasks_from_lists (budgets=budgets@entry=0x7ffda3848af0) at src/task.c:628 _ = { func = 0x55e1dbd2fc93 "run_tasks_from_lists", file = 0x55e1dbd2fbfe "src/task.c", line = 652, what = 5 '\005', arg8 = 0 '\000', arg32 = 0 } tl_queues = 0x7ffda3848ad8 budget_mask = 15 '\017' profile_entry = 0x0 done = 0 queue = 1 t = 0x55e1dde3e540 process = ctx = state = 3722700096 #10 0x000055e1db98293a in process_runnable_tasks () at src/task.c:855 max = {0, 92, 0, 0} tt = 0x55e1dc16f580 heavy_queued = 1 default_weights = {64, 48, 16, 1} max_processed = 93 queue = 4 budget = 93 max_total = grq = lrq = gpicked = lpicked = t = tmp_list = #11 0x000055e1db94ef10 in run_poll_loop () at src/haproxy.c:2807 _ = { func = 0x55e1dbd259be "run_poll_loop", file = 0x55e1dbd259cc "src/haproxy.c", line = 2838, what = 1 '\001', arg8 = 0 '\000', arg32 = 0 } wake = next = #12 0x000055e1db952fcf in run_thread_poll_loop (data=) at src/haproxy.c:2996 init_left = 0 init_mutex = pthread_mutex_t = { Type = Normal, Status = Not acquired, Robust = No, Shared = No, Protocol = None } init_cond = pthread_cond_t = { Threads known to still execute a wait function = 0, Clock ID = CLOCK_REALTIME, Shared = No } warn_fail = 0 warn_fail = 0 ptaf = ptif = ptdf = ptff = #13 0x000055e1db952547 in main (argc=, argv=0x7ffda3848fc8) at src/haproxy.c:3645 limit = { rlim_cur = 18446744073709551615, rlim_max = 18446744073709551615 } pidfd = retry = err = intovf = (gdb) ```

edit: nevermind, it's not related to the patch but to something between 2f105b8a457625491f06c40ddea8d43b1ed5c561 and 243c2d18221b36b087ad9c177293306119f3fafd
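(One way to narrow such a range down, sketched with the two commits quoted above; each step needs a rebuild and enough traffic to trigger, or not trigger, the segfault.)

```
git bisect start
git bisect bad  243c2d18221b36b087ad9c177293306119f3fafd
git bisect good 2f105b8a457625491f06c40ddea8d43b1ed5c561
# build, deploy, observe, then mark the candidate:
#   git bisect good    # stayed up
#   git bisect bad     # crashed
git bisect reset
```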

a-denoyelle commented 1 year ago

I suspect this crash is linked to the following patch: 3dd79d378c86b3ebf60e029f518add5f1ed54815

I already experienced crashes on my machine related to it; not the same backtrace, but not far from it. Maybe you can revert it and re-test with the other patch I provided, so that we can get feedback on the leak issue. In the meantime I'll try to provide a proper fix for this crash.
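(A minimal sketch of that combination, assuming the attachment is a regular git format-patch file; "git apply" works just as well if it is a plain diff.)

```
git revert --no-edit 3dd79d378c86b3ebf60e029f518add5f1ed54815
git am 0001-BUG-MEDIUM-mux-quic-reset-htx-buffer-in-snd_buf-on-t.txt
```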

Tristan971 commented 1 year ago

Nice if you're already aware of that one then πŸ‘

Made a build with it reverted + your patch and deployed that; we'll know in the evening

edit: just to confirm, the crash was indeed caused (or revealed, at least) by 3dd79d378c86b3ebf60e029f518add5f1ed54815 (since I went from crashing in <1min to ~20 minutes up so far)

Tristan971 commented 1 year ago

@a-denoyelle The patch certainly looks weird, but it did the job. At least it catches the vast majority of the connections that were kept open before

[screenshot: 2022-09-14 at 18 05 05]

wtarreau commented 1 year ago

That's great news! The patch doesn't look obvious like this, but it makes sure that aborted transfers result in the data being marked as consumed; otherwise the upper stream ends up in an inconsistent state, with data still in the buffer while it was told it was sent. It's then easy to imagine that some of the abort conditions will never be met. It's not easy to figure out exactly what complex sequence of events leads to this situation (most likely a very specific timing is required), but your graphs prove that it did happen, since the patch made a difference!

Tristan971 commented 1 year ago

I see, that makes sense indeed. Well, after 24 hours or so, I can confirm that we weren't just lucky and there really is no more leak at all.

[screenshot: 2022-09-15 at 18 20 06]

I'll do a build with f8aaf8bdfa40e21b1a2f600c3ed6455bf9b6a763 to see if that fixes the other crash entirely I guess, but now I'm looking forward to 2.6.6 and to finally see how QUIC does in full-blast in prod :-)

Tristan971 commented 1 year ago

I'll do a build with f8aaf8bdfa40e21b1a2f600c3ed6455bf9b6a763 to see if that fixes the other crash entirely I guess

It does. 48 hours with it + the patch and we're still good πŸ‘

a-denoyelle commented 1 year ago

@Tristan971 FYI, I have merged the final patch which should resolve this issue. If you have time, you can test it on the latest master tip.

Also, as a side note not related to QUIC: we have recently integrated a mode to automatically anonymize information in bug reports via a new CLI command. We are open to user feedback, and since I saw that you manually redact fields in bug reports, do not hesitate to test it and give us your thoughts on it. Thanks again :)
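(If I'm reading the 2.7-dev docs correctly, the anonymization mode is toggled per CLI session before taking dumps; treat the exact syntax below as an assumption and check "help" on the socket first.)

```
# anonymize the output of subsequent commands in this CLI session
socat - /run/haproxy/admin.sock <<< "set anon on; show sess all"
socat - /run/haproxy/admin.sock <<< "set anon on; show servers conn"
```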

Tristan971 commented 1 year ago

Nice, I'll give master's head a try!

And yeah I noticed the commits for anonymization; much appreciated as it was a little stressful looking for anything I might have missed :-)

Tristan971 commented 1 year ago

Thanks Github 🀦 /s

a-denoyelle commented 1 year ago

Closed with backport done.