varnishcache / varnish-cache

Varnish Cache source code repository
https://www.varnish-cache.org

Intermittent timeout in r02310.vtc #4098

Closed stevendore closed 2 months ago

stevendore commented 2 months ago

Expected Behavior

The test should pass consistently.

Current Behavior

Test fails after about 30 consecutive runs.

VTC output

**** dT 0.000
* top TEST tests/r02310.vtc starting
**** top extmacro def pkg_version=trunk
**** top extmacro def pkg_branch=trunk
**** top extmacro def pwd=/home/swojcik/code/varnish-cache/bin/varnishtest
**** top extmacro def date(...)
**** top extmacro def string(...)
**** top extmacro def localhost=127.0.0.1
**** top extmacro def bad_backend=127.0.0.1:34979
**** top extmacro def listen_addr=127.0.0.1:0
**** top extmacro def bad_ip=192.0.2.255
**** top extmacro def topbuild=/home/swojcik/code/varnish-cache
**** top extmacro def topsrc=/home/swojcik/code/varnish-cache
**** top macro def testdir=/home/swojcik/code/varnish-cache/bin/varnishtest/tests
**** top macro def tmpdir=/tmp/vtc.1323156.6e021e02
**** top macro def vtcid=vtc.1323156.6e021e02
** top === varnishtest "#2310: Panic on premature hangup after Upgrade:...
* top VTEST #2310: Panic on premature hangup after Upgrade: h2c
** top === barrier b1 cond 2
** top === server s1 {
** s1 Starting server
**** s1 macro def s1_addr=127.0.0.1
**** s1 macro def s1_port=40005
**** s1 macro def s1_sock=127.0.0.1:40005
* s1 Listen on 127.0.0.1:40005
** top === varnish v1 -vcl+backend {} -start
**** dT 0.001
** s1 Started on 127.0.0.1:40005 (1 iterations)
**** dT 0.005
** v1 Launch
*** v1 CMD: cd ${pwd} && exec varnishd -d -n /tmp/vtc.1323156.6e021e02/v1 -i v1 -l 2m -p auto_restart=off -p syslog_cli_traffic=off -p thread_pool_min=10 -p debug=+vtc_mode -p vsl_mask=+Debug,+H2RxHdr,+H2RxBody -p h2_initial_window_size=1m -p h2_rx_window_low_water=64k -a '127.0.0.1:0' -M '127.0.0.1 43565' -P /tmp/vtc.1323156.6e021e02/v1/varnishd.pid -p vmod_path=/home/swojcik/code/varnish-cache/vmod/.libs
*** v1 CMD: cd /home/swojcik/code/varnish-cache/bin/varnishtest && exec varnishd -d -n /tmp/vtc.1323156.6e021e02/v1 -i v1 -l 2m -p auto_restart=off -p syslog_cli_traffic=off -p thread_pool_min=10 -p debug=+vtc_mode -p vsl_mask=+Debug,+H2RxHdr,+H2RxBody -p h2_initial_window_size=1m -p h2_rx_window_low_water=64k -a '127.0.0.1:0' -M '127.0.0.1 43565' -P /tmp/vtc.1323156.6e021e02/v1/varnishd.pid -p vmod_path=/home/swojcik/code/varnish-cache/vmod/.libs
**** dT 0.006
*** v1 PID: 1329417
**** v1 macro def v1_pid=1329417
**** v1 macro def v1_name=/tmp/vtc.1323156.6e021e02/v1
**** dT 0.023
*** v1 debug|Debug: Version: varnish-trunk revision 08d4e8045502c5c32796c62c6bb2574853a87e12
*** v1 debug|Debug: Platform: Linux,6.5.0-25-generic,x86_64,-jnone,-sdefault,-sdefault,-hcritbit
*** v1 debug|200 317
*** v1 debug|-----------------------------
*** v1 debug|Varnish Cache CLI 1.0
*** v1 debug|-----------------------------
*** v1 debug|Linux,6.5.0-25-generic,x86_64,-jnone,-sdefault,-sdefault,-hcritbit
*** v1 debug|varnish-trunk revision 08d4e8045502c5c32796c62c6bb2574853a87e12
*** v1 debug|
*** v1 debug|Type 'help' for command list.
*** v1 debug|Type 'quit' to close CLI session.
*** v1 debug|Type 'start' to launch worker process.
*** v1 debug|
**** dT 0.122
**** v1 CLIPOLL 1 0x1 0x0 0x0
*** v1 CLI connection fd = 6
*** v1 CLI RX 107
**** v1 CLI RX|dalsnjzawudqnmifecygvohfiezowxvd
**** v1 CLI RX|
**** v1 CLI RX|Authentication required.
**** v1 CLI TX|auth 6fddc5692756e19cfad909b77a57d249f0c160993b95878771ef1a2545eb7757
*** v1 CLI RX 200
**** v1 CLI RX|-----------------------------
**** v1 CLI RX|Varnish Cache CLI 1.0
**** v1 CLI RX|-----------------------------
**** v1 CLI RX|Linux,6.5.0-25-generic,x86_64,-jnone,-sdefault,-sdefault,-hcritbit
**** v1 CLI RX|varnish-trunk revision 08d4e8045502c5c32796c62c6bb2574853a87e12
**** v1 CLI RX|
**** v1 CLI RX|Type 'help' for command list.
**** v1 CLI RX|Type 'quit' to close CLI session.
**** v1 CLI RX|Type 'start' to launch worker process.
**** v1 CLI TX|vcl.inline vcl1 << %XJEIFLH|)Xspa8P
**** v1 CLI TX|vcl 4.1;
**** v1 CLI TX|backend s1 { .host = "127.0.0.1"; .port = "40005"; }
**** v1 CLI TX|
**** v1 CLI TX|
**** v1 CLI TX|%XJEIFLH|)Xspa8P
**** dT 0.223
*** v1 vsl|No VSL chunk found (child not started ?)
**** dT 0.291
*** v1 CLI RX 200
**** v1 CLI RX|VCL compiled.
**** v1 CLI TX|vcl.use vcl1
*** v1 CLI RX 200
**** v1 CLI RX|VCL 'vcl1' now active
** v1 Start
**** v1 CLI TX|start
**** dT 0.323
*** v1 vsl|No VSL chunk found (child not started ?)
**** dT 0.335
*** v1 debug|Debug: Child (1329535) Started
**** dT 0.364
*** v1 debug|Child launched OK
*** v1 CLI RX 200
*** v1 wait-running
**** v1 CLI TX|status
*** v1 debug|Info: Child (1329535) said Child starts
**** dT 0.408
*** v1 CLI RX 200
**** v1 CLI RX|Child in state running
**** v1 CLI TX|debug.listen_address
**** dT 0.423
**** v1 vsl| 0 CLI - Rd vcl.load "vcl1" vcl_vcl1.1714415774.668483/vgc.so 1auto
**** v1 vsl| 0 CLI - Wr 200 52 Loaded "vcl_vcl1.1714415774.668483/vgc.so" as "vcl1"
**** v1 vsl| 0 CLI - Rd vcl.use "vcl1"
**** v1 vsl| 0 CLI - Wr 200 0
**** v1 vsl| 0 CLI - Rd start
**** v1 vsl| 0 Debug - sockopt: Setting SO_LINGER for a0=127.0.0.1:35363
**** v1 vsl| 0 Debug - sockopt: Setting SO_KEEPALIVE for a0=127.0.0.1:35363
**** v1 vsl| 0 Debug - sockopt: Setting SO_SNDTIMEO for a0=127.0.0.1:35363
**** v1 vsl| 0 Debug - sockopt: Setting SO_RCVTIMEO for a0=127.0.0.1:35363
**** v1 vsl| 0 Debug - sockopt: Setting TCP_NODELAY for a0=127.0.0.1:35363
**** v1 vsl| 0 Debug - sockopt: Setting TCP_KEEPIDLE for a0=127.0.0.1:35363
**** v1 vsl| 0 Debug - sockopt: Setting TCP_KEEPCNT for a0=127.0.0.1:35363
**** v1 vsl| 0 Debug - sockopt: Setting TCP_KEEPINTVL for a0=127.0.0.1:35363
**** v1 vsl| 0 CLI - Wr 200 0
**** dT 0.452
*** v1 CLI RX 200
**** v1 CLI RX|a0 127.0.0.1 35363
**** v1 CLI TX|debug.xid 1000
**** dT 0.495
*** v1 CLI RX 200
**** v1 CLI RX|XID is 1000 chunk 1
**** v1 CLI TX|debug.listen_address
**** dT 0.523
**** v1 vsl| 0 CLI - Rd debug.listen_address
**** v1 vsl| 0 CLI - Wr 200 19 a0 127.0.0.1 35363
**** v1 vsl| 0 CLI - Rd debug.xid 1000
**** v1 vsl| 0 CLI - Wr 200 19 XID is 1000 chunk 1
**** dT 0.540
*** v1 CLI RX 200
**** v1 CLI RX|a0 127.0.0.1 35363
** v1 Listen on 127.0.0.1 35363
**** v1 macro def v1_addr=127.0.0.1
**** v1 macro def v1_port=35363
**** v1 macro def v1_sock=127.0.0.1:35363
**** v1 macro def v1_a0_addr=127.0.0.1
**** v1 macro def v1_a0_port=35363
**** v1 macro def v1_a0_sock=127.0.0.1:35363
** top === varnish v1 -cliok "param.set feature +http2"
**** v1 CLI TX|param.set feature +http2
**** dT 0.587
*** v1 CLI RX 200
** v1 CLI 200
** top === client c1 {
**** dT 0.588
** c1 Starting client
** c1 Waiting for client
** c1 Started on 127.0.0.1:35363 (1 iterations)
*** c1 Connect to 127.0.0.1:35363
*** c1 connected fd 16 from 127.0.0.1 54118 to 127.0.0.1:35363
** c1 === send "GET / HTTP/1.1\r\n"
**** c1 send|GET / HTTP/1.1\r
** c1 === send "Host: foo\r\n"
**** c1 send|Host: foo\r
** c1 === send "Connection: Upgrade, HTTP2-Settings\r\n"
**** c1 send|Connection: Upgrade, HTTP2-Settings\r
** c1 === send "Upgrade: h2c\r\n"
**** c1 send|Upgrade: h2c\r
** c1 === send "HTTP2-Settings: AAMAAABkAARAAAAA\r\n"
**** c1 send|HTTP2-Settings: AAMAAABkAARAAAAA\r
** c1 === send "\r\n"
**** c1 send|\r
** c1 === rxresp
**** c1 rxhdr|HTTP/1.1 101 Switching Protocols\r
**** c1 rxhdr|Connection: Upgrade\r
**** c1 rxhdr|Upgrade: h2c\r
**** c1 rxhdr|\r
**** c1 rxhdrlen = 71
**** c1 http[ 0] |HTTP/1.1
**** c1 http[ 1] |101
**** c1 http[ 2] |Switching Protocols
**** c1 http[ 3] |Connection: Upgrade
**** c1 http[ 4] |Upgrade: h2c
**** c1 bodylen = 0
** c1 === expect resp.status == 101
**** c1 EXPECT resp.status (101) == "101" match
** c1 === expect resp.http.upgrade == h2c
**** c1 EXPECT resp.http.upgrade (h2c) == "h2c" match
** c1 === expect resp.http.connection == Upgrade
**** c1 EXPECT resp.http.connection (Upgrade) == "Upgrade" match
** c1 === txpri
**** c1 txpri|PRI * HTTP/2.0\r
**** c1 txpri|\r
**** c1 txpri|SM\r
**** c1 txpri|\r
**** dT 0.598
** c1 === stream 0 {
** c1 Starting stream 0 (0x7b8cb8001730)
** c1 Waiting for stream 0
** c1.0 === rxsettings
*** c1 rx: stream: 0, type: SETTINGS (4), flags: 0x00, size: 18
**** c1.0 settings->MAX_CONCURRENT_STREAMS (3): 100
**** c1.0 settings->INITIAL_WINDOW_SIZE (4): 1048576
**** c1.0 settings->MAX_HEADER_LIST_SIZE (6): 32768
** c1.0 === txsettings
*** c1.0 tx: stream: 0, type: SETTINGS (4), flags: 0x00, size: 0
** c1.0 === txsettings -ack
*** c1.0 tx: stream: 0, type: SETTINGS (4), flags: 0x01, size: 0
** c1.0 === rxsettings
*** c1 rx: stream: 0, type: SETTINGS (4), flags: 0x01, size: 0
*** c1 flag: ACK
** c1.0 === expect settings.ack == true
**** c1.0 EXPECT settings.ack (true) == "true" match
** c1.0 Ending stream 0
*** c1 closing fd 16
** c1 Ending
**** dT 0.599
** top === barrier b1 sync
**** top Barrier(b1) wait 1 of 2
**** dT 0.623
**** v1 vsl| 0 CLI - Rd debug.listen_address
**** v1 vsl| 0 CLI - Wr 200 19 a0 127.0.0.1 35363
**** v1 vsl| 1000 Begin c sess 0 HTTP/1
**** v1 vsl| 1000 SessOpen c 127.0.0.1 54118 a0 127.0.0.1 35363 1714415775.133794 19
**** v1 vsl| 1000 Debug c sockopt: SO_LINGER may be inherited for a0=127.0.0.1:35363
**** v1 vsl| 1000 Debug c sockopt: SO_KEEPALIVE may be inherited for a0=127.0.0.1:35363
**** v1 vsl| 1000 Debug c sockopt: SO_SNDTIMEO may be inherited for a0=127.0.0.1:35363
**** v1 vsl| 1000 Debug c sockopt: SO_RCVTIMEO may be inherited for a0=127.0.0.1:35363
**** v1 vsl| 1000 Debug c sockopt: TCP_NODELAY may be inherited for a0=127.0.0.1:35363
**** v1 vsl| 1000 Debug c sockopt: TCP_KEEPIDLE may be inherited for a0=127.0.0.1:35363
**** v1 vsl| 1000 Debug c sockopt: TCP_KEEPCNT may be inherited for a0=127.0.0.1:35363
**** v1 vsl| 1000 Debug c sockopt: TCP_KEEPINTVL may be inherited for a0=127.0.0.1:35363
**** v1 vsl| 1000 Link c req 1001 rxreq
**** dT 0.723
**** v1 vsl| 1001 Begin c req 1000 rxreq
**** v1 vsl| 1001 Timestamp c Start: 1714415775.133821 0.000000 0.000000
**** v1 vsl| 1001 Timestamp c Req: 1714415775.133821 0.000000 0.000000
**** v1 vsl| 1001 Debug c H2 Optimistic Upgrade
**** v1 vsl| 1001 Debug c H2CS AAMAAABkAARAAAAA
**** v1 vsl| 1001 ReqUnset c Upgrade: h2c
**** v1 vsl| 1001 ReqUnset c HTTP2-Settings: AAMAAABkAARAAAAA
**** v1 vsl| 1001 ReqProtocol c HTTP/2.0
**** v1 vsl| 1001 VCL_use c vcl1
**** v1 vsl| 1001 ReqStart c 127.0.0.1 54118 a0
**** v1 vsl| 1001 ReqMethod c GET
**** v1 vsl| 1001 ReqURL c /
**** v1 vsl| 1001 ReqProtocol c HTTP/2.0
**** v1 vsl| 1001 ReqHeader c Host: foo
**** v1 vsl| 1001 ReqHeader c Connection: Upgrade, HTTP2-Settings
**** v1 vsl| 1001 ReqHeader c X-Forwarded-For: 127.0.0.1
**** v1 vsl| 1001 ReqHeader c Via: 1.1 v1 (Varnish/trunk)
**** v1 vsl| 1001 VCL_call c RECV
**** v1 vsl| 1001 VCL_return c hash
**** v1 vsl| 1001 Timestamp c Reset: 1714415775.144540 0.010719 0.010719
**** v1 vsl| 1001 RespProtocol c HTTP/1.1
**** v1 vsl| 1001 RespStatus c 408
**** v1 vsl| 1001 RespReason c Client disconnected
**** v1 vsl| 1001 RespHeader c Date: Mon, 29 Apr 2024 18:36:15 GMT
**** v1 vsl| 1001 RespHeader c Server: Varnish
**** v1 vsl| 1001 RespHeader c X-Varnish: 1001
**** v1 vsl| 1001 Timestamp c Process: 1714415775.144641 0.010819 0.000100
**** v1 vsl| 1001 RespProtocol c HTTP/2.0
**** v1 vsl| 1001 RespStatus c 408
**** v1 vsl| 1001 RespReason c Request Timeout
**** v1 vsl| 1001 Timestamp c Resp: 1714415775.144649 0.010828 0.000008
**** v1 vsl| 1001 ReqAcct c 114 0 114 0 0 0
**** v1 vsl| 1001 End c
**** dT 0.823
**** v1 vsl| 1000 Debug c H2SETTING max_concurrent_streams=0x00000064
**** v1 vsl| 1000 Debug c H2SETTING initial_window_size=0x40000000
**** v1 vsl| 1000 Debug c H2: Got pu PRISM
**** v1 vsl| 1000 H2RxHdr c [000000040000000000]
**** v1 vsl| 1000 Debug c H2RXF SETTINGS[0] 0x00 0x00000000
**** v1 vsl| 1000 H2RxHdr c [000000040100000000]
**** v1 vsl| 1000 Debug c H2RXF SETTINGS[0] 0x01 0x00000000
**** v1 vsl| 1000 Debug c H2 CLEANUP ENHANCE_YOUR_CALM
**** v1 vsl| 1000 Debug c KILL st=1 state=5 sched=1
**** v1 vsl| 1000 Debug c ST 0 0
**** v1 vsl| 1000 Debug c ST 1 5
**** v1 vsl| 1000 Debug c KILL st=1 state=6 sched=0
**** v1 vsl| 1000 Debug c ST 0 0
**** v1 vsl| 1000 ReqAcct c 18 0 18 27 26 53
**** v1 vsl| 1000 SessClose c OVERLOAD 0.211
**** v1 vsl| 1000 End c
**** dT 3.426
**** v1 vsl| 0 CLI - Rd ping
**** v1 vsl| 0 CLI - Wr 200 19 PONG 1714415777 1.0
**** dT 6.430
**** v1 vsl| 0 CLI - Rd ping
**** v1 vsl| 0 CLI - Wr 200 19 PONG 1714415780 1.0
**** dT 9.432
**** v1 vsl| 0 CLI - Rd ping
**** v1 vsl| 0 CLI - Wr 200 19 PONG 1714415783 1.0
**** dT 12.436
**** v1 vsl| 0 CLI - Rd ping
**** v1 vsl| 0 CLI - Wr 200 19 PONG 1714415786 1.0
**** dT 15.438
**** v1 vsl| 0 CLI - Rd ping
**** v1 vsl| 0 CLI - Wr 200 19 PONG 1714415789 1.0
**** dT 18.442
**** v1 vsl| 0 CLI - Rd ping
**** v1 vsl| 0 CLI - Wr 200 19 PONG 1714415792 1.0
**** dT 21.446
**** v1 vsl| 0 CLI - Rd ping
**** v1 vsl| 0 CLI - Wr 200 19 PONG 1714415795 1.0
**** dT 24.450
**** v1 vsl| 0 CLI - Rd ping
**** v1 vsl| 0 CLI - Wr 200 19 PONG 1714415798 1.0
**** dT 27.454
**** v1 vsl| 0 CLI - Rd ping
**** v1 vsl| 0 CLI - Wr 200 19 PONG 1714415801 1.0
**** dT 30.458
**** v1 vsl| 0 CLI - Rd ping
**** v1 vsl| 0 CLI - Wr 200 19 PONG 1714415804 1.0
**** dT 33.461
**** v1 vsl| 0 CLI - Rd ping
**** v1 vsl| 0 CLI - Wr 200 19 PONG 1714415807 1.0
**** dT 36.465
**** v1 vsl| 0 CLI - Rd ping
**** v1 vsl| 0 CLI - Wr 200 19 PONG 1714415810 1.0
**** dT 39.468
**** v1 vsl| 0 CLI - Rd ping
**** v1 vsl| 0 CLI - Wr 200 19 PONG 1714415813 1.0
**** dT 42.371
**** v1 vsl| 0 CLI - Rd ping
**** v1 vsl| 0 CLI - Wr 200 19 PONG 1714415816 1.0
**** dT 45.375
**** v1 vsl| 0 CLI - Rd ping
**** v1 vsl| 0 CLI - Wr 200 19 PONG 1714415819 1.0
**** dT 48.379
**** v1 vsl| 0 CLI - Rd ping
**** v1 vsl| 0 CLI - Wr 200 19 PONG 1714415822 1.0
**** dT 51.382
**** v1 vsl| 0 CLI - Rd ping
**** v1 vsl| 0 CLI - Wr 200 19 PONG 1714415825 1.0
**** dT 54.386
**** v1 vsl| 0 CLI - Rd ping
**** v1 vsl| 0 CLI - Wr 200 19 PONG 1714415828 1.0
**** dT 57.389
**** v1 vsl| 0 CLI - Rd ping
**** v1 vsl| 0 CLI - Wr 200 19 PONG 1714415831 1.0
# top TEST tests/r02310.vtc TIMED OUT (kill -9)
# top TEST tests/r02310.vtc FAILED (60.003) signal=9

Possible Solution

Adding a delay after txpri made all runs pass, but I am unsure whether that is really the desired fix.
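In the test's client block, that workaround would look something like this (a sketch only, not a committed fix; the surrounding commands are abridged and the 0.5s value is arbitrary):

```vtc
client c1 {
	...
	txpri
	# workaround sketch: give varnishd time to process the h2
	# preface before the client hangs up (duration is arbitrary)
	delay 0.5
	stream 0 {
		rxsettings
		...
	} -run
} -run
```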

Steps to Reproduce (for bugs)

  1. Check out master (though I've tested each of the 7.x releases and seen the same failure)
  2. Compile Varnish
  3. ./bin/varnishtest/varnishtest -i ./bin/varnishtest/tests/r02310.vtc -j 4 -n 400

Context

Testing Varnish

Varnish Cache version

varnishd (varnish-trunk revision 08d4e8045502c5c32796c62c6bb2574853a87e12)

Operating system

Ubuntu 22.04

Source of binary packages used (if any)

git clone

walid-git commented 2 months ago

The test is racy: since #3998 we no longer perform the backend fetch if the client disconnects before it has started. Thus, the test gets stuck at this line until it times out, because the other barrier b1 sync is never reached.
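The hang follows from the test's barrier setup: the log shows `barrier b1 cond 2` and the top-level side waiting at `Barrier(b1) wait 1 of 2`, so if the other participant never runs its sync, the test can only time out. Schematically, the pattern looks like this (a sketch of the mechanism, not the test's exact code):

```vtc
barrier b1 cond 2

server s1 {
	rxreq
	# only reached if the backend fetch actually starts; if the
	# fetch is skipped, b1 never collects its second participant
	barrier b1 sync
	txresp
} -start
```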

Adding varnish v1 -cliok "param.set feature -vcl_req_reset" should be a sufficient fix.
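The suggested fix would be a one-line addition near the top of r02310.vtc; placement next to the existing `param.set feature +http2` line (visible in the log above) is my assumption:

```vtc
varnish v1 -vcl+backend {} -start
varnish v1 -cliok "param.set feature +http2"
# proposed fix: keep the backend fetch from being skipped when
# the client disconnects before it has started
varnish v1 -cliok "param.set feature -vcl_req_reset"
```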

dridi commented 2 months ago

Please give https://github.com/varnishcache/varnish-cache/pull/4099 a try.

stevendore commented 2 months ago

Solved my issues. Thanks.