varnishcache / varnish-cache

Varnish Cache source code repository
https://www.varnish-cache.org

varnish-7.5.0: Regressions on ppc64le, aarch64, ppc32 #4086

Open ingvarha opened 3 months ago

ingvarha commented 3 months ago

Expected Behavior

Varnish should pass the whole test/regression suite on all platforms.

Current Behavior

Several tests in the test suite fail. Example: r01878.vtc

ingvar@teie:~/src/varnish-7.5.0/bin/varnishtest$ uname -a
Linux teie 6.0.0-1-powerpc #1 Debian 6.0.2-1 (2022-10-20) ppc GNU/Linux

ingvar@teie:~/src/varnish-7.5.0/bin/varnishtest$ ./varnishtest -i tests/r01878.vtc
**** dT    0.000
*    top   TEST tests/r01878.vtc starting
**** top   extmacro def pkg_version=7.5.0
**** top   extmacro def pkg_branch=7.5
**** top   extmacro def pwd=/home/ingvar/src/varnish-7.5.0/bin/varnishtest
**** top   extmacro def date(...)
**** top   extmacro def string(...)
**** top   extmacro def localhost=127.0.0.1
**** top   extmacro def bad_backend=127.0.0.1:41609
**** top   extmacro def listen_addr=127.0.0.1:0
**** top   extmacro def bad_ip=192.0.2.255
**** top   extmacro def topbuild=/home/ingvar/src/varnish-7.5.0
**** top   extmacro def topsrc=/home/ingvar/src/varnish-7.5.0
**** top   macro def testdir=/home/ingvar/src/varnish-7.5.0/bin/varnishtest/tests
**** top   macro def tmpdir=/tmp/vtc.1310.3ba018aa
**** top   macro def vtcid=vtc.1310.3ba018aa
**** dT    0.001
**   top   === varnishtest "ESI delivering a gzip'd object works when paren...
*    top   VTEST ESI delivering a gzip'd object works when parent is not gzip'd
**   top   === server s1 {
**   s1    Starting server
**** s1    macro def s1_addr=127.0.0.1
**** s1    macro def s1_port=43877
**** s1    macro def s1_sock=127.0.0.1:43877
*    s1    Listen on 127.0.0.1:43877
**   top   === varnish v1 -cliok "param.set thread_pool_stack 80k"
**** dT    0.002
**   s1    Started on 127.0.0.1:43877 (1 iterations)
**** dT    0.024
**   v1    Launch
***  v1    CMD: cd ${pwd} && exec varnishd  -d -n /tmp/vtc.1310.3ba018aa/v1 -i v1 -l 2m -p auto_restart=off -p syslog_cli_traffic=off -p thread_pool_min=10 -p debug=+vtc_mode -p vsl_mask=+Debug,+H2RxHdr,+H2RxBody -p h2_initial_window_size=1m -p h2_rx_window_low_water=64k -a '127.0.0.1:0' -M '127.0.0.1 45027' -P /tmp/vtc.1310.3ba018aa/v1/varnishd.pid -p vmod_path=/home/ingvar/src/varnish-7.5.0/vmod/.libs 
***  v1    CMD: cd /home/ingvar/src/varnish-7.5.0/bin/varnishtest && exec varnishd  -d -n /tmp/vtc.1310.3ba018aa/v1 -i v1 -l 2m -p auto_restart=off -p syslog_cli_traffic=off -p thread_pool_min=10 -p debug=+vtc_mode -p vsl_mask=+Debug,+H2RxHdr,+H2RxBody -p h2_initial_window_size=1m -p h2_rx_window_low_water=64k -a '127.0.0.1:0' -M '127.0.0.1 45027' -P /tmp/vtc.1310.3ba018aa/v1/varnishd.pid -p vmod_path=/home/ingvar/src/varnish-7.5.0/vmod/.libs 
**** dT    0.025
***  v1    PID: 1329
**** v1    macro def v1_pid=1329
**** v1    macro def v1_name=/tmp/vtc.1310.3ba018aa/v1
**** dT    0.124
***  v1    debug|Debug: Version: varnish-7.5.0 revision eef25264e5ca5f96a77129308edb83ccf84cb1b1
***  v1    debug|Debug: Platform: Linux,6.0.0-1-powerpc,ppc,-jnone,-sdefault,-sdefault,-hcritbit
**** dT    0.125
***  v1    debug|200 313     
***  v1    debug|-----------------------------
***  v1    debug|Varnish Cache CLI 1.0
***  v1    debug|-----------------------------
***  v1    debug|Linux,6.0.0-1-powerpc,ppc,-jnone,-sdefault,-sdefault,-hcritbit
***  v1    debug|varnish-7.5.0 revision eef25264e5ca5f96a77129308edb83ccf84cb1b1
***  v1    debug|
***  v1    debug|Type 'help' for command list.
***  v1    debug|Type 'quit' to close CLI session.
***  v1    debug|Type 'start' to launch worker process.
***  v1    debug|
**** dT    0.210
**** v1    CLIPOLL 1 0x1 0x0 0x0
***  v1    CLI connection fd = 6
**** dT    0.211
***  v1    CLI RX  107
**** v1    CLI RX|pmozhirlsfvbsfxwmaxidsntatlhpear
**** v1    CLI RX|
**** v1    CLI RX|Authentication required.
**** dT    0.212
**** v1    CLI TX|auth 1a0149d031a7c6adfbcdd7378e34045f036fe3b0b4da9e52b1b3bb501b233948
**** dT    0.213
***  v1    CLI RX  200
**** v1    CLI RX|-----------------------------
**** v1    CLI RX|Varnish Cache CLI 1.0
**** v1    CLI RX|-----------------------------
**** v1    CLI RX|Linux,6.0.0-1-powerpc,ppc,-jnone,-sdefault,-sdefault,-hcritbit
**** v1    CLI RX|varnish-7.5.0 revision eef25264e5ca5f96a77129308edb83ccf84cb1b1
**** v1    CLI RX|
**** v1    CLI RX|Type 'help' for command list.
**** v1    CLI RX|Type 'quit' to close CLI session.
**** v1    CLI RX|Type 'start' to launch worker process.
**** dT    0.214
**** v1    CLI TX|param.set thread_pool_stack 80k
**** dT    0.256
***  v1    CLI RX  106
**** v1    CLI RX|Must be at least 128k
**** v1    CLI RX|
**** v1    CLI RX|(attempting to set param 'thread_pool_stack' to '80k')
**   v1    CLI 106 <param.set thread_pool_stack 80k>
---- v1    FAIL CLI response 106 expected 200
*    top   RESETTING after tests/r01878.vtc
**   s1    Waiting for server (3/-1)
**** dT    0.257
**   v1    Wait
**** v1    CLI TX|panic.show
**** dT    0.300
***  v1    CLI RX  300
**** v1    CLI RX|Child has not panicked or panic has been cleared
***  v1    debug|Info: manager stopping child
***  v1    debug|Info: manager dies
**** dT    0.304
**** v1    STDOUT EOF
**** dT    0.320
***  v1    vsl|No VSL chunk found (child not started ?)
**   v1    WAIT4 pid=1329 status=0x0000 (user 0.015077 sys 0.040061)
*    top   TEST tests/r01878.vtc FAILED
#    top  TEST tests/r01878.vtc FAILED (0.327) exit=2

Possible Solution

Upping thread_pool_stack from 80k to 128k seems to fix the failures.
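For reference, the affected tests set the parameter with a line like the one seen in the log above; after the bump it reads:

varnish v1 -cliok "param.set thread_pool_stack 128k"

Applying that across the whole test directory and re-running the affected tests: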

ingvar@teie:~/src/varnish-7.5.0/bin/varnishtest$ sed -i 's/thread_pool_stack 80k/thread_pool_stack 128k/g;' tests/*.vtc

ingvar@teie:~/src/varnish-7.5.0/bin/varnishtest$ grep -l 'thread_pool_stack 128k' tests/*.vtc | while read i; do ./varnishtest -i "$i"; done
#    top  TEST tests/e00016.vtc passed (3.472)
#    top  TEST tests/e00029.vtc passed (3.482)
#    top  TEST tests/e00030.vtc passed (3.675)
#    top  TEST tests/e00034.vtc passed (3.680)
#    top  TEST tests/l00003.vtc passed (3.472)
#    top  TEST tests/r01737.vtc passed (6.476)
#    top  TEST tests/r01781.vtc passed (3.472)
#    top  TEST tests/r01878.vtc passed (3.471)
#    top  TEST tests/r02849.vtc passed (9.189)
#    top  TEST tests/v00042.vtc passed (3.674)
#    top  TEST tests/v00043.vtc passed (4.076)

Steps to Reproduce (for bugs)

  1. Install Debian on a 32bit powerpc machine. Plan 12-16 weeks to get hardware working, and then 4-5 full days to get a working Linux environment. Do not even think about Gentoo.
  2. Build varnish-7.5.0
  3. Run the test suite (make check; see the example commands after this list)
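
For reference, a typical build-and-check sequence from the release tarball looks roughly like this (a sketch; the parallelism and the absence of configure options are assumptions, not necessarily what was used here):

cd ~/src/varnish-7.5.0
./configure
make -j4
make check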

Context

This is just for regression testing. I presume no one in their right mind would use varnish in production on 32-bit ppc. But hey, who knows? Update: also visible on aarch64 and ppc64le.

Varnish Cache version

varnish-7.5.0

Operating system

Debian trixie/sid (Linux 6.0) / ppc32

Source of binary packages used (if any)

No response

ingvarha commented 3 months ago

This hits arm32 and arm64 as well

ingvarha commented 3 months ago

and ppc64le

ingvarha commented 3 months ago

Also similar, on ppc64le, in tests/r04036.vtc:

Error: -sfile too small for this architecture, minimum size is 8 MB

Changing the malloc setting to file,8M fixes that. I don't know whether this should be a general or an arch-specific change; I made it arch-specific in the Fedora RPM.
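
For illustration, making such a change arch-specific in an RPM spec could look roughly like this (a sketch, not the actual Fedora spec; the arch list is an assumption, and the thread_pool_stack sed from above stands in for whatever r04036.vtc needs):

%ifarch ppc64le aarch64 %{arm} ppc
# sketch: adjust the tests that assume a smaller stack/page size
sed -i 's/thread_pool_stack 80k/thread_pool_stack 128k/g' bin/varnishtest/tests/*.vtc
%endif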

dridi commented 3 months ago

I suspect the hardening flags play a role, at least in the increased stack usage.

$ rpm --eval %build_cflags
-O2 -flto=auto -ffat-lto-objects -fexceptions -g -grecord-gcc-switches -pipe -Wall -Werror=format-security -Wp,-U_FORTIFY_SOURCE,-D_FORTIFY_SOURCE=3 -Wp,-D_GLIBCXX_ASSERTIONS -specs=/usr/lib/rpm/redhat/redhat-hardened-cc1 -fstack-protector-strong -specs=/usr/lib/rpm/redhat/redhat-annobin-cc1  -m64   -mtune=generic -fasynchronous-unwind-tables -fstack-clash-protection -fcf-protection -fno-omit-frame-pointer -mno-omit-leaf-frame-pointer
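
One way to check that (an untested sketch; the plain CFLAGS and the chosen test are assumptions) is to rebuild without the distro hardening flags and re-run a failing test:

make distclean
./configure CFLAGS='-O2 -g'
make -j4
(cd bin/varnishtest && ./varnishtest -i tests/r01878.vtc)
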
dridi commented 3 months ago

What happens if you try to rebuild 7.4.2 in the same build environments?

ingvarha commented 3 months ago

I did that for fedora-40. I can try again, in case the builders have changed.
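
If useful, a local way to retry that (a sketch; assumes mock is set up, and the config name and SRPM file name are hypothetical):

# SRPM name is hypothetical
mock -r fedora-40-ppc64le --rebuild varnish-7.4.2-1.fc40.src.rpm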