jocel1 opened this issue 1 year ago
No idea what's going on here but the file storage seems to prove highly unreliable these days on aarch64 servers.
Could you try replacing your file storage with a 100GB malloc storage instead? Why did you add another 12GB malloc storage?
And how is /tmp mounted on this system?
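For reference, the output of something like the following would show it (standard util-linux / coreutils commands):
# show which filesystem /tmp lives on (tmpfs vs. a regular disk partition)
findmnt -T /tmp
# and how much space is available there
df -h /tmp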
Unfortunately I "only" have 16GB of RAM on this machine. The idea was to split the data in two: regular objects in the 12GB malloc storage, and static files in the 100GB file storage.
/tmp is not in tmpfs but just on the standard disk partition.
Well I'm now running without the file storage, using -s malloc,12288m -s static=malloc,1024m
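(As a side note, usage of the two malloc stevedores can be watched with varnishstat; a minimal sketch, assuming the unnamed storage gets the default name s0:)
# one-shot dump of all malloc storage counters
varnishstat -1 -f "SMA.*"
# or just the allocated-bytes gauges, updating continuously
varnishstat -f "SMA.s0.g_bytes" -f "SMA.static.g_bytes"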
However, it seems I still have random panic issues:
déc. 07 20:13:03 varnishd[2789]: Child launched OK
déc. 07 20:13:03 varnishd[2790]: Child (2802) said Child starts
déc. 07 20:13:03 varnishd[2790]: Child (2802) said <jemalloc>: Error in munmap(): Invalid argument
déc. 07 20:13:03 systemd[1]: Started Varnish Cache, a high-performance HTTP accelerator.
déc. 07 20:52:10 varnishd[2790]: Child (2802) died signal=6
déc. 07 20:52:10 varnishd[2790]: Child (2802) Panic at: Wed, 07 Dec 2022 19:52:10 GMT
Wrong turn at cache/cache_main.c:323:
Signal 11 (Segmentation fault) received at 0xffff92ffe000 si_code 2...
déc. 07 20:52:10 varnishd[2790]: Child cleanup complete
déc. 07 20:52:10 varnishd[2790]: Child (3704) Started
déc. 07 20:52:10 varnishd[2790]: Child (3704) said Child starts
déc. 07 20:52:10 varnishd[2790]: Child (3704) said <jemalloc>: Error in munmap(): Invalid argument
Extra backtrace info:
Child (2802) Panic at: Wed, 07 Dec 2022 19:52:10 GMT
Wrong turn at cache/cache_main.c:323:
Signal 11 (Segmentation fault) received at 0xffff92ffe000 si_code 2
version = varnish-7.2.1 revision 399fa7ed46d68f4c4f42a8fd2748339750d99a8b, vrt api = 16.0
ident = Linux,5.15.75-48.135.amzn2.aarch64,aarch64,-junix,-smalloc,-smalloc,-sdefault,-hcritbit,epoll
now = 2399.653874 (mono), 1670442730.667755 (real)
Backtrace:
0x449230: /usr/sbin/varnishd() [0x449230]
0x4c41dc: /usr/sbin/varnishd(VAS_Fail+0x58) [0x4c41dc]
0x443d48: /usr/sbin/varnishd() [0x443d48]
0xffffa67df820: linux-vdso.so.1(__kernel_rt_sigreturn+0) [0xffffa67df820]
0xffffa6472d88: /lib64/libc.so.6(+0x85d88) [0xffffa6472d88]
0x47fd3c: /usr/sbin/varnishd() [0x47fd3c]
0x480494: /usr/sbin/varnishd() [0x480494]
0x437150: /usr/sbin/varnishd(VFP_Suck+0xa4) [0x437150]
0x432ca4: /usr/sbin/varnishd() [0x432ca4]
0x435b60: /usr/sbin/varnishd() [0x435b60]
argv = {
[0] = \"/usr/sbin/varnishd\",
[1] = \"-a\",
[2] = \":80\",
[3] = \"-p\",
[4] = \"workspace_session=1024\",
[5] = \"-p\",
[6] = \"workspace_thread=4096\",
[7] = \"-p\",
[8] = \"workspace_client=131072\",
[9] = \"-p\",
[10] = \"workspace_backend=131072\",
[11] = \"-p\",
[12] = \"listen_depth=16383\",
[13] = \"-p\",
[14] = \"http_resp_hdr_len=65536\",
[15] = \"-p\",
[16] = \"http_resp_size=98304\",
[17] = \"-p\",
[18] = \"workspace_backend=131072\",
[19] = \"-p\",
[20] = \"thread_pool_min=200\",
[21] = \"-p\",
[22] = \"thread_pool_max=5000\",
[23] = \"-p\",
[24] = \"feature=+http2\",
[25] = \"-P\",
[26] = \"/run/varnish/varnishd.pid\",
[27] = \"-T\",
[28] = \"localhost:6082\",
[29] = \"-f\",
[30] = \"/etc/varnish/default.vcl\",
[31] = \"-s\",
[32] = \"malloc,12288m\",
[33] = \"-s\",
[34] = \"static=malloc,1024m\",
}
pthread.self = 0xffff974671b0
pthread.name = (cache-worker)
pthread.attr = {
guard = 4096,
stack_bottom = 0xffff97448000,
stack_top = 0xffff97468000,
stack_size = 131072,
}
thr.req = NULL
thr.busyobj = 0xffff16fb0020 {
end = 0xffff16fd0000,
retries = 0,
sess = 0xffff33ffc420 {
fd = 88, vxid = 2689893,
t_open = 1670442707.678030,
t_idle = 1670442730.614282,
ws = 0xffff33ffc460 {
id = \"ses\",
{s, f, r, e} = {0xffff33ffc4b8, +96, (nil), +832},
},
transport = HTTP/1 {
state = HTTP1::Proc
}
client = x 50916 :80,
local.endpoint = :80,
local.socket = a0,
local.ip = x:80,
remote.ip = x:50916,
server.ip = x:80,
client.ip = x:50916,
},
worker = 0xffff974666c8 {
ws = 0xffff97466750 {
id = \"wrk\",
{s, f, r, e} = {0xffff974655d0, +0, (nil), +4088},
},
VCL::method = BACKEND_RESPONSE,
VCL::methods = {BACKEND_FETCH, BACKEND_RESPONSE},
},
vfc = 0xffff16fb4f38 {
failed = 0,
req = 0xffff16fb0648,
resp = 0xffff16fb0ac0,
wrk = 0xffff974666c8,
oc = 0xffff6b24c340,
filters = {
V1F_STRAIGHT = 0xffff16fcd4b0 {
priv1 = 0xffff16fb5448,
priv2 = 117346,
closed = 0
},
},
obj_flags = 0x0,
},
filter_list = \"\",
ws = 0xffff16fb0060 {
id = \"bo\",
{s, f, r, e} = {0xffff16fb4f80, +99984, (nil), +110712},
},
ws_bo = 0xffff16fb5430,
http[bereq] = 0xffff16fb0648 {
ws = 0xffff16fb0060 {
[Already dumped, see above]
},
hdrs {
\"GET\",
\"/2436994-large_default/vintage-multicolor-floral-murano-glass-sommerso-vase-italy-1970s.jpg\",
\"HTTP/1.1\",
\"X-Forwarded-Proto: https\",
\"X-Forwarded-Port: 443\",
\"X-Amzn-Trace-Id: Root=1-6390eeea-3f4ea97408e44db20ed02120\",
\"x-default-isocode: US\",
\"x-default-language: en\",
\"cf-ray: 775fccd8297cacf8-ATL\",
\"cf-visitor: {\"scheme\":\"https\"}\",
\"accept: image/webp,image/png,image/svg+xml,image/*;q=0.8,video/*;q=0.8,*/*;q=0.5\",
\"user-agent: Mozilla/5.0 (iPhone; CPU iPhone OS 15_6 like Mac OS X) AppleWebKit/605.1.15 (KHTML, like Gecko) GSA/238.1.487893381 Mobile/15E148 Safari/604.1\",
\"accept-language: en-US,en;q=0.9\",
\"referer: https://www.design-market.us/248098-vintage-multicolor-floral-murano-glass-sommerso-vase-italy-1970s.html?redirected=true\",
\"cf-connecting-ip: x\",
\"cf-pseudo-ipv4: x\",
\"cf-ipcountry: US\",
\"cdn-loop: cloudflare\",
\"X-Forwarded-For: x, x, x\",
\"Via: 1.1 ip-x.eu-west-1.compute.internal (Varnish/7.2)\",
\"host: www.design-market.us\",
\"X-ETag: noxetag\",
\"Accept-Encoding: gzip\",
\"hash_url: /2436994-large_default/vintage-multicolor-floral-murano-glass-sommerso-vase-italy-1970s.jpg\",
\"X-Varnish: 1249115\",
},
},
http[beresp] = 0xffff16fb0ac0 {
ws = 0xffff16fb0060 {
[Already dumped, see above]
},
hdrs {
\"HTTP/1.1\",
\"200\",
\"OK\",
\"Server: nginx/1.22.0\",
\"Date: Wed, 07 Dec 2022 19:52:10 GMT\",
\"Content-Type: image/jpeg\",
\"Content-Length: 117346\",
\"Connection: keep-alive\",
\"Strict-Transport-Security: max-age=15552001; includeSubDomains;\",
\"Last-Modified: Wed, 12 Oct 2022 18:45:39 GMT\",
\"Accept-Ranges: bytes\",
\"Expires: Fri, 06 Jan 2023 19:52:10 GMT\",
\"X-Content-Type-Options: nosniff\",
\"X-XSS-Protection: 1; mode=block\",
\"X-Frame-Options: sameorigin\",
\"x-storage: static\",
\"cache-control: public, max-age=2592000\",
\"static-url: /2436994-large_default/vintage-multicolor-floral-murano-glass-sommerso-vase-italy-1970s.jpg\",
\"X-UA-Device: \",
\"X-Device-Type: \",
\"X-Country-Code: \",
\"X-ETag: noxetag\",
\"Accept-Language: en-US,en;q=0.9\",
},
},
objcore[fetch] = 0xffff6b24c340 {
refcnt = 3,
flags = {},
exp_flags = {refd},
boc = 0xffff4058cf40 {
refcnt = 3,
state = stream,
vary = (nil),
stevedore_priv = (nil),
},
exp = {1670442730.667573, 2592000.000000, 120.000000, 0.000000},
objhead = 0xffff1205b060,
stevedore = 0xffffa5bf0160 (malloc static) {
Simple = 0xffff15d61f00,
Obj = 0xffff5d156918 {priv=0xffff5d156910, ptr=0xffff15d61f00, len=744, space=744},
LEN = 0x0...0,
VXID = 0x00130f5b,
FLAGS = 0x08,
GZIPBITS = 0x0...0,
LASTMODIFIED = 0x41d8d1c2d4c00000,
VARY = {len=0, ptr=(nil)},
HEADERS = {len=624, ptr=0xffff15d61f78},
Body = 0xffff653920e8 {priv=0xffff653920e0, ptr=0xffff92ff0000, len=0, space=58673},
},
},
http_conn = 0xffff16fb5448 {
fd = 120 (@0xffff46837944),
doclose = null(Not Closing)
ws = 0xffff16fb0060 {
[Already dumped, see above]
},
{rxbuf_b, rxbuf_e} = {0xffff16fb54b0, 0xffff16fb5807},
{pipeline_b, pipeline_e} = {0xffff16fb5807, 0xffff16fcd4b0},
content_length = 117346,
body_status = length,
first_byte_timeout = 300.000000,
between_bytes_timeout = 300.000000,
},
flags = {do_stream},
director_req = 0xffffa5a40af0 {
cli_name = boot.front,
admin_health = probe, changed = 1670440383.896565,
type = round-robin {
},
},
director_resp = 0xffffa5a40ac0 {
cli_name = boot.nlb_ip1,
admin_health = healthy, changed = 1670440383.895000,
type = backend {
conn_pool = 0xffffa5af0600 {
ident = 0xfd480fbdd018b14c3223a30af2cadc2e300897dfdf1263c3d76a6b7fd3d2069c,
vrt_endpoint = 0xffffa5a62ea0 {
ipv4 = 127.0.0.1, port = 8080,
},
},
hosthdr = localhost,
n_conn = 21,
},
},
vcl[vcl] = 0xffffa5af0380 {
name = \"boot\",
busy = 103,
discard = 0,
state = auto,
temp = warm,
conf = 0xffff95b7e650 {
syntax = \"40\",
srcname = {
[0] = \"/etc/varnish/default.vcl\",
[1] = \"/etc/varnish/backend.vcl\",
[2] = \"/etc/varnish/devicedetect.vcl\",
[3] = \"<builtin>\",
},
instances = {
\"front\" = 0xffffa5a72e00,
},
},
},
vpi = 0xffff974655c0 {
handling (VCL::return) = 0x0 (none),
ref = 161,
vpi_ref = 0xffff95b7dc90 {
source = 0 (\"/etc/varnish/default.vcl\"),
offset = 21167,
line = 525,
pos = 3,
src = \"set beresp.http.Accept-Language = bereq.[...]\"
},
},
},
thr.worker = 0xffff974666c8 {
[Already dumped, see above]
},
vmods = {
std = {0xffffa5be01d0, Varnish 7.2.1 399fa7ed46d68f4c4f42a8fd2748339750d99a8b, 0.0},
directors = {0xffffa5be0240, Varnish 7.2.1 399fa7ed46d68f4c4f42a8fd2748339750d99a8b, 0.0},
header = {0xffffa5be02b0, Varnish 7.2.1 399fa7ed46d68f4c4f42a8fd2748339750d99a8b, 16.0},
xkey = {0xffffa5be0320, Varnish 7.2.1 399fa7ed46d68f4c4f42a8fd2748339750d99a8b, 0.0},
},
pools = {
pool = 0xffff9de40100 {
nidle = 173,
nthr = 200,
lqueue = 0
},
pool = 0xffff9de40740 {
nidle = 176,
nthr = 200,
lqueue = 0
},
},
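In case it helps, the anonymous frames in the backtrace above can usually be resolved to source files and lines with addr2line, provided the varnishd binary (or its debuginfo package) still carries symbols; a sketch using two of the addresses from the panic:
# resolve the two unnamed frames sitting just above VFP_Suck
addr2line -f -C -e /usr/sbin/varnishd 0x47fd3c 0x480494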
FYI I've just upgraded the jemalloc lib (from 3.6.0 to 5.3.0); I'm monitoring whether it has any impact on this issue (it should at least fix the munmap() error).
I would also be interested to learn how it fares without jemalloc at all, see #3881.
Thx, I'll also try without jemalloc at all!
FYI Varnish has been running with the original -s malloc,12288m -s static=file,/tmp/varnish_storage.bin,100G and jemalloc 5.3.0 without any issue so far!
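(To double-check which jemalloc build varnishd actually picks up at runtime, something along these lines should work:)
# library the binary is linked against
ldd /usr/sbin/varnishd | grep -i jemalloc
# library actually mapped into the running manager process
grep -i jemalloc /proc/$(pgrep -o -x varnishd)/maps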
Could you please try another build with jemalloc 3.6?
./configure --with-jemalloc --with-unwind --enable-debugging-symbols
And then run varnishd with this jemalloc configuration in your environment:
export MALLOC_CONF="abort:true,redzone:true,junk:true"
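If varnishd is started by systemd, the variable has to be set in the unit so the child process inherits it; a minimal sketch, assuming the unit is named varnish.service:
sudo mkdir -p /etc/systemd/system/varnish.service.d
printf '[Service]\nEnvironment=MALLOC_CONF=abort:true,redzone:true,junk:true\n' | sudo tee /etc/systemd/system/varnish.service.d/jemalloc-debug.conf
sudo systemctl daemon-reload
sudo systemctl restart varnish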
Hi @Dridi
Just to be sure, release 7.2.1 should be fine for this test, no need to use develop?
Hi @Dridi
FYI it's not possible to start varnish with jemalloc 3.6 & the debug config on arm because of the munmap issue:
déc. 25 21:42:30 ip-172-29-97-79.eu-west-1.compute.internal varnishd[22282]: Version: varnish-7.2.1 revision NOGIT
déc. 25 21:42:30 ip-172-29-97-79.eu-west-1.compute.internal varnishd[22282]: Platform: Linux,5.10.157-139.675.amzn2.aarch64,aarch64,-junix,-smalloc,-sfile,-sdefault,-hcritbit
déc. 25 21:42:30 ip-172-29-97-79.eu-west-1.compute.internal varnishd[22282]: Child (22295) Started
déc. 25 21:42:30 ip-172-29-97-79.eu-west-1.compute.internal varnishd[22282]: Child (-1) said Child starts
déc. 25 21:42:30 ip-172-29-97-79.eu-west-1.compute.internal varnishd[22282]: Child (-1) said
With jemalloc 5.3.0 it starts properly (but there are other issues after a while: instability / high CPU usage).
Hi @Dridi
I'm now evaluating the 7.2.1 version compiled without jemalloc (+ varnish-modules) on an ARM AWS Graviton2 processor (compiled with gcc 10 & the -mcpu=neoverse-n1 flag). I'll let you know if it shows any instability.
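(For reference, the configure invocation for such a build would presumably look something like the following; the gcc10-gcc compiler name is how gcc 10 is packaged on Amazon Linux 2, and the exact flags are an assumption on my part:)
./configure --without-jemalloc --with-unwind --enable-debugging-symbols \
    CC=gcc10-gcc CFLAGS="-O2 -g -mcpu=neoverse-n1"
make -j"$(nproc)" && sudo make install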
Hi @Dridi
FYI 7.2.1 without jemalloc, using the patch from #3881, seems to work pretty well on AWS ARM. I see a few spikes in CPU usage during aggressive Google crawls that I didn't have before (on x86_64 instances), but response times are good & stable, so it's not really an issue (here it's an x2gd.large with 2 cores and 32GB of RAM).
I confirm after more than 1 week of use everything is still really stable on ARM without jemalloc.
Hi,
I'm using Varnish with a malloc storage and a file storage for static files. The settings are the following:
ExecStart=/usr/sbin/varnishd -a :80 -p workspace_session=1024 -p workspace_thread=4096 -p workspace_client=131072 -p workspace_backend=131072 -p listen_depth=16383 -p http_resp_hdr_len=65536 -p http_resp_size=98304 -p workspace_backend=131072 -p thread_pool_min=200 -p thread_pool_max=5000 -p feature=+http2 -P %t/%N/varnishd.pid -T localhost:6082 -f /etc/varnish/default.vcl -s malloc,12288m -s static=file,/tmp/varnish_storage.bin,100G
Expected Behavior
No crash
Current Behavior
I figured out Varnish was complaining in the systemctl log:
Active: active (running) since lun. 2022-12-05 20:39:57 CET; 1 day 18h ago
déc. 06 21:25:33 varnishd[2813]: Child (8130) Started
déc. 06 21:25:33 varnishd[2813]: Child (8130) said Child starts
déc. 06 21:25:33 varnishd[2813]: Child (8130) said <jemalloc>: Error in munmap(): Invalid argument
déc. 06 21:25:33 varnishd[2813]: Child (8130) said SMF.static mmap'ed 107374182400 bytes of 107374182400
déc. 07 07:59:26 varnishd[2813]: Child (8130) died signal=6
déc. 07 07:59:26 varnishd[2813]: Child (8130) Panic at: Wed, 07 Dec 2022 06:59:26 GMT
Assert error in HSH_Lookup(), cache/cache_hash.c line 426:
Condition((oc)->magic == 0x4d301302) not true....
So the first warning happened a day after starting the server. Then a few hours later, we can see the assert.
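For reference, the full panic output (including the backtrace) can be pulled from the still-running manager process with varnishadm:
varnishadm panic.show
# and cleared once it has been captured
varnishadm panic.clear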
This is happening on an ARM architecture (t4g AWS servers) on Amazon Linux 2 + kernel 5.15.75-48.135. I'm using 4 vmods: directors, header, std & xkey. I'm fetching the binary from https://packagecloud.io/install/repositories/varnishcache/varnish72/script.rpm.sh & https://github.com/varnish/varnish-modules/releases/download/0.21.0/varnish-modules-0.21.0.tar.gz
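Roughly, the install follows the standard packagecloud flow, with varnish-modules built from the release tarball; a sketch (exact commands may differ slightly):
# Varnish 7.2 packages from packagecloud
curl -s https://packagecloud.io/install/repositories/varnishcache/varnish72/script.rpm.sh | sudo bash
sudo yum install varnish
# varnish-modules 0.21.0 from the release tarball
curl -LO https://github.com/varnish/varnish-modules/releases/download/0.21.0/varnish-modules-0.21.0.tar.gz
tar xzf varnish-modules-0.21.0.tar.gz
cd varnish-modules-0.21.0 && ./configure && make && sudo make install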
My vcl_hash is not that complicated:
Any idea what could be wrong?
Best regards, Jocelyn Fournier