Open nigoroll opened 3 months ago
@delthas apologies for having lost track of this issue. If you have time, could you please (re)confirm that it still happens with latest Varnish-Cache? In particular, I would like to exclude the possibility of this being related to https://github.com/varnishcache/varnish-cache/issues/4108, for which the fix went in on Monday.
@nigoroll I can confirm this still happens with Varnish 7.5.0. Varnish 7.5.0 crashed here this morning with:
Wrong turn at cache/cache_vrt_vcl.c:252:
Dynamic Backends can only be added to warm VCLs
Panic.show in varnishadm reported:
varnish> panic.show
200
Panic at: Fri, 19 Jul 2024 07:36:47 GMT
Wrong turn at cache/cache_vrt_vcl.c:252:
Dynamic Backends can only be added to warm VCLs
version = varnish-7.5.0 revision eef25264e5ca5f96a77129308edb83ccf84cb1b1, vrt api = 19.0
ident = Linux,5.15.111-flatcar,x86_64,-jnone,-smalloc,-sdefault,-hcritbit,epoll
now = 954234.585670 (mono), 1721374607.072795 (real)
Backtrace:
0x55f72eefecfe: varnishd(+0x5ccfe) [0x55f72eefecfe]
0x55f72ef7f2c5: varnishd(VAS_Fail+0x45) [0x55f72ef7f2c5]
0x55f72ef2301b: varnishd(+0x8101b) [0x55f72ef2301b]
0x55f72eed381d: varnishd(VRT_new_backend_clustered+0x45d) [0x55f72eed381d]
0x7f976a0cf394: ./vmod_cache/_vmod_dynamic.6626bfa332c102136cbd5ce44844ca052182968667357aa177db213793d9b324(+0x7394) [0x7f976a0cf394]
0x7f9777b2d134: /lib/x86_64-linux-gnu/libc.so.6(+0x89134) [0x7f9777b2d134]
0x7f9777baca40: /lib/x86_64-linux-gnu/libc.so.6(__clone+0x40) [0x7f9777baca40]
argv = {
[0] = \"varnishd\",
[1] = \"-F\",
[2] = \"-f\",
[3] = \"/etc/varnish/default.vcl\",
[4] = \"-a\",
[5] = \"http=:80,HTTP\",
[6] = \"-a\",
[7] = \"proxy=:8443,PROXY\",
[8] = \"-p\",
[9] = \"feature=+http2\",
[10] = \"-s\",
[11] = \"malloc,27G\",
[12] = \"-p\",
[13] = \"feature=+http2\",
[14] = \"-p\",
[15] = \"default_keep=300\",
}
pthread.self = 0x7f9766e326c0
pthread.attr = {
guard = 4096,
stack_bottom = 0x7f9766633000,
stack_top = 0x7f9766e33000,
stack_size = 8388608,
}
thr.req = NULL
thr.busyobj = NULL
thr.worker = NULL
vmods = {
std = {0x7f97774d7230, Varnish 7.5.0 eef25264e5ca5f96a77129308edb83ccf84cb1b1, 0.0},
cookie = {0x7f97774d72a0, Varnish 7.5.0 eef25264e5ca5f96a77129308edb83ccf84cb1b1, 0.0},
dynamic = {0x7f97774d7380, Varnish 7.5.0 eef25264e5ca5f96a77129308edb83ccf84cb1b1, 19.0},
directors = {0x7f97774d73f0, Varnish 7.5.0 eef25264e5ca5f96a77129308edb83ccf84cb1b1, 0.0},
},
pools = {
pool = 0x7f9770200000 {
nidle = 96,
nthr = 100,
lqueue = 0
},
pool = 0x7f9770200640 {
nidle = 96,
nthr = 100,
lqueue = 0
},
},
The log of the varnish docker container contained more or less the same information:
Debug: Version: varnish-7.5.0 revision eef25264e5ca5f96a77129308edb83ccf84cb1b1,
Debug: Platform: Linux,5.15.111-flatcar,x86_64,-jnone,-smalloc,-sdefault,-hcritbit,
Debug: Child (22) Started,
Child launched OK,
Info: Child (22) said Child starts,
Error: Child (22) not responding to CLI, killed it.,
Error: Unexpected reply from ping: 400 CLI communication error (hdr),
Error: Child (22) died signal=9,
Debug: Child cleanup complete,
Debug: Child (4946) Started,
Child launched OK,
Error: Child (4946) not responding to CLI, killed it.,
Error: Child (4946) Pushing vcls failed:,
CLI communication error (hdr),
Debug: Stopping Child,
Info: Child (4946) said Child starts,
Error: Child (4946) died signal=6 (core dumped),
Error: Child (4946) Panic at: Fri, 19 Jul 2024 07:36:47 GMT,
Wrong turn at cache/cache_vrt_vcl.c:252:,
Dynamic Backends can only be added to warm VCLs,
version = varnish-7.5.0 revision eef25264e5ca5f96a77129308edb83ccf84cb1b1, vrt api = 19.0,
ident = Linux,5.15.111-flatcar,x86_64,-jnone,-smalloc,-sdefault,-hcritbit,epoll,
now = 954234.585670 (mono), 1721374607.072795 (real),
Backtrace:,
0x55f72eefecfe: varnishd(+0x5ccfe) [0x55f72eefecfe],
0x55f72ef7f2c5: varnishd(VAS_Fail+0x45) [0x55f72ef7f2c5],
0x55f72ef2301b: varnishd(+0x8101b) [0x55f72ef2301b],
0x55f72eed381d: varnishd(VRT_new_backend_clustered+0x45d) [0x55f72eed381d],
0x7f976a0cf394: ./vmod_cache/_vmod_dynamic.6626bfa332c102136cbd5ce44844ca052182968667357aa177db213793d9b324(+0x7394) [0x7f976a0cf394],
0x7f9777b2d134: /lib/x86_64-linux-gnu/libc.so.6(+0x89134) [0x7f9777b2d134],
0x7f9777baca40: /lib/x86_64-linux-gnu/libc.so.6(__clone+0x40) [0x7f9777baca40],
argv = {,
[0] = \"varnishd\",,
[1] = \"-F\",,
[2] = \"-f\",,
[3] = \"/etc/varnish/default.vcl\",,
[4] = \"-a\",,
[5] = \"http=:80,HTTP\",,
[6] = \"-a\",,
[7] = \"proxy=:8443,PROXY\",,
[8] = \"-p\",,
[9] = \"feature=+http2\",,
[10] = \"-s\",,
[11] = \"malloc,27G\",,
[12] = \"-p\",,
[13] = \"feature=+http2\",,
[14] = \"-p\",,
[15] = \"default_keep=300\",,
},
pthread.self = 0x7f9766e326c0,
pthread.attr = {,
guard = 4096,,
stack_bottom = 0x7f9766633000,,
stack_top = 0x7f9766e33000,,
stack_size = 8388608,,
},
thr.req = NULL,
thr.busyobj = NULL,
thr.worker = NULL,,
{,
std = {0x7f97774d7230, Varnish 7.5.0 eef25264e5ca5f96a77129308edb83ccf84cb1b1, 0.0},,
cookie = {0x7f97774d72a0, Varnish 7.5.0 eef25264e5ca5f96a77129308edb83ccf84cb1b1, 0.0},,
dynamic = {0x7f97774d7380, Varnish 7.5.0 eef25264e5ca5f96a77129308edb83ccf84cb1b1, 19.0},,
directors = {0x7f97774d73f0, Varnish 7.5.0 eef25264e5ca5f96a77129308edb83ccf84cb1b1, 0.0},,
},,,
{,
pool = 0x7f9770200000 {,
nidle = 96,,
nthr = 100,,
lqueue = 0,
},,
pool = 0x7f9770200640 {,
nidle = 96,,
nthr = 100,,
lqueue = 0,
},,
},,
,
,
Debug: Child cleanup complete,
Could anyone who can reproduce this issue please run varnishd with this little patch and report the Panic?
diff --git a/bin/varnishd/cache/cache_vrt_vcl.c b/bin/varnishd/cache/cache_vrt_vcl.c
index 82e6e10a5..a17db5726 100644
--- a/bin/varnishd/cache/cache_vrt_vcl.c
+++ b/bin/varnishd/cache/cache_vrt_vcl.c
@@ -249,7 +249,7 @@ VRT_AddDirector(VRT_CTX, const struct vdi_methods *m, void *priv,
return (vcldir_surplus(vdir));
if (!temp->is_warm && temp != VCL_TEMP_INIT)
- WRONG("Dynamic Backends can only be added to warm VCLs");
+ WRONG(temp->name);
return (vdir->dir);
}
I am trying hard to reproduce this with current code (https://github.com/varnishcache/varnish-cache/commit/7292f923dc731292226782a898a804f880567707 and vmod_dynamic e1b93074696ae811fe46e75e221d478bf024f530), no success.
Test vcl which I am using
vcl 4.1;
import dynamic;
import directors;
backend proforma none;
sub vcl_init {
new r1 = dynamic.resolver();
r1.set_resolution_type(RECURSING); # tried with and without
new d = dynamic.director(
resolver = r1.use(),
ttl_from = dns
);
new rr = directors.round_robin();
rr.add_backend(d.backend("www.uplex.de"));
}
sub vcl_recv {
return (synth(200));
}
sub vcl_synth {
set resp.body = "" + rr.backend().resolve() + " " +
d.backend("uplex.de").resolve() + """
""";
return (deliver);
}
start:
varnishd -a 127.0.0.1:8080 -f /tmp/t.vcl
vcl load loop:
while : ; do n=v$RANDOM ; varnishadm vcl.load $n /tmp/t.vcl && varnishadm vcl.use $n ; varnishadm vcl.list | awk '/^av/ {print "vcl.discard " $5}'|varnishadm ; varnishadm panic.show ; done
curl loop (in a second terminal):
while : ; do curl localhost:8080 ; done
I just don't get the panic, also not with this addition to deliberately delay resolution:
diff --git a/src/dyn_resolver_getdns.c b/src/dyn_resolver_getdns.c
index c674f1b..f97b761 100644
--- a/src/dyn_resolver_getdns.c
+++ b/src/dyn_resolver_getdns.c
@@ -32,6 +32,7 @@
//#include <sys/types.h>
#include <sys/socket.h>
#include <netdb.h>
+#include <unistd.h>
#ifdef __FreeBSD__
#include <arpa/inet.h>
@@ -203,6 +204,7 @@ getdns_lookup(struct VPFX(dynamic_resolver) *r,
ret = getdns_address_sync(c->context, node, NULL, &state->response);
errchk(ret);
+ usleep(1000 * 1000);
return (getdns_common_lookup_check(state));
}
So, can anyone please help me with a reproducer?
@nigoroll I can confirm this still happens with Varnish 7.5.0. Varnish 7.5.0 crashed here this morning with:
FTR: The container build is https://github.com/varnish/docker-varnish/blob/master/fresh/debian/Dockerfile#L9C26-L9C66 which uses https://github.com/nigoroll/libvmod-dynamic/commit/5dc09f52cd8eeed77d879b0313bd8ad9a749477f, which is a minor change on top of https://github.com/nigoroll/libvmod-dynamic/commit/47bfcf54748ca47ad3bb35e20693a6d96eb5e77f
So if this was root-caused in dynamic, it should still happen.
Oops, I did not mean to close this issue before confirmation. (too much github magic, closing issues across projects is, hm, interesting)
I'm getting the same assert again, but in a slightly different way (even with the fix).
The stack is still:
I'm stressing Varnish + libvmod-dynamic with a lot of VCL reloads. Everything usually works fine for a while (sometimes hours), until Varnish starts to loop on this error, every ~15s to ~1 minute. So, no issues for hours and then once I get this panic, it repeats every ~15s to ~1min: child crashes, child restarts, child crashes, ...
Here's an excerpt of the full logs:
It continued crashing in a loop indefinitely for multiple days. I managed to attach to the process and the VCL was indeed COOL at the time of the panic.
Originally posted by @delthas in https://github.com/nigoroll/libvmod-dynamic/issues/108#issuecomment-2133195974