Open delthas opened 1 week ago
From my understanding this could be related to https://github.com/varnishcache/varnish-cache/pull/4115 ?
cc @nigoroll
This is an excellent issue report, thank you! You even made an effort to patch vmod_dynamic with a cut&paste sed command...
(edit) The actual reproducer is a delay added for VCL_EVENT_WARM.
So what I think is happening here:
vmod_dynamic creates domain directors (basically resolver threads) also during vcl_init{}, the feature you are using which is vital to layering. These directors create backends whenever name resolution completes.
If the VCL temperature is still VCL_TEMP_INIT when these backends get created, all is good and well: they get added to the director list and get sent a warm event when the VCL becomes warm by this code:
But by adding the delay, we prolong the vcl_send_event() and chances get (very) high that backends get created when the VCL is already warm, but before the vcl_BackendEvent(). Then they receive two warm events and booom.
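To make the window easier to see, here is a small deterministic model of it. It is only a sketch and not Varnish code: the Vcl and Backend classes, the run() helper and the hook argument are all made up for illustration, and it assumes that a backend created on an already warm VCL gets its warm event right at creation time.

```python
class Backend:
    """Toy backend that tolerates exactly one warm event."""

    def __init__(self, name):
        self.name = name
        self.warm_events = 0

    def warm(self):
        self.warm_events += 1
        if self.warm_events > 1:
            # Stand-in for the assertion failure / panic in the report.
            raise AssertionError(f"{self.name}: second warm event")


class Vcl:
    """Toy VCL with a temperature and a director list."""

    def __init__(self):
        self.temp = "init"
        self.directors = []

    def add_backend(self, name):
        be = Backend(name)
        self.directors.append(be)
        # Assumption: a backend added to a warm VCL is warmed immediately.
        if self.temp == "warm":
            be.warm()
        return be

    def warm_up(self, slow_vmod_event=None):
        self.temp = "warm"
        # Window: the VCL is already warm, but the per-backend warm
        # events have not been sent yet. A slow vmod_event widens it.
        if slow_vmod_event is not None:
            slow_vmod_event(self)
        # Counterpart of the loop that sends the warm event to the list.
        for be in self.directors:
            be.warm()


def run(resolve_during_window):
    vcl = Vcl()
    vcl.add_backend("early")  # created while the VCL is still "init"
    hook = None
    if resolve_during_window:
        # Models a resolver thread finishing during the delay.
        hook = lambda v: v.add_backend("late")
    try:
        vcl.warm_up(hook)
    except AssertionError as exc:
        return f"panic: {exc}"
    return "ok"


print(run(False))
print(run(True))
```

The real code is threaded, so the ordering is a matter of timing rather than an explicit hook, which is why the sleep in vmod_event makes the panic so much more likely.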
This touches on #4142, but from the other end: here we would need a director list from "before the warm event", and only the directors on that list would need to get the event sent...
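As a rough illustration of the "list from before the warm event" idea, the following toy model takes a snapshot of the director list before the temperature changes and sends the warm event only to that snapshot. Everything here (the Vcl and Backend classes, warm_up(), the hook) is hypothetical, it assumes that backends created on a warm VCL are warmed at creation, and it does not claim to reflect what any actual patch does.

```python
class Backend:
    """Toy backend that counts the warm events it receives."""

    def __init__(self, name):
        self.name = name
        self.warm_events = 0

    def warm(self):
        self.warm_events += 1


class Vcl:
    """Toy VCL that warms only the directors known before warming."""

    def __init__(self):
        self.temp = "init"
        self.directors = []

    def add_backend(self, name):
        be = Backend(name)
        self.directors.append(be)
        # Assumption: late arrivals on a warm VCL are warmed at creation.
        if self.temp == "warm":
            be.warm()
        return be

    def warm_up(self, slow_vmod_event=None):
        # Snapshot of the list from before the warm event.
        pending = list(self.directors)
        self.temp = "warm"
        if slow_vmod_event is not None:
            slow_vmod_event(self)  # late backends may appear here
        # Only directors from the snapshot get the event from this loop.
        for be in pending:
            be.warm()


vcl = Vcl()
vcl.add_backend("early")
vcl.warm_up(lambda v: v.add_backend("late"))
counts = {be.name: be.warm_events for be in vcl.directors}
print(counts)
```

In this model every backend ends up with exactly one warm event, whether it was created before or during the window.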
@delthas does #4205 work for you?
Thanks for the patch! #4205 fixes the issue on my end, the test case passes. :+1:
Expected Behavior
No panic on the following VTC.
Important reproduction notes
This is against Varnish master + libvmod-dynamic master. I can only reproduce this when intentionally "slowing down" vmod_event with a sleep.
In my case in production, I originally had the panic when using libvmod-redis, and without adding its import in my VCL, the panic would not be reproduced. I suspected some kind of race condition that only happens when waiting in vmod_event somehow (because that would be the only side effect of loading libvmod-redis).
So, in order to reproduce the panic in the VCL below, add a usleep(100000); call (plus #include <unistd.h>) at the start of vmod_event in libvmod-dynamic/src/vmod_dynamic.c.
(Or run the following from libvmod-dynamic:
sed -z -i -E 's|(vmod_event\(VRT_CTX, struct vmod_priv \*priv, enum vcl_event_e e\)\n\{\n)|\1\#include \<unistd.h\>\nusleep\(100000\)\;\n|g' src/vmod_dynamic.c
)
This will simulate a slowdown in vmod_event by sleeping for 100ms.
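For anyone who wants to check what that substitution produces before touching the source tree, here is the same rewrite as a small Python sketch applied to a stub. Only the vmod_event signature is taken from the sed pattern; the SOURCE stub (including its body) and the add_delay() helper are made up for illustration and are not the real contents of vmod_dynamic.c.

```python
import re

# Hypothetical stand-in for the top of vmod_event in src/vmod_dynamic.c.
# Only the signature comes from the sed pattern; the body is invented.
SOURCE = (
    "int\n"
    "vmod_event(VRT_CTX, struct vmod_priv *priv, enum vcl_event_e e)\n"
    "{\n"
    "\treturn (0);\n"
    "}\n"
)

# Same match as the sed expression: the signature line, then the
# opening brace on a line of its own.
PATTERN = re.compile(
    r"(vmod_event\(VRT_CTX, struct vmod_priv \*priv, enum vcl_event_e e\)\n\{\n)"
)


def add_delay(source: str) -> str:
    """Insert the include and the 100 ms sleep right after the brace."""
    return PATTERN.sub(
        lambda m: m.group(1) + "#include <unistd.h>\nusleep(100000);\n",
        source,
    )


PATCHED = add_delay(SOURCE)
print(PATCHED)
```

The include lands inside the function body, which is unusual but is what the sed command does; usleep(100000) sleeps for 100000 microseconds, i.e. the 100ms mentioned above.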
Current Behavior
The panic is quite random, so might require several tries:
The panic is:
But sometimes the VCL state is running instead of scheduled.
Possible Solution
No response
Steps to Reproduce (for bugs)
No response
Context
This was particularly hard to track down to a MWE VTC :stuck_out_tongue:
Varnish Cache version
varnishd master; libvmod-dynamic master + usleep patch from ticket
Operating system
No response
Source of binary packages used (if any)
No response