Closed: @joshuaspence closed this issue 7 years ago.
Hi @joshuaspence, among the panic's lines:
ws = 0x7f8ab895e1f8 {
OVERFLOWED id = \"Req\",
{s, f, r, e} = {0x7f8ab895fff8, +57344, (nil), +57344},
},
This is an overflow happening on the client side. What is the value of the parameter "workspace_client"? Can you please try to increase it?
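For reference, the parameter can be inspected and raised at runtime with varnishadm (a sketch; 128k is just a starting point for experimentation, not a computed requirement):

```shell
# Show the current value (64k is the default)
varnishadm param.show workspace_client

# Double it at runtime; affects subsequently handled client requests
varnishadm param.set workspace_client 128k

# To make the change persistent, pass it on the varnishd command line
# (or in your init/systemd unit) instead:
#   varnishd ... -p workspace_client=128k
```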
Hey @joshuaspence
Agree with @aondio that this is workspace exhaustion. However, we shouldn't /crash/ in that condition.
I've committed a fix for this - would you mind giving master/HEAD a spin?
I'm likely using the default value for the workspace_client parameter. Is this parameter documented anywhere? I've heard about tuning it, but I've never been able to understand how or when to do so.
@daghf, I can try running from master, but I'm not sure how I would actually verify the fix. I used the headers from the output of panic.show to reconstruct the original request, but I wasn't able to reproduce the crash.
> sudo varnishadm param.show | grep workspace_client
workspace_client 64k [bytes] (default)
Hi,
workspace_client is an allocation of scratch memory made available for header processing and VMODs. The default value of 64k typically works just fine, but in cases with big headers or memory-hungry VMODs (e.g. vmod-cookie) we sometimes run out.
Increasing it (as @aondio noted) is fine. Try setting it to 128k.
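To illustrate where that workspace goes with vmod-cookie (a minimal sketch, assuming the cookie vmod from varnish-modules is installed):

```vcl
vcl 4.0;
import cookie;

sub vcl_recv {
    # cookie.parse() copies the whole Cookie header into per-request
    # client workspace, so requests carrying large cookies eat into
    # workspace_client on top of the headers themselves.
    cookie.parse(req.http.Cookie);
}
```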
That doesn't explain why I wasn't able to reproduce the crash by replaying the request, though. Is there a way to calculate the required value for workspace_client? I'm not really a fan of just increasing it until it stops crashing.
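There isn't an exact formula - the requirement depends on header sizes and on what each VMOD allocates per request - but you can measure instead of guessing. A sketch (hedged: the counter and log-query names below assume a reasonably recent Varnish and may not all be available on 5.0.0):

```shell
# Cumulative count of requests that ran out of client workspace
varnishstat -1 -f MAIN.ws_client_overflow

# Watch workspace-related errors live, grouped per request
varnishlog -g request -q 'Error ~ "workspace"'
```

If the overflow counter keeps climbing after a bump, the workload genuinely needs more headroom rather than a one-off fix.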
Hi @joshuaspence
I'm closing this issue as fixed.
I'm not entirely sure why the bug didn't trigger upon replaying the request. Was the request exactly identical to the one in the crash? No changes to VCL and/or headers?
I'm seeing a very similar panic now:
Panic at: Mon, 30 Jan 2017 13:39:23 GMT
"Assert error in WS_Release(), cache/cache_ws.c line 236:
Condition(ws->r != NULL) not true.
thread = (cache-worker)
version = varnish-5.0.0 revision 99d036f
ident = Linux,4.4.0-47-generic,x86_64,-junix,-smalloc,-smalloc,-hcritbit,epoll
Backtrace:
0x4351b6: varnishd() [0x4351b6]
0x44e588: varnishd() [0x44e588]
0x7f598d5f9808: libvmod_cookie.so(vmod_parse+0x118) [0x7f598d5f9808]
0x7f469a1f04b6: vgc.so(VGC_function_determine_authentication_status+0xc6) [0x7f469a1f04b6]
0x7f469a1f0fd0: vgc.so(VGC_function_vcl_recv+0x1a0) [0x7f469a1f0fd0]
0x44212b: varnishd() [0x44212b]
0x44477a: varnishd(VCL_recv_method+0x5a) [0x44477a]
0x438dee: varnishd(CNT_Request+0xebe) [0x438dee]
0x454edf: varnishd(HTTP1_Session+0x4df) [0x454edf]
0x4556b7: varnishd() [0x4556b7]
req = 0x7f4a50d74020 {
vxid = 96039094, transport = HTTP/1 {
state = HTTP1::Proc
}
step = R_STP_RECV,
req_body = R_BODY_NONE,
restarts = 0, esi_level = 0,
sp = 0x7f47d23e9820 {
fd = 120, vxid = 96039093,
t_open = 1485783359.621496,
t_idle = 1485783359.621496,
transport = HTTP/1 {
state = HTTP1::Proc
}
client = 10.136.70.68 33420,
},
worker = 0x7f599518ec90 {
stack = {0x7f599518f000 -> 0x7f5995183000},
ws = 0x7f599518ee78 {
id = \"wrk\",
{s, f, r, e} = {0x7f599518e430, 0x7f599518e430, (nil), +2040},
},
VCL::method = inside RECV,
VCL::return = 0x0,
VCL::methods = {RECV},
},
ws = 0x7f4a50d741f8 {
OVERFLOWED id = \"Req\",
{s, f, r, e} = {0x7f4a50d75ff8, +57264, (nil), +57344},
},
http_conn = 0x7f4a50d74128 {
fd = 120,
doclose = NULL,
ws = 0x7f4a50d741f8,
{rxbuf_b, rxbuf_e} = {0x7f4a50d75ff8, 0x7f4a50d7993d},
{pipeline_b, pipeline_e} = {(nil), (nil)},
content_length = -1,
body_status = none,
first_byte_timeout = 0.000000,
between_bytes_timeout = 0.000000,
},
http[req] = 0x7f4a50d74290 {
ws[Req] = 0x7f4a50d741f8,
hdrs {
\"GET\",
<<REDACTED>>
},
},
vcl = {
name = \"vcl_525f48f3-bccd-4d7b-b775-f0de6caddd97\"
busy = 153
discard = 0,
state = auto,
temp = warm,
conf = {
srcname = {
\"/etc/varnish/default.vcl\",
\"Builtin\",
\"/etc/varnish/includes/directors.vcl\",
\"/etc/varnish/includes/probes.vcl\",
\"/etc/varnish/includes/backends.vcl\",
\"/etc/varnish/includes/backends/gaf_admin.vcl\",
\"/etc/varnish/includes/backends/gaf_appserver.vcl\",
\"/etc/varnish/includes/backends/gaf_assets.vcl\",
\"/etc/varnish/includes/acls.vcl\",
\"/etc/varnish/includes/backendselection.vcl\",
\"/etc/varnish/varnish.vcl\",
\"/etc/varnish/includes/subroutines.vcl\",
\"/etc/varnish/includes/subroutines/determine_authentication_status.vcl\",
\"/etc/varnish/includes/subroutines/normalize_cookies.vcl\",
\"/etc/varnish/includes/subroutines/normalize_host.vcl\",
\"/etc/varnish/includes/subroutines/normalize_url.vcl\",
\"/etc/varnish/includes/subroutines/parse_accept_language.vcl\",
\"/etc/varnish/includes/subroutines/sanitize_cookies.vcl\",
\"/etc/varnish/includes/subroutines/sanitize_query_parameters.vcl\",
},
},
},
flags = {
},
},
"
Should I open a new issue for this?
Is that with the latest changes applied?
Yes, I built from master a few days ago.
Weird, the stack trace doesn't make sense then: the last vmod_cookie function called is vmod_parse(), and the only code path reaching WS_Release() goes through vmod_set().
One thing I'm not sure of: if I update the Varnish modules, do I need to restart Varnish, or is a reload sufficient? It looks as if we reloaded Varnish when we updated the cookie vmod, and from what I can tell the updated libvmod_cookie.so file is loaded by the Varnish process.
@joshuaspence restart is the surest way :-)
Sure, but that means losing all of the objects stored in the cache. We build the varnish-modules repo from master, which means that we would be restarting Varnish once every three days (on average).
What you may try then is:
import module from "/path/to/file.so";
hoping to thwart the caching mechanism. It may or may not work, but it's worth a try.
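A concrete form of that suggestion for the cookie vmod (a sketch; the install path below is an assumption - use wherever your varnish-modules build actually places libvmod_cookie.so):

```vcl
vcl 4.0;

# Import by absolute path rather than by name, so that loading a new
# VCL resolves the freshly built shared object directly.
# (Example path only - adjust to your installation.)
import cookie from "/usr/lib/varnish/vmods/libvmod_cookie.so";
```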
Moved from varnishcache/varnish-cache#2192.
We just had an outage in production caused by the following panic:
Happy to provide further information if required.