varnish / varnish-modules

Collection of Varnish Cache modules (vmods) by Varnish Software
Other
185 stars 86 forks source link

Assert error in WS_Release() #53

Closed joshuaspence closed 7 years ago

joshuaspence commented 7 years ago

Moved from varnishcache/varnish-cache#2192.

We just had an outage in production caused by the following panic:

varnish> panic.show
200        
Panic at: Thu, 19 Jan 2017 06:59:23 GMT
"Assert error in WS_Release(), cache/cache_ws.c line 234:
  Condition(bytes <= ws->e - ws->f) not true.
thread = (cache-worker)
version = varnish-5.0.0 revision 99d036f
ident = Linux,4.4.0-47-generic,x86_64,-junix,-smalloc,-smalloc,-hcritbit,epoll
Backtrace:
  0x4351b6: varnishd() [0x4351b6]
  0x44e548: varnishd(WS_Release+0xa8) [0x44e548]
  0x7f8ac7df9808: libvmod_cookie.so(vmod_parse+0x118) [0x7f8ac7df9808]
  0x7f8ac4f8f196: vgc.so(VGC_function_determine_authentication_status+0xc6) [0x7f8ac4f8f196]
  0x7f8ac4f8fcb0: vgc.so(VGC_function_vcl_recv+0x1a0) [0x7f8ac4f8fcb0]
  0x44212b: varnishd() [0x44212b]
  0x44477a: varnishd(VCL_recv_method+0x5a) [0x44477a]
  0x438dee: varnishd(CNT_Request+0xebe) [0x438dee]
  0x454edf: varnishd(HTTP1_Session+0x4df) [0x454edf]
  0x4556b7: varnishd() [0x4556b7]
req = 0x7f8ab895e020 {
  vxid = 4718713, transport = HTTP/1 {
    state = HTTP1::Proc
  }
  step = R_STP_RECV,
  req_body = R_BODY_NONE,
  restarts = 0, esi_level = 0,
  sp = 0x7f8ac2c10a20 {
    fd = 219, vxid = 4718712,
    t_open = 1484809145.215435,
    t_idle = 1484809145.215435,
    transport = HTTP/1 {
      state = HTTP1::Proc
    }
client = 10.147.42.240 32572,
  },
  worker = 0x7f8acd552c90 {
    stack = {0x7f8acd553000 -> 0x7f8acd547000},
    ws = 0x7f8acd552e78 {
      id = \"wrk\",
      {s, f, r, e} = {0x7f8acd552430, 0x7f8acd552430, (nil), +2040},
    },
    VCL::method = inside RECV,
    VCL::return = 0x0,
    VCL::methods = {RECV},
  },
  ws = 0x7f8ab895e1f8 {
    OVERFLOWED id = \"Req\",
    {s, f, r, e} = {0x7f8ab895fff8, +57344, (nil), +57344},
  },
  http_conn = 0x7f8ab895e128 {
    fd = 219,
    doclose = NULL,
    ws = 0x7f8ab895e1f8,
    {rxbuf_b, rxbuf_e} = {0x7f8ab895fff8, 0x7f8ab8962430},
    {pipeline_b, pipeline_e} = {(nil), (nil)},
    content_length = -1,
    body_status = none,
    first_byte_timeout = 0.000000,
    between_bytes_timeout = 0.000000,
  },
  http[req] = 0x7f8ab895e290 {
    ws[Req] = 0x7f8ab895e1f8,
    hdrs {
      *REDACTED*
    },
  },
  vcl = {
    name = \"vcl_3a969e31-59cc-4fed-870a-21ed9c1f9255\"
    busy = 324
    discard = 0,
    state = auto,
    temp = warm,
    conf = {
      srcname = {
        \"/etc/varnish/default.vcl\",
        \"Builtin\",
        \"/etc/varnish/includes/directors.vcl\",
        \"/etc/varnish/includes/probes.vcl\",
        \"/etc/varnish/includes/backends.vcl\",
        *REDACTED*
        \"/etc/varnish/includes/acls.vcl\",
        \"/etc/varnish/includes/backendselection.vcl\",
        \"/etc/varnish/varnish.vcl\",
        \"/etc/varnish/includes/subroutines.vcl\",
        \"/etc/varnish/includes/subroutines/determine_authentication_status.vcl\",
        \"/etc/varnish/includes/subroutines/normalize_cookies.vcl\",
        \"/etc/varnish/includes/subroutines/normalize_host.vcl\",
        \"/etc/varnish/includes/subroutines/normalize_url.vcl\",
        \"/etc/varnish/includes/subroutines/parse_accept_language.vcl\",
        \"/etc/varnish/includes/subroutines/sanitize_cookies.vcl\",
        \"/etc/varnish/includes/subroutines/sanitize_query_parameters.vcl\",
      },
    },
  },
  flags = {
  },
},

"

Happy to provide further information if required.

aondio commented 7 years ago

Hi @joshuaspence, among the panic's lines:

 ws = 0x7f8ab895e1f8 {
    OVERFLOWED id = \"Req\",
    {s, f, r, e} = {0x7f8ab895fff8, +57344, (nil), +57344},
  },

This is an overflow happening on the client side. What is the value of the parameter "workspace_client"? Can you please try to increase it?

daghf commented 7 years ago

Hey @joshuaspence

Agree with @aondio that this is workspace exhaustion. However, we shouldn't /crash/ in that condition.

I've commited a fix for this - would you mind giving master/HEAD a spin?

joshuaspence commented 7 years ago

I'm likely using whatever the default value for the workspace_client parameter is. Is this parameter documented anywhere? I've heard about tuning this parameter but I've never been able to understand how/when to do so.

joshuaspence commented 7 years ago

@daghf, I can try running from master, but I'm not sure how I would actually verify the fix. I used the headers from the output of panic.show to reconstruct the original request, but I wasn't able to reproduce the crash.

joshuaspence commented 7 years ago
> sudo varnishadm param.show | grep workspace_client
workspace_client           64k [bytes] (default)
daghf commented 7 years ago

Hi

workspace_client is an allocation of scratch memory space made available for header processing and VMODs. Typically the default value of 64k works just fine, but for cases where there are big headers or memory-hungry VMODs (e.g. vmod-cookie) we sometimes run out.

Increasing it (like @aondio noted) is fine. Try setting it to 128k

joshuaspence commented 7 years ago

That doesn't explain why I wasn't able to reproduce the crash by replaying the request though? Is there a way to calculate the required value for workspace_client? I'm not really a fan of just increasing it until it stops crashing.

daghf commented 7 years ago

Hi @joshuaspence

I'm closing this issue as fixed.

I'm not entirely sure why the bug didn't trigger upon replaying the request. Was the request exactly identical to the one in the crash? No changes to VCL and/or headers?

joshuaspence commented 7 years ago

I'm seeing a very similar panic now:

Panic at: Mon, 30 Jan 2017 13:39:23 GMT
"Assert error in WS_Release(), cache/cache_ws.c line 236:
  Condition(ws->r != NULL) not true.
thread = (cache-worker)
version = varnish-5.0.0 revision 99d036f
ident = Linux,4.4.0-47-generic,x86_64,-junix,-smalloc,-smalloc,-hcritbit,epoll
Backtrace:
  0x4351b6: varnishd() [0x4351b6]
  0x44e588: varnishd() [0x44e588]
  0x7f598d5f9808: libvmod_cookie.so(vmod_parse+0x118) [0x7f598d5f9808]
  0x7f469a1f04b6: vgc.so(VGC_function_determine_authentication_status+0xc6) [0x7f469a1f04b6]
  0x7f469a1f0fd0: vgc.so(VGC_function_vcl_recv+0x1a0) [0x7f469a1f0fd0]
  0x44212b: varnishd() [0x44212b]
  0x44477a: varnishd(VCL_recv_method+0x5a) [0x44477a]
  0x438dee: varnishd(CNT_Request+0xebe) [0x438dee]
  0x454edf: varnishd(HTTP1_Session+0x4df) [0x454edf]
  0x4556b7: varnishd() [0x4556b7]
req = 0x7f4a50d74020 {
  vxid = 96039094, transport = HTTP/1 {
    state = HTTP1::Proc
  }
  step = R_STP_RECV,
  req_body = R_BODY_NONE,
  restarts = 0, esi_level = 0,
  sp = 0x7f47d23e9820 {
    fd = 120, vxid = 96039093,
    t_open = 1485783359.621496,
    t_idle = 1485783359.621496,
    transport = HTTP/1 {
      state = HTTP1::Proc
    }
client = 10.136.70.68 33420,
  },
  worker = 0x7f599518ec90 {
    stack = {0x7f599518f000 -> 0x7f5995183000},
    ws = 0x7f599518ee78 {
      id = \"wrk\",
      {s, f, r, e} = {0x7f599518e430, 0x7f599518e430, (nil), +2040},
    },
    VCL::method = inside RECV,
    VCL::return = 0x0,
    VCL::methods = {RECV},
  },
  ws = 0x7f4a50d741f8 {
    OVERFLOWED id = \"Req\",
    {s, f, r, e} = {0x7f4a50d75ff8, +57264, (nil), +57344},
  },
  http_conn = 0x7f4a50d74128 {
    fd = 120,
    doclose = NULL,
    ws = 0x7f4a50d741f8,
    {rxbuf_b, rxbuf_e} = {0x7f4a50d75ff8, 0x7f4a50d7993d},
    {pipeline_b, pipeline_e} = {(nil), (nil)},
    content_length = -1,
    body_status = none,
    first_byte_timeout = 0.000000,
    between_bytes_timeout = 0.000000,
  },
  http[req] = 0x7f4a50d74290 {
    ws[Req] = 0x7f4a50d741f8,
    hdrs {
      \"GET\",
      <<REDACTED>>
    },
  },
  vcl = {
    name = \"vcl_525f48f3-bccd-4d7b-b775-f0de6caddd97\"
    busy = 153
    discard = 0,
    state = auto,
    temp = warm,
    conf = {
      srcname = {
        \"/etc/varnish/default.vcl\",
        \"Builtin\",
        \"/etc/varnish/includes/directors.vcl\",
        \"/etc/varnish/includes/probes.vcl\",
        \"/etc/varnish/includes/backends.vcl\",
        \"/etc/varnish/includes/backends/gaf_admin.vcl\",
        \"/etc/varnish/includes/backends/gaf_appserver.vcl\",
        \"/etc/varnish/includes/backends/gaf_assets.vcl\",
        \"/etc/varnish/includes/acls.vcl\",
        \"/etc/varnish/includes/backendselection.vcl\",
        \"/etc/varnish/varnish.vcl\",
        \"/etc/varnish/includes/subroutines.vcl\",
        \"/etc/varnish/includes/subroutines/determine_authentication_status.vcl\",
        \"/etc/varnish/includes/subroutines/normalize_cookies.vcl\",
        \"/etc/varnish/includes/subroutines/normalize_host.vcl\",
        \"/etc/varnish/includes/subroutines/normalize_url.vcl\",
        \"/etc/varnish/includes/subroutines/parse_accept_language.vcl\",
        \"/etc/varnish/includes/subroutines/sanitize_cookies.vcl\",
        \"/etc/varnish/includes/subroutines/sanitize_query_parameters.vcl\",
      },
    },
  },
  flags = {
  },
},

"

Should I open a new issue for this?

gquintard commented 7 years ago

Is that with the code with the latest changes?

joshuaspence commented 7 years ago

Yes, I built from master a few days ago.

gquintard commented 7 years ago

Weird, the stack trace doesn't make sense then: last vmod_cookie function called is vmod_parse, and the only code path reaching WS_Release() goes through vmod_set().

joshuaspence commented 7 years ago

One thing I'm not sure of.... if I update the Varnish modules, do I need to restart Varnish or is a reload sufficient? It looks as if we reloaded Varnish when we updated the cookie vmod, and from what I can tell the updated libvmod_cookie.so file seems to be loaded by the Varnish process.

gquintard commented 7 years ago

@joshuaspence restart is the surest way :-)

joshuaspence commented 7 years ago

Sure, but that means loosing all of the objects stored in cache. We build the varnish-modules repo from master which means that we would be restarting Varnish once every three days (on average).

gquintard commented 7 years ago

What you may try then is:

import module from "/path/to/file.so";

hoping to twart the caching mechanism. May or not work, but worth a try.