varnishcache / varnish-cache

Varnish Cache source code repository
https://www.varnish-cache.org
Other
3.56k stars 366 forks source link

[vcl] panic on accessing storage.s0.used_space for file storage #4036

Closed gquintard closed 6 months ago

gquintard commented 6 months ago

Expected Behavior

VCL shouldn't panic

Current Behavior

***  v1   debug|Error: Child (26) Panic at: Tue, 02 Jan 2024 18:46:05 GMT
***  v1   debug|Wrong turn at cache/cache_main.c:284:
***  v1   debug|Signal 11 (Segmentation fault) received at (nil) si_code 1
***  v1   debug|version = varnish-6.3.2 revision 6870fd661a2b42c2e8adad838b5d92a71f27dccd, vrt api = 10.0
***  v1   debug|ident = Linux,6.6.8-arch1-1,x86_64,-junix,-sfile,-sdefault,-hcritbit,epoll
***  v1   debug|now = 8178.009065 (mono), 1704221163.944813 (real)
***  v1   debug|Backtrace:
***  v1   debug|  0x558d49e4fccf: varnishd(+0x4fccf) [0x558d49e4fccf]
***  v1   debug|  0x558d49ebd680: varnishd(VAS_Fail+0x40) [0x558d49ebd680]
***  v1   debug|  0x558d49e4a963: varnishd(+0x4a963) [0x558d49e4a963]
***  v1   debug|  0x7f6e6b9590e0: /lib/x86_64-linux-gnu/libpthread.so.0(+0x110e0) [0x7f6e6b9590e0]
***  v1   debug|thread = (cache-worker)
***  v1   debug|pthread.attr = {
***  v1   debug|  guard = 4096,
***  v1   debug|  stack_bottom = 0x7f6e6ca89000,
***  v1   debug|  stack_top = 0x7f6e6ca97000,
***  v1   debug|  stack_size = 57344,
***  v1   debug|}
***  v1   debug|thr.req = 0x7f6e5e846020 {
***  v1   debug|  vxid = 1001, transport = HTTP/1 {
***  v1   debug|    state = HTTP1::Proc
***  v1   debug|  }
***  v1   debug|  step = R_STP_RECV,
***  v1   debug|  req_body = R_BODY_NONE,
***  v1   debug|  restarts = 0, esi_level = 0,
***  v1   debug|  sp = 0x7f6e5e40e820 {
***  v1   debug|    fd = 20, vxid = 1000,
***  v1   debug|    t_open = 1704221163.944552,
***  v1   debug|    t_idle = 1704221163.944552,
***  v1   debug|    ws = 0x7f6e5e40e860 {
***  v1   debug|      id = \"ses\",
***  v1   debug|      {s, f, r, e} = {0x7f6e5e40e8a0, +96, (nil), +344},
***  v1   debug|    },
***  v1   debug|    transport = HTTP/1 {
***  v1   debug|      state = HTTP1::Proc
***  v1   debug|    }
***  v1   debug|    client = 127.0.0.1 39110 127.0.0.1:40689,
***  v1   debug|  },
***  v1   debug|  worker = 0x7f6e6ca95de0 {
***  v1   debug|    ws = 0x7f6e6ca95e88 {
***  v1   debug|      id = \"wrk\",
***  v1   debug|      {s, f, r, e} = {0x7f6e6ca953d0, +0, (nil), +2040},
***  v1   debug|    },
***  v1   debug|    VCL::method = inside RECV,
***  v1   debug|    VCL::return = 0x0,
***  v1   debug|    VCL::methods = {RECV},
***  v1   debug|  },
***  v1   debug|  ws = 0x7f6e5e846170 {
***  v1   debug|    id = \"req\",
***  v1   debug|    {s, f, r, e} = {0x7f6e5e8480b0, +72, (nil), +57160},
***  v1   debug|  },
***  v1   debug|  http_conn = 0x7f6e5e848050 {
***  v1   debug|    fd = 20 (@0x7f6e5e40e844),
***  v1   debug|    doclose = NULL,
***  v1   debug|    ws = 0x7f6e5e846170 {
***  v1   debug|      [Already dumped, see above]
***  v1   debug|    },
***  v1   debug|    {rxbuf_b, rxbuf_e} = {0x7f6e5e8480b0, 0x7f6e5e8480d3},
***  v1   debug|    {pipeline_b, pipeline_e} = {(nil), (nil)},
***  v1   debug|    content_length = -1,
***  v1   debug|    body_status = none,
***  v1   debug|    first_byte_timeout = 0.000000,
***  v1   debug|    between_bytes_timeout = 0.000000,
***  v1   debug|  },
***  v1   debug|  http[req] = 0x7f6e5e846218 {
***  v1   debug|    ws = 0x7f6e5e846170 {
***  v1   debug|      [Already dumped, see above]
***  v1   debug|    },
***  v1   debug|    hdrs {
***  v1   debug|      \"GET\",
***  v1   debug|      \"/\",
***  v1   debug|      \"HTTP/1.1\",
***  v1   debug|      \"Host: 127.0.0.1\",
***  v1   debug|      \"X-Forwarded-For: 127.0.0.1\",
***  v1   debug|    },
***  v1   debug|  },
***  v1   debug|  vdc = 0x7f6e5e848028 {
***  v1   debug|    nxt = (nil),
***  v1   debug|    retval = 0,
***  v1   debug|  },
***  v1   debug|  vcl = {
***  v1   debug|    name = \"vcl1\",
***  v1   debug|    busy = 1,
***  v1   debug|    discard = 0,
***  v1   debug|    state = auto,
***  v1   debug|    temp = warm,
***  v1   debug|    conf = {
***  v1   debug|      syntax = \"41\",
***  v1   debug|      srcname = {
***  v1   debug|        \"<vcl.inline>\",
***  v1   debug|        \"Builtin\",
***  v1   debug|      },
***  v1   debug|    },
***  v1   debug|  },
***  v1   debug|  flags = {
***  v1   debug|  },
***  v1   debug|  privs = 0x7f6e5e846200 {
***  v1   debug|  },
***  v1   debug|  topreq = {
***  v1   debug|    req = 0x7f6e5e846020 {
***  v1   debug|      [Already dumped, see above]
***  v1   debug|    },
***  v1   debug|  },
***  v1   debug|},
***  v1   debug|thr.busyobj = (nil) {
***  v1   debug|},
***  v1   debug|vmods = {
***  v1   debug|},

Possible Solution

No response

Steps to Reproduce (for bugs)


server s1 {} -start

varnish v1 -arg "-s s0=file,${tmpdir},10M" -vcl+backend {
    sub vcl_recv {
        set req.http.space = storage.s0.free_space;
    }
} -start

client c1 {
    txreq
    rxresp
} -run

Context

found and reproduced by r#8899 on slack. Tested on docker (debian) image and on native arch packages

this affects 6.3 and later. I'm getting a configure error on older commits, so I haven't had the chance to bisect yet

Varnish Cache version

No response

Operating system

No response

Source of binary packages used (if any)

No response

gquintard commented 6 months ago

looks like this happened with 2962375b4ed80e2197a318f92f2fb9f6cfcd12f8

walid-git commented 6 months ago

I did have a look and found that this is due to a missing NULL check here: https://github.com/varnishcache/varnish-cache/blob/7aeca4d632827e2db91bf63e67e4c61c9319d038/bin/varnishd/storage/stevedore.c#L310

stv->var_#nm may or may not be implemented by the stevedore, for example using storage.s0.happy with the malloc stevedore would result in the same panic. Not sure about the fix, returning a default value here could be misleading. Should we make these mandatory on every stevedore ?

nigoroll commented 6 months ago

Should we make these mandatory on every stevedore ?

👍🏽 the implementation is not hard and we should just check for the callbacks to be present when initializing the stevedore.

nigoroll commented 6 months ago

ok @Dridi it also makes sense to restore the previous behavior, but happy = 0 is a bad default.

dridi commented 5 months ago

ok @Dridi it also makes sense to restore the previous behavior, but happy = 0 is a bad default.

I pushed a fix disappointed that @gquintard didn't submit one :p

I think it is interesting that despite an unhelpful default behavior it took so long (and a refactoring accident) to bubble up. I limited my intervention to restoring the previous behavior.

nigoroll commented 5 months ago

I pushed a fix disappointed that @gquintard didn't submit one :p

Thank you :)

I think it is interesting that despite an unhelpful default behavior it took so long (and a refactoring accident) to bubble up. I limited my intervention to restoring the previous behavior.

Yes. But now that we have the first storage router (at least the first I know of) I guess it's time to actually use the happy callback...