varnishcache / varnish-cache

Varnish Cache source code repository
https://www.varnish-cache.org
Other
3.56k stars 365 forks source link

Assert error in VBP_Control(): Condition(vt->heap_idx == VBH_NOIDX) not true. #4108

Open delthas opened 1 month ago

delthas commented 1 month ago

I'm using Varnish 7.5.0 and the 7.5 branch of libvmod-dynamic.

I'm stressing a Varnish with a simple VCL with many dynamic backends with probes, reloading and restarting many times over multiples hours and gathering any issues. I've had one repeated panic:

Assert error in VBP_Control(), cache/cache_backend_probe.c line 661:
  Condition(vt->heap_idx == VBH_NOIDX) not true.
Backtrace:
  0x55fb3cd793c7: /usr/sbin/varnishd(+0x5d3c7) [0x55fb3cd793c7]        // pan_ic
  0x55fb3cdf9dc5: /usr/sbin/varnishd(VAS_Fail+0x45) [0x55fb3cdf9dc5]   // VAS_Fail
  0x55fb3cd4f703: /usr/sbin/varnishd(+0x33703) [0x55fb3cd4f703]        // VBP_Control
  0x55fb3cd88dbb: /usr/sbin/varnishd(+0x6cdbb) [0x55fb3cd88dbb]        // vcl_BackendEvent
  0x55fb3cd8a335: /usr/sbin/varnishd(+0x6e335) [0x55fb3cd8a335]        // vcl_set_state
  0x55fb3cd8b369: /usr/sbin/varnishd(+0x6f369) [0x55fb3cd8b369]        // vcl_cli_load
  0x55fb3cdfc405: /usr/sbin/varnishd(+0xe0405) [0x55fb3cdfc405]        // cls_exec
  0x55fb3cdfc9f3: /usr/sbin/varnishd(VCLS_Poll+0x333) [0x55fb3cdfc9f3]
  0x55fb3cd55a94: /usr/sbin/varnishd(CLI_Run+0x64) [0x55fb3cd55a94]
  0x55fb3cd743ba: /usr/sbin/varnishd(child_main+0x22a) [0x55fb3cd743ba]
  0x55fb3cdbe7fd: /usr/sbin/varnishd(+0xa27fd) [0x55fb3cdbe7fd]
  0x55fb3cdbf38c: /usr/sbin/varnishd(+0xa338c) [0x55fb3cdbf38c]
  0x55fb3cdbf56e: /usr/sbin/varnishd(+0xa356e) [0x55fb3cdbf56e]
  0x55fb3cdfe4c6: /usr/sbin/varnishd(VEV_Once+0x1e6) [0x55fb3cdfe4c6]
  0x55fb3cdfe6c8: /usr/sbin/varnishd(VEV_Loop+0x28) [0x55fb3cdfe6c8]
  0x55fb3cd48071: /usr/sbin/varnishd(main+0x2001) [0x55fb3cd48071]
  0x7fdb47bd1b45: /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf5) [0x7fdb47bd1b45]
  0x55fb3cd48711: /usr/sbin/varnishd(+0x2c711) [0x55fb3cd48711]

This looks very close to https://github.com/varnishcache/varnish-cache/pull/3957 (itself fixing https://github.com/nigoroll/libvmod-dynamic/issues/100), but even with a version including the fix, I'm getting this assert. In that PR the crash is Assert error in VBP_Remove(): Condition(vt->heap_idx == VBH_NOIDX) not true., here it's Assert error in VBP_Control(): Condition(vt->heap_idx == VBH_NOIDX) not true., so in a slightly different place.

delthas commented 1 month ago

cc @nigoroll

nigoroll commented 1 month ago

Yes, I think it basically is the same race as the one closed in #3957

delthas commented 1 month ago

Does that mean the fix would just be the following?

--- a/bin/varnishd/cache/cache_backend_probe.c
+++ b/bin/varnishd/cache/cache_backend_probe.c
@@ -658,6 +658,9 @@ VBP_Control(const struct backend *be, int enable)

        Lck_Lock(&vbp_mtx);
        if (enable) {
+               if (vt->heap_idx != VBH_NOIDX) {
+                       VBH_delete(vbp_heap, vt->heap_idx);
+               }
                assert(vt->heap_idx == VBH_NOIDX);
                vt->due = VTIM_real();
                vbp_heap_insert(vt);
nigoroll commented 1 month ago

Does that mean the fix would just be the following?

FTR: @delthas reported on IRC that the patch avoids the panic, which is helpful information, thank you.

nigoroll commented 1 month ago

@delthas could you share more details about your tests, like, for example, a script you are using?

nigoroll commented 1 month ago

@delthas could you try #4115 ?