nigoroll / libvmod-dynamic

The Varnish dns/named director continued
BSD 2-Clause "Simplified" License
95 stars 34 forks source link

Assert error in VBP_Remove() #100

Closed nigoroll closed 1 year ago

nigoroll commented 1 year ago

Seen with ad2285bb5915ceee6949f02bdca15333bdd159d5 on varnish-cache 69fe28ffd981a3714f32faa54a4c47c6c5f1753e, which is based on 48681cc872c17d339bb6ddb47304db0685e844ba:

  0x4f0dba: /opt/varnish/sbin/varnishd() [0x4f0dba]
Assert error in VBP_Remove(), cache/cache_backend_probe.c line 718:
  Condition(vt->heap_idx == VBH_NOIDX) not true.
version = varnish-trunk revision 69fe28ffd981a3714f32faa54a4c47c6c5f1753e, vrt api = 17.0
ident = Linux,3.10.0-862.14.4.el7.x86_64,x86_64,-jnone,-smalloc,-sdefault,-hcritbit,epoll
now = 146369807.049963 (mono), 1689082818.407869 (real)
Backtrace:
  0x44e3c2: /opt/varnish/sbin/varnishd() [0x44e3c2]
  0x44eacf: /opt/varnish/sbin/varnishd() [0x44eacf]
  0x4ee962: /opt/varnish/sbin/varnishd(VAS_Fail+0x53) [0x4ee962]
  0x41fb40: /opt/varnish/sbin/varnishd(VBP_Remove+0x15f) [0x41fb40]
  0x41bdd7: /opt/varnish/sbin/varnishd() [0x41bdd7]
  0x41bf4a: /opt/varnish/sbin/varnishd() [0x41bf4a]
  0x4751d6: /opt/varnish/sbin/varnishd() [0x4751d6]
  0x4752fa: /opt/varnish/sbin/varnishd() [0x4752fa]
  0x4755aa: /opt/varnish/sbin/varnishd(VRT_Assign_Backend+0x12b) [0x4755aa]
  0x7f9d011eaded: ./vmod_cache/_vmod_dynamic.5658350bfea26ceaa0bf26746c43cfa37683f4f5a1d373819db83c01b95cee9a(+0x6ded) [0x7f9d011eaded]
  0x475421: /opt/varnish/sbin/varnishd(VRT_DelDirector+0x122) [0x475421]
  0x7f9d011eaa30: ./vmod_cache/_vmod_dynamic.5658350bfea26ceaa0bf26746c43cfa37683f4f5a1d373819db83c01b95cee9a(+0x6a30) [0x7f9d011eaa30]
  0x7f9d011ec527: ./vmod_cache/_vmod_dynamic.5658350bfea26ceaa0bf26746c43cfa37683f4f5a1d373819db83c01b95cee9a(vmod_event+0x437) [0x7f9d011ec527]
  0x7f9d115143cd: vcl_boot.1688716416.512333/vgc.so(+0xba3cd) [0x7f9d115143cd]
  0x45f67d: /opt/varnish/sbin/varnishd() [0x45f67d]
  0x460b0e: /opt/varnish/sbin/varnishd() [0x460b0e]
  0x461d82: /opt/varnish/sbin/varnishd() [0x461d82]
  0x4f049c: /opt/varnish/sbin/varnishd() [0x4f049c]
  0x4f0810: /opt/varnish/sbin/varnishd() [0x4f0810]
  0x4f0dba: /opt/varnish/sbin/varnishd() [0x4f0dba]
nigoroll commented 1 year ago

I believe this is a https://github.com/varnishcache/varnish-cache bug. Suppose the following happens:

vbp_task() finishes with vt->running = 0 and a heap insert. The vbp_cond is signaled under the lock, but now instead of vbp_thread() waking up first, VBP_Remove() gets the lock and reaches assert(vt->heap_idx == VBH_NOIDX); before the racing vbp_thread() deleted the heap.

This is unlikely to happen with static backends, because for those, the probe is stopped via the vcl temperature before they get removed.