varnishcache / varnish-cache

Varnish Cache source code repository
https://www.varnish-cache.org
Other
3.56k stars 366 forks source link

Fix order of COLD events for directors vs. VCL (and VMODs, implicitly) #4037

Closed nigoroll closed 5 months ago

nigoroll commented 6 months ago

@delthas reported an interesting issue with vmod_dynamic: When the vmod receives a cold event, it might have some directors left to delete, and when it deletes them, they do not receive a cold event, and a follow up assertion triggers because a warm director is deleted.

I believe the root cause is actually a wrong ordering in Varnish-Cache:

vcldir_retire() only sends a VCL_EVENT_COLD when the vcl is warm. In other words, it (rightly, IMHO) asserts that a COLD event (if any) has alrady been posted when a director is deleted on a COLD vcl.

Yet, when the director is deleted upon a COLD vmod event, this assertion was wrong, because the COLD events for directors were only posted after vmod events.

Given that vmods do things like deleting directors, it appears (more) correct to post VDI COLD events before VMOD COLD events.

nigoroll commented 6 months ago

I feel like it should be the other way around: once everything is warm, "commit" the warmup transaction at the VCL level.

I looked into a similar question before adding the PR, and in general I agree that the current logic looks inconsistent: We mark the VCL warm before it is, and we mark it cold before it is.

But running git grep -C 5 -E -- '->is_(warm|cold)' shows that, besides assertions, the warm/cold attribute is used for exactly two purposes: VSC hiding and the VDI_Event() discussed here. And I think that for both purposes the current implementation is suitable, except for the ordering issue raised herein.

I think that any proposal to complicate this logic should be motivated by relevant use cases. For this reason and my expectation of canned worms, I have not thought about your proposal in more detail yet.

bsdphk commented 6 months ago

I think the ordering should be reversed, but it's obviously something which is badly underdocumented.

nigoroll commented 5 months ago

FTR, OKed by bugwash