varnishcache / varnish-cache

Varnish Cache source code repository
https://www.varnish-cache.org
Other
3.56k stars 366 forks source link

cli: New ban.cancel command #4052

Open dridi opened 5 months ago

dridi commented 5 months ago
CLI_CMD(BAN_CANCEL,
        "ban.cancel",
        "ban.cancel [-j] <field> <operator> <arg> [&& <field> <oper> <arg> ...]",
        "Mark complete all bans where all the conditions match. If objects"
        " were invalidated by a ban before the ban was cancelled, the object"
        " remains invalidated. Bans with only ``obj`` expressions can be"
        " cancelled before ``ban_lurker_age``, but may still be evaluated"
        " during cache lookups and invalidate objects.",
        "See :ref:`vcl(7)_ban` for details",
        3, -1
)
nigoroll commented 4 months ago

tentative response: I can see why something like this would appear attractive to curtail the impact of an accidentally issued ban, but I do not feel confident to oversee the potential consequences, in particular with respect to persistence.

@Dridi which model do you have in mind? On IRC, you said "no effect on ban persistence", what does that mean? Would the canceled ban be still applied to persisted objects? Or would it not be applied? How would this look like?

The most natural scenario, in my mind, would be that eventually the canceled ban would not get exported any more and thus become invisible to persisted objects, but until then, it could still get reloaded.

So, consequently, would it not make sense to generate a "cancel" stv event?

dridi commented 4 months ago

@Dridi which model do you have in mind? On IRC, you said "no effect on ban persistence", what does that mean? Would the canceled ban be still applied to persisted objects? Or would it not be applied? How would this look like?

This strictly operates on the live ban list. There is no interaction with storage whatsoever.

The effect is "immediate", pretty much identical to the current effect of the ban_dups parameter when turned on.

nigoroll commented 4 months ago

There is no interaction with storage whatsoever.

I don't understand how this is supposed to work then. If stevedores are not involved, then the ban applies and is not canceled when a persistent storage is reloaded at the righ^Wwrong time.

The use case I have in mind here is to clone caches by snapshot/copying the storage, where this scenario is more likely than in the ordinary "we never restart our varnishes" case.

dridi commented 4 months ago

I meant that ban.cancel does not add new storage interactions, it just marks duplicate bans as completed. The same happens when ban inserts a new ban expression and ban_dups is on. Duplicate bans currently present in the ban list are marked as completed, nothing more.

That forced me to look at the rest of the ban code and found another candidate for BAN_Cancel(), and for that I needed to slightly change the signature.

Anyway, this command does not touch the ban list, it may only mark bans as completed.

bsdphk commented 4 months ago

I get why specifying the precise ban as arguments makes sense in general, but I think we should sugar the CLI to make it easy to cancel the most recently added bans, something like:

ban.cancel -4

to cancel the four most recently added bans ?

nigoroll commented 4 months ago

Notes from bugwash which Dridi could not attend:

@bsdphk was not aware that there is no stevedore notification of a cancel and we both agree that the implicit notification via the ban list cleanup and export can take a long time.

I still think that we should notify stevedores of a cancel such that they can also neuter the respective ban(s) during a storage load.

We both agree that a simple syntax is key to making this feature useful. phk's ban.cancel -<n> idea looks like it could be helpful in this regard.

After bugwash, I wondered if we would also want an "emergency stop" for the ban lurker?