varnishcache / varnish-cache

Varnish Cache source code repository
https://www.varnish-cache.org
Other
3.56k stars 365 forks source link

expire: Add EXP_Reduce() for better softpurges #4085

Closed AlveElde closed 3 months ago

AlveElde commented 3 months ago

This PR resolves an issue where softpurging an object would reset its expiry timer to the start of the objects grace period. Repeated softpurges of the same object would keep it away from expiry indefinetly, and softpurging an object in keep would bring it back into grace.

The effects of the current softpurge implementation are most severe when the cache contains a high number of objects that are regularly hit by key-based invalidation. This can result in a growing number of objects in cache, increasing load on the expiry thread mailbox, and contention on the exp mutex. Grace hits on stale objects long past their indended lifetime can also occur.

This patch series introduces EXP_Reduce() as an alternative to EXP_Rearm() for softpurging applications. purge.soft() has been changed to use the new function, and the VCC has been updated with the following:

The effect of a softpurge is dependent on the arguments given to the purge and
the current lifetime of the object. Let's consider an object in cache created
with *ttl*, *grace*, and *keep* of 60 seconds each:

``purge.soft(ttl = 0s, grace = -1s, keep = -1s)``

* If the object is **fresh**, the *ttl* is reduced to 0 seconds and the object
  expires after 120 seconds.
* If the object is **stale**, the expiry time is not changed.

``purge.soft(ttl = 0s, grace = 10s, keep = 10s)``

* If the object is **fresh**, the *ttl* is reduced to 0 seconds, *grace* and
  *keep* are reduced to 10 seconds. The object expires after 20 seconds.
* If the object has been **stale** for 5 seconds, *grace* is reduced to 5
  seconds and *keep* is reduced to 10 seconds. The object expires after 15
  seconds.
* If the object has been **stale** for 15 seconds, *grace* is reduced to 0
  seconds and *keep* is reduced to 5 seconds. The object expires after 5
  seconds.
* If the object has been **stale** for 20 seconds or more, the object
  immediately expires.

``purge.soft(ttl = 10s, grace = -1s, keep = -1s)``

* If the object has been **fresh** for 5 seconds, the *ttl* is reduced to 10
  seconds. The object expires after 130 seconds.
* If the object has been **fresh** for 55 seconds, the *ttl* is not changed. The
  object expires after 125 seconds.
* If the object is **stale**, the expiry time is not changed.

When the expiry time of an object is reduced due to a softpurge, an
``EXP_Reduce`` entry is logged under the ``ExpKill`` VSL tag. If a softpurge
does not reduce the expiry time of an object, an ``EXP_Unchanged`` entry is
logged instead.
nigoroll commented 3 months ago

Can't this be done with just a few lines of changes to EXP_Rearm() to change the logging and a wrapper to avoid the ttl reset? I do see the point, but the patch includes refactoring which I do not see the motivation for and code duplication which I think we should avoid.

dridi commented 3 months ago

I think one goal is precisely to not prevent EXP_Rearm() from extending the life of an object while ensuring that a soft purge doesn't.

AlveElde commented 3 months ago

Changing the behavior of EXP_Rearm() is going to break several VMODs, so I figured that providing a new function was the way to go.

Regarding code duplication, I tried to strike a middleground between code reuse and readability. But maybe there is a better way to structure it that I'm not seeing?

nigoroll commented 3 months ago

How would changing the logging break VMODs? I did not mean to suggest to change EXP_Rearm() other than the logging. I suggested adding EXP_Reduce() as a trivial wrapper of EXP_Rearm().

nigoroll commented 3 months ago

But maybe there is a better way to structure it that I'm not seeing?

I had something like this in mind:

diff --git a/bin/varnishd/cache/cache_expire.c b/bin/varnishd/cache/cache_expire.c
index 2b27371ba..2d8b34de4 100644
--- a/bin/varnishd/cache/cache_expire.c
+++ b/bin/varnishd/cache/cache_expire.c
@@ -262,6 +262,23 @@ EXP_Rearm(struct objcore *oc, vtim_real now,
            oc->timer_when, when, oc->flags);
 }

+
+void
+EXP_Reduce(struct objcore *oc, vtim_real now,
+    vtim_real ttl, vtim_real grace, vtim_real keep)
+{
+
+       CHECK_OBJ_NOTNULL(oc, OBJCORE_MAGIC);
+
+       if (!isnan(ttl)) {
+               ttl = now + ttl - oc->t_origin;
+               if (ttl >= oc->ttl)
+                       ttl = nan("");
+       }
+
+       EXP_Rearm(oc, now, ttl, grace, keep);
+}
+
 /*--------------------------------------------------------------------
  * Handle stuff in the inbox
  */
AlveElde commented 3 months ago

Ah, now I get what you mean, that is a clever solution I had not considered. I will reattempt the patch series based on your example.

nigoroll commented 3 months ago

also approved by @bsdphk during bugwash

lucaspouzac commented 1 month ago

Hi, Is it possible to report this fix on 7.5 branch and release a minor version ? We have a similar behavior when expiring objects that could look like this.