varnishcache / varnish-cache

Varnish Cache source code repository
https://www.varnish-cache.org

Partial support for Cache-Control's no-cache directive #4073

Open dridi opened 7 months ago

dridi commented 7 months ago

This is the third iteration of this change, following a first iteration in #3992 that I spontaneously replaced with a second iteration in #4032, which received very helpful reviews.

In this new installment I took the design from #4032 and moved one more step in that direction.

The first important change is the strict semantics of the OC_F_BUSY flag. Its main purpose is now to inform the decision to disembark a request into an objhead waiting list. It still applies in other places to identify objects meant to be looked up for which the caching outcome is not yet known.

As a result, this flag is only ever raised in hsh_insert_busyobj().

In a couple of places, this flag was checked as a means of coordination between a client task and a fetch task. Coordinating these tasks is the role of the boc, so the boc grew the responsibility of (indirectly) clearing the OC_F_BUSY flag in the fetch task.

In that regard, there are now stronger guarantees around this flag:

The convergence of all paths towards consistently dropping the OC_F_BUSY flag as soon as there is a caching outcome (or lack thereof) allows finer-grained waiting list activity (far fewer spurious wake-ups) with lower latency.

This should result in less locking on objheads overall.

A rushed request re-enters the lookup step with the objcore that just dropped its busy flag. This means that we can perform a cache hit before entering the large lookup critical section, provided the object is shareable and matches the request. Even if the objcore was cached stale, it can now be served to all waiting clients. This enables a response-wide no-cache behavior (as opposed to per-header), which is currently prevented by the built-in VCL but is now within the realm of the possible.
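To illustrate what that would allow, here is a minimal VCL sketch (not part of this patch series, and the regular expressions are my own) that bypasses the built-in vcl_backend_response so a Cache-Control: no-cache response stays shareable instead of being turned into a hit-for-miss object:

vcl 4.1;

backend default none;

sub vcl_backend_response {
    # Returning early skips the built-in vcl_backend_response, which
    # would otherwise mark a Cache-Control: no-cache response as
    # uncacheable (hit-for-miss). Keeping the object shareable is what
    # lets the waiting list serve it as described above.
    if (beresp.http.Cache-Control ~ "(?i)no-cache" &&
        beresp.http.Cache-Control !~ "(?i)(no-store|private)") {
        return (deliver);
    }
}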

A little bit of trivia: the exact same test case covering the partial no-cache support exists in all iterations of this change.

Being able to serve stale (but valid) waiting list hits solves the serialization problem for the most part. It shifts the problem towards incompatible variants, which create spurious wake-ups, compensated by all the ones that were eliminated. The problem gets much worse when there are many variants, for example with a Vary: User-Agent header (which should arguably be a reason to make such responses uncacheable).

In that case, the vary_notice parameter can help diagnose such scenarios. A new vary_limit parameter could also be added as a mitigation, turning transactions into a pass above that threshold. This is outside the scope of this patch series, since the problem already exists with very similar detrimental effects.
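For completeness, the usual VCL-side mitigation (again not part of this series; the buckets below are arbitrary examples) is to normalize the request header a backend varies on before lookup:

vcl 4.1;

backend default none;

sub vcl_recv {
    # Collapse User-Agent into a few coarse buckets so that a backend
    # sending "Vary: User-Agent" cannot create one variant per browser
    # build, piling incompatible variants onto the same objhead.
    if (req.http.User-Agent ~ "(?i)mobile|android|iphone") {
        set req.http.User-Agent = "mobile";
    } else {
        set req.http.User-Agent = "desktop";
    }
}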

dridi commented 6 months ago

There was one waiting list quirk I was aware of but had completely forgotten about: the safety net for vcl_backend_error.

Since the default ttl+grace+keep is zero upon entering vcl_backend_error, we currently inject an artificial lifetime for the synthetic object so that the waiting list gets a chance to see it. With this patch series, this is no longer needed, because the waiting list can now process a shareable object with no lifetime (aka Cache-Control: no-cache).

It is debatable whether we want to keep the artificial lifetime: on one hand it buffers backend errors for a little while, on the other hand it prevents new fetch attempts for a little while after epiphenomenal backend errors.
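If the artificial lifetime were dropped from core code, the same trade-off could still be expressed explicitly in VCL. A sketch, with placeholder values rather than anything recommended by this series:

vcl 4.1;

backend default none;

sub vcl_backend_error {
    # Buffer backend errors briefly so a burst of waiting clients
    # shares one synthetic object, while keeping it short-lived so
    # new fetch attempts resume quickly afterwards.
    set beresp.ttl = 1s;
    set beresp.grace = 0s;
    set beresp.keep = 0s;
    return (deliver);
}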

Either way, the waiting list is no longer a problem, see 5bfcd0f5857945aa7085aa68c3a934e980bda45d.

dridi commented 5 months ago

#4085 added a minor merge conflict; I will try to deal with it before bugwash.

dridi commented 5 months ago

After a first offline review with @bsdphk there are several things to change:

The third point is not a strict requirement and is up for debate. If we keep the safety net, we should prune its keep period (@AlveElde noticed that the safety net goes higher than the default value of shortlived, defeating its purpose in this scenario).

dridi commented 5 months ago

Patch series updated as per @bsdphk's review:

Only two commits changed: the aforementioned ones.

dridi commented 5 months ago

I'm looking at the waiting list coverage that is failing in CI; I was able to reproduce it locally under some load.

dridi commented 5 months ago

This might be a bug in varnishtest that I don't understand yet. When a timeout happens, the server spec ends before the final txresp, with no apparent reason in the logs.

I made the following change:

--- bin/varnishtest/tests/r02422.vtc
+++ bin/varnishtest/tests/r02422.vtc
@@ -7,7 +7,7 @@ server s1 {

        rxreq
        # wait until the new version is ready
-       delay 1
+       loop 10 {delay 0.1}
        txresp -hdr "Etag: 6"
 } -start

And the logs for s1 look like this:

**** s1    rxhdr|GET / HTTP/1.1\r
**** s1    rxhdr|Host: 127.0.0.1\r
**** s1    rxhdr|User-Agent: c4\r
**** s1    rxhdr|X-Forwarded-For: 127.0.0.1\r
**** s1    rxhdr|Via: 1.1 v1 (Varnish/trunk)\r
**** s1    rxhdr|Accept-Encoding: gzip\r
**** s1    rxhdr|X-Varnish: 1014\r
**** s1    rxhdr|\r
**** s1    rxhdrlen = 148
**** s1    http[ 0] |GET
**** s1    http[ 1] |/
**** s1    http[ 2] |HTTP/1.1
**** s1    http[ 3] |Host: 127.0.0.1
**** s1    http[ 4] |User-Agent: c4
**** s1    http[ 5] |X-Forwarded-For: 127.0.0.1
**** s1    http[ 6] |Via: 1.1 v1 (Varnish/trunk)
**** s1    http[ 7] |Accept-Encoding: gzip
**** s1    http[ 8] |X-Varnish: 1014
**** s1    bodylen = 0
**   s1    === loop 10 {delay 0.1}
**** s1    Loop #0
**   s1    === delay 0.1
***  s1    delaying 0.1 second(s)
**** dT    1.231
**** s1    Loop #1
**   s1    === delay 0.1
***  s1    delaying 0.1 second(s)
**** dT    1.331
**** s1    Loop #2
**** s1    Loop #3
**** s1    Loop #4
**** s1    Loop #5
**** s1    Loop #6
**** s1    Loop #7
**** s1    Loop #8
**** s1    Loop #9
***  s1    shutting fd 5
**   s1    Ending

So c4 is left hanging without a response and eventually times out.

dridi commented 5 months ago

I found the culprit: the test became racy and varnishtest was misleading (and I was also not paying enough attention).

dridi commented 5 months ago

I added 5 patches at the beginning of the series to help future troubleshooting and to close the race before the rest of the series makes it too easy to trigger.

dridi commented 5 months ago

Never mind, I only made the problem harder to reproduce on my end with better synchronization; something is wrong with bringing back the exponential rush for hits.

dridi commented 5 months ago

The problem was that the exponential rush of hits was introduced without safeguards, allowing requests to rush each other in turn when a waiting list hit was followed by a VCL restart.

In r02422 the clients c5 and c6 restart upon a regular hit, and they are both supposed to enter c4's waiting list. However, if one of them "lost" the race against c4, it would fail a third fetch and rush the other one, which would in turn fail a fetch and rush the first one that had meanwhile re-entered the waiting list, and so on until one of them exceeded max_restarts.

The fix introduces a fence between clients entering a waiting list and objcores rushing the waiting list. The objcores can only rush clients that entered before their initial rush.

dridi commented 3 months ago

Rebased onto master to resolve a minor conflict with #4109 (touching adjacent lines in cache_varnishd.h).