Open Vormillion opened 1 month ago
slightly relevant blog post: https://info.varnish-software.com/blog/using-obj-hits
It is a hit though: the object is found in the cache, albeit in a currently incomplete form. One could argue that `vcl_hit` could offer an `obj.still_downloading` boolean property, though.

Side note: the bereq (for example via `varnishlog -g request`) would have been helpful.
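If such a property existed, usage could look like the sketch below. Note that `obj.still_downloading` is hypothetical (it does not exist in any Varnish release); this only illustrates the suggestion:

```vcl
sub vcl_hit {
    # Hypothetical flag: distinguish a complete cached object from
    # one whose body is still being fetched/streamed.
    if (obj.still_downloading) {
        set req.http.x-cache = "HIT-INCOMPLETE";
    } else {
        set req.http.x-cache = "HIT";
    }
}
```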
Test case (written by @AlveElde, slightly modified by myself):
```
varnishtest "Cache HIT on shortlived error object with waitinglist"

server s1 {
	rxreq
	delay 10
} -start

varnish v1 -vcl+backend {
	sub vcl_backend_fetch {
		set bereq.first_byte_timeout = 1s;
	}

	sub vcl_deliver {
		set resp.http.hits = obj.hits;
		set resp.http.uncacheable = obj.uncacheable;
		set resp.http.x-cache = req.http.x-cache;
	}

	sub vcl_hit {
		set req.http.x-cache = "HIT";
	}

	sub vcl_miss {
		set req.http.x-cache = "MISS";
	}
} -start

client c1 {
	txreq
	rxresp
	expect resp.status == 503
	expect resp.http.hits == 0
	expect resp.http.uncacheable == false
	expect resp.http.x-cache == "MISS"
} -start

delay 0.5

client c2 {
	txreq
	rxresp
	expect resp.status == 503
	expect resp.http.hits == 1
	expect resp.http.uncacheable == false
	expect resp.http.x-cache == "HIT"
} -start

client c1 -wait
client c2 -wait

varnish v1 -expect MAIN.cache_hitmiss == 0
varnish v1 -expect MAIN.cache_miss == 1
varnish v1 -expect MAIN.cache_hit == 1
varnish v1 -expect MAIN.cache_hit_grace == 0
varnish v1 -expect MAIN.busy_sleep == 1
```
Bugwash: things seem to work as they should, but we may be calling something a hit (in the VSC counters) which is not what users would think of as a "hit", since delivery isn't immediate. Return to this after thinking about / looking at the statistics.
I would suggest the following:

* rename `MAIN.busy_sleep` to `MAIN.cache_hit_coalesce`
* add `obj.was_coalescing`, analogous to `obj.is_hitmiss` etc.

But regarding the OP's main point, I think that a coalescing (aka waitinglist) hit will need to continue ending up in `vcl_hit`, because that is what it is, even if it has to wait.
I would need to dig up my 242-slides presentation from a couple of VDDs ago, but I'm pretty sure something like that was part of the plan laid out.

I couldn't find such a flag in my presentation.
> I would suggest the following:
>
> * rename `MAIN.busy_sleep` to `MAIN.cache_hit_coalesce`
> * add `obj.was_coalescing`, analogous to `obj.is_hitmiss` etc.
If we rename a busy counter, we should rename all of them to reflect their relation to the waiting list, so I would prefer something like `cache_hit_waitlist`, with the other counters renamed to `waitlist_something`. See #4073 for the gory details, but "busy" was conflated between busyobj and the waiting list, which created confusion in this area.
Regarding `obj.was_coalescing`, I think this is taking the problem from the wrong end, and I would prefer something like `req.was_waiting` to better reflect what went on.
I am also OK with @dridi's suggestion to consistently rename to `waitlist` instead of `coalesce`, if that is thought to be more descriptive. My personal opinion is that the waiting list is only the implementation, but the mechanism is coalescing, so I prefer the latter.

Regarding whether the flag should reside under `obj` or `req`, it is related: `req.was_waiting` somehow makes sense but sounds very general ("and what was the request waiting for?"), while `req.was_coalescing` in my mind also takes the problem from the wrong end, because it is the object that is the result of coalescing backend requests. On a side note, for restarts I think we would need to clear a request flag to signal exactly during which iteration the waitinglist was entered, and that, again, seems un-POLA to me, because then the request might report `was_waiting == false` when in fact it did wait before.

So all in all, `obj.was_coalescing` actually still seems more descriptive in my mind, but I can go with whatever.
I think I'm actually against `cache_hit_waitlist`, because `<counter>_<specialization>` is supposed to represent a subset of `<counter>`, and a waiting list hit is just a hit: it could very well be a grace hit at the same time, so it shouldn't be presented as a sibling of `cache_hit_grace`.

If we touch one of the `busy_*` counters, they should probably all use the same `waitlist_` or `waitinglist_` prefix. For now we should probably leave them alone to avoid breaking setups for no good reason (we don't have VSC aliases for compatibility).

Regarding the original problem of long hit transactions, the same can happen without involving a waiting list. A hit on an object in the BOS_STREAM state (ongoing fetch) can also appear to take a long time in `varnishlog`. Imagine a backend serving the response headers fast but taking a long time to produce the response body, or simply a slow or long fetch in general.
Now regarding `req.<new_waitinglist_flag_tbd>`, something potentially more useful could instead be a new variable `req.waiting_list_time`, telling you the amount of time spent on a waiting list. You could probably already craft this by storing `now` in `vcl_recv` and computing the difference in `vcl_{hit,miss,pass}`. Keeping track of a duration also translates fairly easily to a boolean:
```vcl
if (req.waiting_list_time > 0) {
	# ...
}
```
I'm aware that a DURATION called "something_time" can be confusing; I'm just not making an effort to find a name for the concept.
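The "store `now` in `vcl_recv`" workaround mentioned above could be sketched as follows. This is only a rough illustration assuming vmod_std: the timestamp round-trips through a request header as a formatted date string, so `std.time()` gets you back a TIME with only second granularity (the header name `t-start` and the `x-waited` marker are made up for this example):

```vcl
import std;

sub vcl_recv {
    # Remember when the request entered the lookup path.
    # Stored as an RFC date string, so precision is one second.
    set req.http.t-start = now;
}

sub vcl_hit {
    # Approximate time between vcl_recv and vcl_hit; for a
    # coalesced request this includes the waiting-list wait.
    if ((now - std.time(req.http.t-start, now)) > 1s) {
        set req.http.x-waited = "true";
    }
}
```

A built-in `req.waiting_list_time` would avoid both the header round-trip and the loss of sub-second precision.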
Expected Behavior
Waitinglist requests should not be reported as cache hits.
Current Behavior
After a purge request to a URL, the next request gets a MISS and triggers a backend fetch as expected, but subsequent requests get a HIT `VCL_call` even though they are on the waitinglist.
Possible Solution
No response
Steps to Reproduce (for bugs)
No response
Context
We are trying to troubleshoot a scenario where varnishlog shows that an object was served from the cache, but in reality the fetch time was quite long due to the waitinglist.
We are setting a special response header called X-Cache-Status:
The varnishlog output shows a cache purge -> miss on the first call -> hit on the second call while the request is on the waitinglist, so in reality it's not a HIT; my question is why VCL_HIT is called here.
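The OP's VCL is not included in the report, but a header like X-Cache-Status is typically produced along these lines (a minimal sketch, mirroring the `req.http.x-cache` marking used in the test case earlier in this thread; the exact header values are assumptions):

```vcl
sub vcl_hit {
    set req.http.x-cache = "HIT";
}

sub vcl_miss {
    set req.http.x-cache = "MISS";
}

sub vcl_deliver {
    # Expose the lookup outcome to the client. A coalesced request
    # reports HIT here even though it waited for an ongoing fetch,
    # which is the behavior this issue is about.
    set resp.http.X-Cache-Status = req.http.x-cache;
}
```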
Cache purge request
Fresh call to /napoje — this is a MISS, as expected
Second call to /napoje (in the meantime previous request is still fetching page from backend ~6 seconds)
Varnish Cache version
7.5.0
Operating system
Debian 12
Source of binary packages used (if any)
No response