Closed xcir closed 12 months ago
I disabled the vcl_req_reset feature, and it looks like it no longer causes panics. (I'm still waiting to see what happens.)
Looking at some of the panics, it appears that the error occurs when an H2 reset (error=CANCEL) is issued for a request that includes ESI processing.
Can you try an explicit return (fail) from an ESI task?
I created a VTC: https://gist.github.com/xcir/5d6df8e2a82f3d5e60505b710c0e3b47
ubuntu@proxy04:~/vtc$ diff esi.vtc esi.p.vtc
5c5
< #varnish v1 -cliok "param.set feature -vcl_req_reset"
---
> varnish v1 -cliok "param.set feature -vcl_req_reset"
###+vcl_req_reset
ubuntu@proxy04:~/vtc$ sudo varnishtest esi.vtc |tail -n1
# top TEST esi.vtc FAILED (1.178) exit=2
###-vcl_req_reset
ubuntu@proxy04:~/vtc$ sudo varnishtest esi.p.vtc
# top TEST esi.p.vtc passed (1.221)
With this line uncommented (that is, with vcl_req_reset disabled), it works fine.
@Dridi Sorry, I missed it. If the VTC is not enough, I will do it.
Simpler reproducer:
--- i/bin/varnishtest/tests/e00003.vtc
+++ w/bin/varnishtest/tests/e00003.vtc
@@ -21,11 +21,13 @@ server s1 {
varnish v1 -vcl+backend {
sub vcl_synth {
+ if (req.esi_level > 0) { return (fail); }
set resp.body = """
""";
return (deliver);
}
sub vcl_recv {
+ if (req.esi_level > 0) { return (fail); }
if (req.esi_level > 0) {
set req.url = req.url + req.esi_level;
if (req.url ~ "^/synth") {
When the circuit breaker kicks in, it emulates a return (fail) that goes first to vcl_synth, but failing ESI from vcl_synth will panic. The vcl_req_reset flag exposed that since it will act as a circuit breaker again before entering vcl_synth.
Here is a patch that went into Varnish Enterprise for a different reason, due to the parallel nature of ESI there:
--- i/bin/varnishd/cache/cache_req_fsm.c
+++ w/bin/varnishd/cache/cache_req_fsm.c
@@ -333,7 +333,11 @@ cnt_synth(struct worker *wrk, struct req *req)
VSLb_ts_req(req, "Process", W_TIM_real(wrk));
- if (wrk->vpi->handling == VCL_RET_FAIL) {
+ while (wrk->vpi->handling == VCL_RET_FAIL) {
+ if (req->esi_level > 0) {
+ wrk->vpi->handling = VCL_RET_DELIVER;
+ break;
+ }
VSB_destroy(&synth_body);
(void)VRB_Ignore(req);
(void)req->transport->minimal_response(req, 500);
I overlooked it while porting the patch series here, going by what the commit message said it was solving. I did not realize that the problem with parallel execution of ESI was hiding another problem with serial ESI.
Bugwash: push patch with test coverage.
Expected Behavior
No panic.
Even if there is some kind of error, a 503 VCL failed response is preferable to a panic.
Current Behavior
varnishstat -1
https://gist.github.com/xcir/767063b6d64bf86eb9e3c5fa64453200
Possible Solution
No response
Steps to Reproduce (for bugs)
No response
Context
I upgraded from 7.3.0 to 7.4.2 on a proxy that uses ESI, and it started to panic. I think it is some kind of race condition, because nothing happened when I requested the URL that had panicked again, but I can't think of the cause at the moment. The frequency was relatively high: the panics occurred within a few minutes.
Varnish Cache version
varnishd (varnish-7.4.2 revision cd1d10ab53a6f6115b2b4f3b2a1da94c1f749f80)
Operating system
Ubuntu 22.04.3
Source of binary packages used (if any)
No response