varnishcache / varnish-cache

Varnish Cache source code repository
https://www.varnish-cache.org
Other
3.56k stars 366 forks source link

[doc] return(retry|return) must follow a std.rollback() call #4060

Closed gquintard closed 4 months ago

gquintard commented 4 months ago

just a docfix to save users from hitting vcl-triggered panics, like here: https://github.com/varnish/varnish-modules/issues/222

walid-git commented 4 months ago

Should we also restrict std.rollback to subroutines where return (re{start,try}) are permitted ?

Also, note that CI is failing due to the space before the closing **

gquintard commented 4 months ago

thanks @walid-git , fixed

nigoroll commented 4 months ago

I'd like to understand this better. IIRC, we want to recommend a retry/restart soon after the rollback, but I would think that changes should still be possible.

walid-git commented 4 months ago

Note from bugwash: please $Restrict std.rollback to vcl subroutines where return (restart/retry) are valid

nigoroll commented 4 months ago

I agree we should recommend a restart/retry shortly after the rollback, but not require it and I would consider this issue an actual bug in Varnish-Cache. What happens here is that VCL assigns VRT_priv_{task,top}() to pointers in the preamble of each VGC function for a sub, so when rollback happens within the sub, these pointers become stale:

vcl:

sub t {
    var.set("foo", "bar");
    std.rollback(req);
    var.set("foo", "bar");
}

vgc:

VGC_function_t(VRT_CTX, enum vcl_func_call_e call,
    enum vcl_func_fail_e *failp)
{
  assert(call == VSUB_STATIC);
  assert(failp == NULL);
  assert(ctx->method & (VCL_MET_DELIVER|VCL_MET_HASH|VCL_MET_HIT|VCL_MET_MISS|VCL_MET_PASS|VCL_MET_PURGE|VCL_MET_RECV|VCL_MET_SYNTH));
  struct vmod_priv *ARG_priv_task_var = VRT_priv_task(ctx, &VGC_vmod_var);
  if (ARG_priv_task_var == NULL) {
    VRT_fail(ctx, "failed to get task priv for vmod var");
    return;
  }

  /* ... from ('/tmp/t.vcl' Line 5 Pos 7) */
  {
    {
      VPI_count(ctx, VGC_NREFS, 1);
      if (UNLIKELY(ctx->vpi->trace)) VPI_trace(ctx, 1);
      END_;
      Vmod_vmod_var_Func.f_set(ctx,
        ARG_priv_task_var,
        "foo",
        "bar"
      );
      END_;
      Vmod_vmod_std_Func.f_rollback(ctx,
        VRT_r_req(ctx)
      );
      END_;
      Vmod_vmod_var_Func.f_set(ctx,
        ARG_priv_task_var,
        "foo",
        "bar"
      );
      END_;
    }
  }
}

For the first Vmod_vmod_var_Func.f_set() call, ARG_priv_task_var is valid, but for the second, it is not, because after the call to Vmod_vmod_std_Func.f_rollback(), the workspace on which the priv_task state resides, is reset.

Note that this is only an issue for VMODs using the PRIV_TASK and PRIV_TOP arguments in the vcc file. VMODs using VRT_priv_{task,top}() explicitly get a fresh value for each call, and thus do not have this issue.

I see the following options to avoid this:

dridi commented 4 months ago
  • make the preamble an explicit data structure (probably a bound array), which gets passed to std.rollback to invalidate

Probably the other way around. A data structure that std.rollback() would find through the VPI.

nigoroll commented 4 months ago

addition: we could also retire the PRIV_TASK and PRIV_TOP vmod function/method arguments and simply require vmods to call the respective VRT functions.

Re @Dridi

Probably the other way around. A data structure that std.rollback() would find through the VPI.

Yes, and I would like to add that this data structure does already exist and that std.rollback() already finds it, so this new data structure would basically be a "per sub" view on it. Not sure if it's worth it...

gquintard commented 4 months ago

updated, thanks @walid-git for the rst fix

nigoroll commented 4 months ago

Following up on https://github.com/varnishcache/varnish-cache/pull/4060#issuecomment-1964640980

I agree we should recommend a restart/retry shortly after the rollback, but not require it and I would consider this issue an actual bug in Varnish-Cache. ...

I came up with an idea for a variation of the first idea, which now appears to me as most reasonable. So in total, I see these options now:

gquintard commented 4 months ago

maybe, just maybe, it's ok now

nigoroll commented 4 months ago

We have merged #4066, so the doc change is no longer adequate. Also, shouldn't the $Restriction be dropped now?