taking note of an issue which was almost forgotten: #3546 is just a special case of a broader issue: We should either keep object references until the end of the task or implement some other solution like copying to workspace (see also http_CopyHome) . Options need to be pondered.

I want to prepare a writeup of the broader topic

On the Lifetime of VCL data

PR #3546 fixed a special case of this problem:

References to storage objects are only valid until we release the reference to the object, yet with return(synth) and return(restart), we release that reference before the end of the task.

The PR addressed a specific case, where we implied that a string pointer was valid for the duration of the task.

Ticket #3600 triggered a discussion revealing the fact that we have not defined properly the lifetime requirements of VCL data.

This text is an attempt to holistically review the matter and make a proposal for a simple and safe solution.

Relevant types

Complex data types (passed by reference) defined for the VCL/vmod interface are:

$ egrep '^typedef .* \*\s+VCL_' include/vrt.h

typedef const struct vrt_acl *          VCL_ACL;
typedef const struct director *         VCL_BACKEND;
typedef const struct vrt_blob *         VCL_BLOB;
typedef const char *                VCL_BODY;
typedef const char *                VCL_ENUM;
typedef const struct gethdr_s *         VCL_HEADER;
typedef struct http *               VCL_HTTP;
typedef const struct suckaddr *         VCL_IP;
typedef const struct vrt_backend_probe *    VCL_PROBE;
typedef const struct vre *          VCL_REGEX;
typedef const struct stevedore *        VCL_STEVEDORE;
typedef const struct strands *          VCL_STRANDS;
typedef const char *                VCL_STRING;
typedef const struct vcl_sub *          VCL_SUB;
typedef struct vcl *                VCL_VCL;

In this discussion, VCL_HTTP, VCL_VCL, VCL_ACL and VCL_SUB are not considered, because these are intended to be instantiated only by core code.

VCL_ENUM is only generated by VGC.

VCL_STRANDS is regarded as an argument passing convention only and already has received an exception to whatever the existing rule might be. From vrt.h:

The memory management is very strict: a VMOD function receiving a STRANDS argument should keep no reference after the function returns. Retention of a STRANDS further in the ongoing task is undefined behavior and may result in a panic or data corruption.

in other words, a VCL_STRANDS argument can be a local variable (and actually is in VGC). This does not, however, apply to the individual strings in a VCL_STRANDS.

VCL_BACKEND and VCL_PROBE currently implicitly need to have a minimum lifetime of VCL, which might change with the introduction of reference counts.

VCL_STEVEDORE is a different animal altogether. Stevedores can not go before the last object in cache ceases to exist. We have not even started to formalize stevedores with a shorter lifetime than varnishd itself.

This leaves us with the following types to consider:

VCL_BLOB
VCL_BODY
VCL_HEADER
VCL_IP
VCL_REGEX
VCL_STRING
The individual strings in a VCL_STRANDS

Note on the definition of task lifetime

This discussion refers to task lifetime as that of a PRIV_TASK: The current housekeeping (vcl_init{} / vcl_fini{}), client or backend task until either its end or the next rollback.

Existing documentation

From: doc/sphinx/reference/vmod.rst:

[...] (M)ost of the non-native (C pointer) types are const, which, if returned by a vmod function/method, are assumed to be immutable. [...] When returning non-native values, the producing function is responsible for arranging memory management. Either by freeing the structure later by whatever means available or by using storage allocated from the client or backend workspaces. [...] Notice that allocations on the workspace do not need to be freed, their lifetime is the respective task.

So we already require that any "pointer value" be immutable (for its lifetime), but we do not properly define the lifetime.

Is has previously been argued that a minimum lifetime of task was implied, but at the very least we should take the opportunity to clarify this aspect.

Memory classes

VCL types currently could live at any memory address, we only conduct checks for strings and not consistently.

Memory location which we should consider in particular are:

Read-only memory mappings owned by varnishd or vmods, for example the .text section of varnishd itself or a (vmod) shared object
The heap, owned by the libc memory allocator
Storage owned by a stevedore
The respective task's workspace
The stack

Misc

Herein, function is used to refer both to functions and methods.

Minimum lifetime of return values

Which minimum lifetime should we require for vmod / VRT function return values?

Can it be longer than the task?

If return values had a longer required minimum lifetime, we were to end up in refcounting hell. And contradict some of the most important design principles of varnish: Most data is transient (lives as long as a client/backend request) and anything required to be persisted for longer is handled explicitly (and more expensively).

So, the minimum lifetime can not be longer than the task.

Can it be less than the task?

Let's presume we required any function processing a return value to not reference it after it returns.

How would a simple function call like:

funcA(funcB("x"), funcB("y"))

even work? By the hypothetical requirement, funcB() would be free to reuse the same piece of memory for the return values of both calls, yet both calls to funcB() would happen before funcA() even ran. So unless we descended into ABI land or wrapped all C function calls to copy return values "some safe place", this can't work.

So, the minimum lifetime of VGC / vmods needs to be the task

Minimum lifetime of function arguments

In C, function arguments are generally only required to remain valid for the duration of the function call, so unless the interface specifies additional requirements, they can, for example, live in the caller's stack frame.

In VCL, we know that function arguments can only come from three sources:

VGC
VRT function return values
vmods function return values

As, by the previous argument, VRT and vmods are already required to return values with a minimum lifetime of the task, the only component which could benefit from arguments on the stack was VGC.

In VGC, it is hard to see demand for being able to construct arguments on the stack: We do so for VCL_STRANDS, which has an exception already, but otherwise only call into VRT.

Would this change if we introduced support for local variables in VCL? Local variables would naturally live on the stack, but the types we consider in this discussion are all pointer types. Would we put the struct itself onto the stack? We would basically need to reinvent relevant parts of VRT to use stack memory alternatively to workspace, and stack memory would need to be provided by the caller (as a VRT function can not return memory from its own stack frame).

In short, complications would be manyfold and probably not worth the effort.

Thus, we should also define the minimum lifetime of function arguments to be the task.

What do we gain?

Currently, a common pattern in core code and vmods is:

  if (! WS_Inside(ws, begin, end))
    // copy to ws

This pattern has two issues:

As noted recently by @Dridi, WS_Inside() can not detect a use-after-rollback: Once we roll back a task, by definition, all pointers to the task's workspace become invalid, yet WS_Inside() can't see that [1]_
We waste workspace: For our memory classes, we uselessly copy for for 50% of the cases because arguments on the stack currently do not exist.

With a task minimum lifetime, we can remove the pattern above and simply keep references until the end of the task.

Can't we fix `WS_Inside()` to `WS_NeedCopy()`?

We could, but we would need to add lifetime information to all complex VCL types. This could be an element of the respective struct or information hidden in the pointer.

In theory, we could also look at the processes' memory map and make some assumptions like if a pointer is from a read-only mapping, then we assume that it is from a .text segment and thus constant. But the performance implications were relevant, portability was be an issue and we would likely not solve all cases safely.

What do we need to do?

To implement the regime

All function arguments and return values need to have at least task lifetime

we need to

Document and explain it to vmod authors
Ensure that we support keeping multiple object references until the end of the task (see #3546 and #3600)
Review all WS_Inside() calls and remove most of them

_[1]: We could always use a different workspace for a rollback, but that would relevantly impact existing code. Also it would be hard to guarantee that for all the rollbacks in a task we never use the same workspace more than once.

This is going to be long-ish.

First, there are two different issues:

A) Call argument lifetimes.

B) Return value lifetimes.

Second, our viewfinder here should encompas both VCC calling VMODs, VMODs calling VRT and VCC calling VRT, and everybody calling object methods: We want one unified architecture, not three different architectures.

Call Arguments

If we give call arguments "task lifetime", we have thereby decided that arguments to VRT_SetHdr() cannot live on the stack. That is a total non-starter to me, as matter of programmer sanity.

Call arguments will only be guaranteed to live for the duration of the call, and if the callee wants to stash a copy, it needs to perform whatever song&dance we settle on, to make that copy stable for its own task lifetime.

(Apart from programming languages which premptively copy all function arguments, none of which are performant, this is how all programming languages I know do things.)

I can only see that lead to two viable architectures:

I) Refcounting, where caller is responsible for holding a ref on the argument for (at least) the duration of the call, and the callee is responsible for gaining a ref, if it wants to use the argument longer than that. This restricts what the caller can subsequently do to refcount!=1 objects.

II) "Ask the owner", where all the relevant objects have a method, tentatively called .copy() where they will either return the object, if its lifetime is sufficient for the present task, or return a object copied to stable storage for the present task.

Under II) the object owner can use the .copy() function to implement refcounting, but only if Somebody™ also calls another method (.endcopy() ?) when the task ends.

The advantage of II) over I) is that we do not waste time refcounting object which, ipso-facto, have a guaranteed sufficient lifetime, for instance backends with VCL lifetime.

Return Values

Return values are much tricker than call arguments, because there is no equally obvious lifetime.

As I see it three viable options exist:

i) Refcounting. The return value comes with a refcount which must be released by the caller, this means that VMODs calling a VRT are responsible for deref'ing the return value and that VCC will need to learn to do so as well.

ii) Demand the return value lives for the duration of the current task. All copies to a different task causes a full copy to happen.

iii) Same as ii) but with a .copy() method, which can avoid copies to different tasks, where the liftime is ipso-facto sufficient.

Again, the benefit of iii) over ii) is less copying.

I'm not actually sure that ii) is possible: When a VMOD creates a director, there is no way to make a copy without asking the VMOD to do so, since the director is a black box to everybody else.

Conclusion

We have the choice between full blown ref-counting, which has the benefit that everybody knows how that song goes, and the disadvantage of tons of, in our case, mostly needless mutex operations.

Or we can insist that all return values live at least as long as the current task, and offer a .copy() method to transplant them to different tasks.

If we want to reduce the amount of copying by .copy(), by allowing objects to use refcounting, if that makes sense, we need to also add a .endcopy() method, and find out who can be trusted to call it. also offer the object

Re @bsdphk

we have thereby decided that arguments to VRT_SetHdr() cannot live on the stack.

This is actually a very good example because that function's signature is

VRT_SetHdr(VRT_CTX , VCL_HEADER hs, const char *p, ...)

So, as is, the hs argument would needlessly be required to live on the workspace, while p and the varags would not. The latter makes sense, but the former does not.

So I take your feedback as very valid criticism: If we followed this proposal, we should also review all VRT signatures and at least document the argument's lifetime requirements. A simple convention could be that use of the VCL_* typedefs implies the minimum task lifetime, while the spelled out type does not. Thus, the declaration could become:

VRT_SetHdr(VRT_CTX , const struct gethdr_s * hs, const char *p, ...)

I am not entirely happy with this suggestion and wish there was a better way?

Being at it, I would like to point out that there are cases where we still, implicitly, require that arguments not live on the stack. For example:

VCL_STRING
VRT_regsub(VRT_CTX, int all, VCL_STRING str, VCL_REGEX re,
    VCL_STRING sub)
        // [...]
        if (i == VRE_ERROR_NOMATCH)
                return (str);

If the outcome of this discussion was not to go along in the direction of the proposed solution, we should at least review VRT in this regard and make it consistent.

If we want to reduce the amount of copying by .copy(), by allowing objects to use refcounting, if that makes sense, we need to also add a .endcopy() method, and find out who can be trusted to call it. also offer the object

I think we could just use the existing PRIV_TASK scope for this. In my mind,

The generically recommended method for .copy implementations would be to create the copy on the destination task's workspace, which requires no "endcopy".
Alternatively, if refcounting was more appropriate, a PRIV_TASK can be allocated with the .fini() callback used to return the reference

Or we can insist that all return values live at least as long as the current task, and offer a .copy() method to transplant them to different tasks.

Yes, this is my proposal. Except that I also think we should make the same requirement for VCL_* arguments (see above, any better ideas to clarify VRT signatures are more than welcome).

If we want to reduce the amount of copying by .copy(), by allowing objects to use refcounting, if that makes sense, we need to also add a .endcopy() method, and find out who can be trusted to call it. also offer the object

I think we could just use the existing PRIV_TASK scope for this. In my mind,

That would require us to wrap all the relevant VCL types in a struct vmod_priv

That would require us to wrap all the relevant VCL types in a struct vmod_priv

I would think the vmod would only need to do it if it has to. Here is a similar working example where a PRIV_TASK is used for refcounting https://code.uplex.de/uplex-varnish/varnish-objvar/blob/master/src/vmod_globalvar.c#L97

This ticket has been discussed during bugwash. This note is to document what I believe to be general consensus. Please comment or leave a 👍🏽 or 👎🏽

VRT and VMOD function/methods need to ensure that for the VCL_* types

VCL_BLOB
VCL_BODY
VCL_HEADER
VCL_IP
VCL_REGEX
VCL_STRING
and the individual strings in a VCL_STRANDS

return values have at least task lifetime.

Arguments of these types, however, are owned by the caller and can reside on the stack.

Consequently, VRT and VMOD functions need to ensure that any arguments they return or reference for the duration of the task are copied to storage with at least task lifetime. A common pattern to ensure this is to copy any stored / saved arguments to the workspace (length() being a placeholder for a type-dependend method to determine the size of argument):

if (! WS_Inside(ws, argument, argument + length(argument))) {
    argument = WS_Copy(ws, argument, length(argument));
    if (argument == NULL) {
        VRT_fail(ctx, "Out of workspace");
        return (NULL);
    }
}

Notes:

Should we turn WS_Copy() into a noop if the argument is already WS_Inside() ?
We might want to replace WS_Inside() by WS_Allocated(), even if that still fails to detect use-after-rollback

If we agree to this definition, we will make (useless) workspace copies for any VRT/vmod function/method arguments which come from

Read-only memory mappings owned by varnishd or vmods, for example the .text section of varnishd itself or a (vmod) shared object
Storage owned by a stevedore

Besides the docfix, we should review our code and ensure it adheres to it.

On the positive side, we will not need to change how we reference object storage.

varnishcache / varnish-cache

lifetime and memory model: References to object storage #3600

On the Lifetime of VCL data

Relevant types

Note on the definition of task lifetime

Existing documentation

Memory classes

Misc

Minimum lifetime of return values

Can it be longer than the task?

Can it be less than the task?

Minimum lifetime of function arguments

What do we gain?

Can't we fix `WS_Inside()` to `WS_NeedCopy()`?

What do we need to do?

Call Arguments

Return Values

Conclusion

varnishcache / varnish-cache

lifetime and memory model: References to object storage #3600

On the Lifetime of VCL data

Relevant types

Note on the definition of task lifetime

Existing documentation

Memory classes

Misc

Minimum lifetime of return values

Can it be longer than the task?

Can it be less than the task?

Minimum lifetime of function arguments

What do we gain?

Can't we fix WS_Inside() to WS_NeedCopy()?

What do we need to do?

Call Arguments

Return Values

Conclusion

Can't we fix `WS_Inside()` to `WS_NeedCopy()`?