varnish / varnish-modules

Collection of Varnish Cache modules (vmods) by Varnish Software
Other
185 stars 86 forks source link

When key does not exists then var.get() returns NULL and does not work with empty string comparison #223

Closed mateusz-kolecki closed 8 months ago

mateusz-kolecki commented 8 months ago

If the key does not exist then NULL is returned from var.get_string() and from var.get().

The documentation is not explicit about what should be returned when the key is not set.

$Function STRING get(PRIV_TASK, STRING )

Get `key` with data type STRING. If stored `key` is not a STRING an empty string is returned.

...

$Function STRING get_string(PRIV_TASK, STRING key)

Identical to get().

NULL will cause the explicit comparison to not work in VCL:

# /etc/varnish/default.vcl
vcl 4.1;

import std;
import var;

backend default {
    .host = "localhost";
    .port = "8080";
}

sub vcl_recv {
    if (req.url ~ "^/foo/") {
         var.set("hash-only-this", regsub(req.url, "^(?i:(?:/foo)/(.+))", "\1"));
    }
}

sub vcl_hash {
    if (var.get("hash-only-this") != "") {
        hash_data(var.get_string("hash-only-this"));
        std.log("var.get_string(hash-only-this) = " + var.get_string("hash-only-this"));
        return (lookup);
    }
}

Lets send some requests:

$ curl localhost/foo
$ curl localhost/foo/bar
$ curl localhost/no-this

varnishlog shows that the condition in vcl_hash is always true

$ varnish varnishlog | grep -i log                                          
-   VCL_Log        var.get_string(hash-only-this) = 
-   VCL_Log        var.get_string(hash-only-this) = bar
-   VCL_Log        var.get_string(hash-only-this) = 

If one assumes that an empty string is returned when the key does not exist then the explicit comparison in VLC does not work as expected. That is because of how VRT_strcmp() works - if one of the arguments is NULL then 1 is returned.

C code emitted for the if statement:

if (
        (0 != VRT_strcmp(Vmod_var_Func.get(ctx,
          ARG_priv_task_var,
          "hash-only-this"), ""))
      )
 { 
  /* ... */
 }

So the question is: should we change the docs or the implementation of var.get_string()?

gquintard commented 8 months ago

Hi!

Thanks for the report, you are correct, we are not doing what we said we would. Changing the implementation might break a bunch of systems, so we can only go with the docfix on this one.

Would you mind writing a VTC to make sure we return NULL and keep doing so in the future? (if you can't, I'll handle it, no worries)

dridi commented 8 months ago

I think it boils down to not comparing to a string in the first place:

    if (var.get("hash-only-this")) {
        # use the key
    }
gquintard commented 8 months ago

it does, but to be frank, string comparison has always been a footgun in varnish. Exposing NULL is silly and mostly useless

dridi commented 8 months ago

It's more a question of exposing "unset" to distinguish it from "set to an empty string".

So I think the implementation got it right, and the docs are wrong.

mateusz-kolecki commented 8 months ago

@Dridi Yes, that would work in this case but when, for some reason, you want to handle empty string ("") as well then your if statement would look like this:

    if (var.get("hash-only-this") && var.get("hash-only-this") != "") {
        # use the key
    }

And it is not obvious why you would need that if you assume that var.get() returns only stings and not NULL. If the docs would mention NULL in the non-existing key case then we would not have this conversation ;)

@gquintard I don't think I can handle writing VTC for this. If you could do that then I would be thankful.

mateusz-kolecki commented 8 months ago

I'm just wondering... is NULL a thing in VCL? I know that at the end of the day everything boils down to C but I do not see any mention of NULL in the Varnish docs where STRING is discussed.

I always had the belief that there is no such thing as NULL in the Varnish Configuration Language and it is handled somehow and instead of NULL, I would get an empty value for a certain type (empty string, zero, false and so on).

If NULL is not a thing in VCL spec then maybe returning NULL in var.get() is not valid and a different value for non-existing key should be returned.

I get that changing the behaviour of var.get() can potentially break some VCL out there and sticking to the existing behaviour makes more sense. In that case, I guess that documenting NULL in VCL is even more important.

What do you think?

dridi commented 8 months ago
if (req.http.some-header-name) {
        # enter this block if some-header-name exists, aka is not null
}

It's a thing, but you don't get to manipulate null beyond boolean tests.

gquintard commented 8 months ago

fixed via c4b5a9704d3d2266f8bc7cb51c5bee821d4c48cf