varnish / varnish-modules

Collection of Varnish Cache modules (vmods) by Varnish Software

extract request body portions (feature request?) #221

Closed: watery closed this issue 10 months ago

watery commented 10 months ago

I'm new to Varnish, installed 7.4 OSS, and need to cache some POST requests. I found this repo, which has a bodyaccess VMOD, but if I understand it correctly, the vmod only allows checking whether some text (regexp) is present in the body, not extracting it.

I have to cache requests like this one, which returns info about product 11011:

POST /product/details (Content-Type: application/json)
{
  "equipment": "99999",
  "productId": "11011",
  "model": "ABC"
}

Those requests should be banned when a request like this one is received:

POST /customer/edit/product (Content-Type: multipart/form-data)
customerId=1001010011&productId=11011&productNickname=new_name

How can I address such cases? Can I request to extend bodyaccess to add a function to retrieve a portion of the body, like:

STRING reextract_req_body(REGEX re, STRING pattern)

reextract_req_body("customerId=(\d+)", "\1")
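To make the request concrete, here is a minimal sketch of the proposed semantics in plain Python, outside Varnish. The function name `reextract` and its arguments are hypothetical and only mirror the signature suggested above; nothing like this exists in bodyaccess.

```python
import re
from typing import Optional


def reextract(body: str, regex: str, template: str) -> Optional[str]:
    """Hypothetical stand-in for the proposed reextract_req_body().

    Returns the template with backreferences filled in from the first
    match, or None when the regex does not match the body at all.
    """
    match = re.search(regex, body)
    if match is None:
        return None
    return match.expand(template)


# The form body from the example request above.
body = "customerId=1001010011&productId=11011&productNickname=new_name"

print(reextract(body, r"customerId=(\d+)", r"\1"))  # 1001010011
print(reextract(body, r"productId=(\d+)", r"\1"))   # 11011
```

Returning `None` on a failed match is a design assumption; a VMOD would more likely return an empty or fallback string.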

gquintard commented 10 months ago

Hi @watery,

So, what you want is achievable in a couple of ways. In Varnish Enterprise, there's vmod-xbody with .get_req_body(), which could get you that.

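On the open source side, you have a bunch of options:

* https://code.uplex.de/uplex-varnish/libvmod-re/ , it might be a bit cumbersome for what you seek, but I'm sure you can make it work, and @nigoroll will happily help you, I'm sure

* https://github.com/gquintard/vmod_rers/blob/main/vmod.vcc#L33-L45 will also do what you ask; the only trick is that it's a `rust` vmod, so if you are using `varnish` 6.X, it won't work, but I'll happily help if you need it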

But, in truth, you want none of that, you want vmod-jq, that will allow you to access those fields without mucking around with regexes :-)

Hopefully that helps. I'm closing this as there are a bunch of options out there, so we probably won't invest time into building this feature, but we'll welcome a PR if one comes.

nigoroll commented 10 months ago

For your own idea of using a regular expression, the re vmod pretty much does exactly what you want.

But the way I read your question, you are actually receiving Content-Type: application/json requests which you want to allow and Content-Type: multipart/form-data which you want to deny. So maybe you could just make the decision based on the header?

As @gquintard pointed out, using a regex on JSON is not a great idea, because regexes do not properly parse the structured data. For example, "customerId=(\d+)" would also match a json string as here: {foo: "customerId=42"}. So the fail-safe recommendation is to parse json as such, but regexen are suitable for form-data.
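The false positive described above is easy to reproduce outside Varnish. The following is a small Python sketch, not VCL; the bodies are the ones from this thread, with the JSON example written as valid JSON.

```python
import json
import re

PATTERN = re.compile(r"customerId=(\d+)")

form_body = "customerId=1001010011&productId=11011&productNickname=new_name"
json_body = json.dumps({"foo": "customerId=42"})

# The regex cannot tell a real form field from text inside a JSON string,
# so it matches both bodies.
form_hit = PATTERN.search(form_body)
json_hit = PATTERN.search(json_body)

# Parsing the JSON shows that no customerId field actually exists.
parsed = json.loads(json_body)
has_field = "customerId" in parsed

print(form_hit.group(1))  # 1001010011
print(json_hit.group(1))  # 42, a false positive
print(has_field)          # False
```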

Regarding JSON, the jq library used by vmod jq was way too slow for my purposes, so I wrote vmod frozen, which is much faster. I never ran a benchmark, but it never showed up as an issue with our clients who process tens to hundreds of thousands of requests per second.

watery commented 10 months ago

First of all, thank you both @gquintard and @nigoroll for your very quick replies! Next, I'm on OSS 7.4 (I've just edited my opening post), so anything in the Enterprise version is off the table for me.

I only installed Varnish last weekend, so pardon any misuse of technical terms; I still have to explore all the available VMODs.

Let me add some clarifications.

> But the way I read your question, you are actually receiving Content-Type: application/json requests which you want to allow and Content-Type: multipart/form-data which you want to deny. So maybe you could just make the decision based on the header?

All the requests should be handled / allowed; none should be blocked or discarded. The different content types were just examples from the requests I'm interested in. In particular, the request that carries multipart/form-data is actually receiving a new image for the product, and it should ban / purge the cached copies of the requests that return product info.
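The intended flow can be modelled in a few lines of plain Python. This is only a toy sketch of the logic, not Varnish configuration: the `cache` dict, `backend_details`, and both handlers are hypothetical names. It also assumes the edit body arrives as the `key=value` pairs shown in the opening post; a real multipart/form-data body would need a multipart parser instead.

```python
import json
from urllib.parse import parse_qs

# Hypothetical in-memory cache: productId -> cached response text.
cache = {}


def backend_details(product_id: str) -> str:
    """Stand-in for the real backend call."""
    return f"details for product {product_id}"


def handle_details(body: str) -> str:
    """POST /product/details: serve from cache, keyed by productId."""
    product_id = json.loads(body)["productId"]
    if product_id not in cache:
        cache[product_id] = backend_details(product_id)
    return cache[product_id]


def handle_edit(body: str) -> None:
    """POST /customer/edit/product: drop cached details for that product."""
    fields = parse_qs(body)
    for product_id in fields.get("productId", []):
        cache.pop(product_id, None)


details_body = json.dumps(
    {"equipment": "99999", "productId": "11011", "model": "ABC"}
)
edit_body = "customerId=1001010011&productId=11011&productNickname=new_name"

handle_details(details_body)
cached_before_edit = "11011" in cache

handle_edit(edit_body)
cached_after_edit = "11011" in cache
```

After the edit request, the next details request for product 11011 would miss the cache and go to the backend again.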

> As @gquintard pointed out, using a regex on JSON is not a great idea, because regexes do not properly parse the structured data. For example, "customerId=(\d+)" would also match a json string as here: {foo: "customerId=42"}. So the fail-safe recommendation is to parse json as such, but regexen are suitable for form-data.

Totally agree. The idea to use a regular expression was just a simple elaboration on the INT rematch_req_body(REGEX re) function that's already in bodyaccess, to try to explain my requirement; but sure, if there are VMODs more targeted at handling JSON, it's best to use them.

The main point though, which maybe I didn't explain well, is that I need to process the client request body, i.e. the req in vcl_recv(). As I understand from the official docs, the body isn't available in that step, and I didn't see where / how I can read it (I admit I just skimmed through all your linked VMOD pages, so maybe I overlooked that).

I found BOOL cache_req_body(BYTES size) but then?

watery commented 10 months ago

> On the open source side, you have a bunch of options:
>
> * https://code.uplex.de/uplex-varnish/libvmod-re/ , it might be a bit cumbersome for what you seek, but I'm sure you can make it work, and @nigoroll will happily help you, I'm sure
>
> * https://github.com/gquintard/vmod_rers/blob/main/vmod.vcc#L33-L45 will also do what you ask; the only trick is that it's a `rust` vmod, so if you are using `varnish` 6.X, it won't work, but I'll happily help if you need it
>
> But, in truth, you want none of that, you want vmod-jq, that will allow you to access those fields without mucking around with regexes :-)

Oh, rereading more closely: the first two do in fact run regexes on the request body. So... were you suggesting using one of them to extract the whole body and then pass it to vmod-jq?

nigoroll commented 10 months ago

Use re to extract parts of the request body with regexen. Use frozen to parse the request body as json and extract claims.

At least that is my recommendation.

gquintard commented 10 months ago

@watery, I would go with vmod-jq personally, only because I'm familiar with jq itself and because the API is dead simple (as with the others, you can parse the request body directly), but if you need more performance (do test, do measure), then graduate to faster solutions.