plone / plone.rest

Plone support for HTTP verbs (GET, POST, PUT, DELETE, ...)
https://pypi.org/project/plone.rest/
11 stars 5 forks source link

Content negotiation leads to cache poisoning #73

Closed hvelarde closed 1 year ago

hvelarde commented 6 years ago

As explained on a post in the community forum, intermediate proxies cause issues if we don't include a Vary: Accept header.

AFAIK, we can fix this using two approaches:

IMO, the later is harder and prone to errors.

tisto commented 6 years ago

@hvelarde as written on community.plone.org, we need a more detailed bug report. Hard-coding the Vary header does not seem like a proper solution to me.

hvelarde commented 6 years ago

I think the bug report is clear enough: if I traverse to the default view of an object, I get normal HTML; if I traverse to the default view of an object requesting application/json, I get JSON; intermediate proxies like Varnish are not able to store both responses unless we add a Vary header to those responses, and this will cause serous issues because an user can get the wrong response on a cached request.

I'm not proposing to hard-code a Vary header to plone.rest; I just want to discuss about possible solutions as I foresee only those, but you may be able to get different ones, because you know better how this works.

erral commented 6 years ago

I think I get your point and I have an example:

We have this site where we are using standard plone.app.caching rules and Varnish to cache the responses and we are also using plone.restapi: https://dantzan.eus

Let's say we visit this URL with the browser: https://dantzan.eus/kidea/ezpalak/bolant-dantzariak after the visit, it's cached by Varnish and served by it for the subsequent requests.

Now if issue a GET request to that URL adding the Accept: application/json header, we get the cached HTML value instead of the JSON representation.

The reason behind it is that Varnish thinks that we have requested the same content, regardless the Accept: application/json header we used in the second request.

In a quick test, I have added the Accept value to the Vary attribute of the plone.app.caching rule that handles the previous URL, and I get two different cached responses: one for the HTML page and the other one for the JSON representation.

As a conclusion: I tend to think that we should document this somewhere in plone.restapi documentation and state that if used in conjunction with proxy-caches, one should add the Accept value to the Vary attribute of the plone.app.caching rules.

erral commented 6 years ago

We have talk about this at the Beethoven sprint and have seen that it's necessary to add the Vary: Accept header to plone.rest as you say.

@tisto @davisagli

hvelarde commented 6 years ago

exactly, thanks a lot!

hvelarde commented 6 years ago

I spent some time analyzing this issue and I found a very good discussion in Drupal's site: External caches mix up response formats on URLs where content negotiation is in use.

@ale-rt @erral @jensens @tisto please take some time and read it carefully; my conclusion is now that the Accept header is not going to solve the issue because as is not easy to implement and some CDN don't even support it (Cloudflare, for instance).

Drupal people ended up implementing a query parameter based content negotiation as alternative to extensions (_format); the problem with this approach is it wouldn't be cached neither.

In Zope/Plone world we could add a new element into the traversal to solve the issue; for instance

finally, let's remember that content negotiation is not part of the Guiding Principles of REST.

another problem that will arrive with time is API versioning; that can be handled using prefixes like: /1.0, /1.1, /2.0 and so on.

tisto commented 6 years ago

@hvelarde thanks! Using /api is something we do at kitconcept anyways since quite a while. We could discuss if we use /api as an alternative way to hook into plone.rest(api). API versioning is something that we dismissed by purpose. Using version prefixes in REST APIs is an anti-pattern IMHO (but that's another discussion).

tisto commented 6 years ago

@hvelarde what if we would amend plone.rest to automatically hook into an API call when the URL starts with "/api"? Would that solve your problem?

This would also make debugging for devs a lot easier. Right now you always have to copy the REST call to the command line to investigate why a REST call failed. With that approach, the dev could test any REST call directly in the browser. This is something that Django-REST-Framework does as well for instance.

hvelarde commented 6 years ago

I think that may solve the issue as the URL would now only have one valid answer and no cache poisoning could happen.

mauritsvanrees commented 2 years ago

There is a ++api++ traverser now. Does that mean this can be closed? Or do we still need to handle calls to /api differently?

davisagli commented 2 years ago

I think we can close it. Using /++api++ URLs rather than content negotiation to get the API avoids the problem.

mauritsvanrees commented 1 year ago

On Plone 5.2 this can still be a problem. I am talking about ClassicUI here:

I see this in a project now. We have enabled the REST API because we added a few specific endpoints. No one should have any reason to actually request the application/json version of the homepage, but once they do, the json version will be in the cache, breaking the homepage for future visitors. cc @fredvd

It's "funny":

For Volto on 5.2 it might be okay, but I have not checked. I am pretty sure you would need to override the backend versions to use plone.rest 2.0.0 so the ++api++ traverser is available. And all urls without ++api++ must be handled by Volto and not end up in the backend. Maybe use different versions of plone.restapi and plone.app.caching as well, not sure.

On Plone 6.0.0 it seems okay:

I am trying to think of options to improve this in 5.2, either in core Plone or by configuration in a specific project, but this comment is long enough already, so let me post this part first.

mauritsvanrees commented 1 year ago

So how to fix this in 5.2? Note: at this point I do not yet much mind whether this is fixed in core or in a project-specific patch or in a collective package.

Change restapi Service render

In plone.restapi in services/__init__.py we could change a method of the Service class:

    def render(self):
        self.check_permission()
        content = self.reply()
        if content is not _no_content_marker:
            self.request.response.setHeader("Content-Type", self.content_type)
            # THE NEXT LINE IS EXTRA:
            self.request.response.setHeader("Vary", "Accept")
            return json.dumps(
                content, indent=2, sort_keys=True, separators=(", ", ": ")
            )

This does the trick in my testing.

Backport dynamic/terse caching

Copy the dynamic/terse caching code from plone.app.caching and the cache:rulesets from plone.restapi to a new package. collective.dynamicrestapicaching or something more readable.

Rewrite json requests in nginx

Add plone.rest 2.0.0 to your project so the ++api++ traverse is available. I have not tried, but it looks like this should work fine on Plone 5.2.

Change nginx or your other favourite web server to conditionally rewrite requests with "Accept: application/json" to go to the ++api++ url in Plone. I think this should be possible, but I am not sure.

Configure Varnish to not cache json requests. Or responses.

If an application/json request comes in, send it directly to Plone without caching the result. Or maybe when you get a result and it is application/json, do not cache it.

Totally untested and possibly completely wrong, but something like this:

if (req.http.accept && req.http.accept ~ "application/json") {
    return(pipe);
}

This might need an additional vmod.

Which solution seems best to you? Do you have another solution?

fredvd commented 1 year ago

The extra vmod that is probably a good idea for Vary header sanitising in Varnish is part of the varnish-modules repository: https://github.com/varnish/varnish-modules. You can already install these with the compile-vmods option in the recipe, and these are not part of the commercial varnish plus subscription. If we make the vmods active by default in the recipe, we could also start using the cookie module to clean up. And there is also a very interesting xkey vmod in that bundle. We should use return(pass) there instead of return(pipe) there.

The problem we had last week when debugging our text/html Mimetype on css resources was that we didn't know how unique our setup was and if we had small things different in Plone or Varnish, or Nginx, or a 'rogue' request coming in that only combined triggered this issue.

I suggest we try to be defensive and not depend on plone.recipe.varnish and/or fix the issue in VCL, because we cannot depend on most set ups using the recipe and have their own custom VCL or varnish installation. Always returning the vary: Accept header on json responses from plone.restapi in 5.2 would solve the issue in the most minimal way, but it might need some more code not to override already present other incoming Vary attributes on the browser request.

Backporting the 6.0 caching improvements sounds tempting....

mauritsvanrees commented 1 year ago

I tried out the first option, changing the restapi Service render method to set a Vary: Accept header. Or only do this in content/services/get.py to restrict the impact. The problem is that this reply then gets saved in Varnish forever. Well, probably not really forever, but we give no indication to Varnish how long it can keep it. So if you change the title of the homepage, you will not see it in Varnish, unless you have setup purging correctly, including purging the json responses for a content item.

So if we add such a header, we must add more headers. To do this properly, we would need a back port of the dynamic/terse caching from Plone 6.

Alternative would then be to add GET_application_json_ to the legacy template mappings on the Caching operations tab of the caching control panel. You can either add it to the (empty list) for Content folder view, or to Content item view. The only difference is in the details settings, where Content folder view has an extra copy in the ETags, so that might be the safest option.

I don't know where GET_application_json_ exactly comes from, likely plone.rest, but under the hood this is the name of the view that is used to GET the json of content.

This would be something to add to the plone.app.caching.interfaces.IPloneCacheSettings.templateRulesetMapping registry setting in plone.app.caching 2.x. Plus an upgrade step in plone.app.upgrade (if plone.app.caching is activated). I can do both. The extra line would be this:

<element key="GET_application_json_">plone.content.folderView</element>

This seems low impact for 5.2. Does this sound like the best solution?

Just to quickly summarise the cache poisoning problem that this should solve:

mauritsvanrees commented 1 year ago

I have two small PRs ready:

https://github.com/plone/plone.app.caching/pull/111 https://github.com/plone/plone.app.upgrade/pull/309

jensens commented 1 year ago

What about closing this as we already a) imporved plone.app.caching and b) have the ++api++ traverser?

mauritsvanrees commented 1 year ago

I think this is done yes.