Closed hvelarde closed 1 year ago
@hvelarde as written on community.plone.org, we need a more detailed bug report. Hard-coding the Vary header does not seem like a proper solution to me.
I think the bug report is clear enough: if I traverse to the default view of an object, I get normal HTML; if I traverse to the default view of an object requesting application/json
, I get JSON; intermediate proxies like Varnish are not able to store both responses unless we add a Vary
header to those responses, and this will cause serous issues because an user can get the wrong response on a cached request.
I'm not proposing to hard-code a Vary
header to plone.rest; I just want to discuss about possible solutions as I foresee only those, but you may be able to get different ones, because you know better how this works.
I think I get your point and I have an example:
We have this site where we are using standard plone.app.caching rules and Varnish to cache the responses and we are also using plone.restapi: https://dantzan.eus
Let's say we visit this URL with the browser: https://dantzan.eus/kidea/ezpalak/bolant-dantzariak after the visit, it's cached by Varnish and served by it for the subsequent requests.
Now if issue a GET request to that URL adding the Accept: application/json
header, we get the cached HTML value instead of the JSON representation.
The reason behind it is that Varnish thinks that we have requested the same content, regardless the Accept: application/json
header we used in the second request.
In a quick test, I have added the Accept
value to the Vary
attribute of the plone.app.caching rule that handles the previous URL, and I get two different cached responses: one for the HTML page and the other one for the JSON representation.
As a conclusion: I tend to think that we should document this somewhere in plone.restapi documentation and state that if used in conjunction with proxy-caches, one should add the Accept
value to the Vary
attribute of the plone.app.caching rules.
We have talk about this at the Beethoven sprint and have seen that it's necessary to add the Vary: Accept
header to plone.rest as you say.
@tisto @davisagli
exactly, thanks a lot!
I spent some time analyzing this issue and I found a very good discussion in Drupal's site: External caches mix up response formats on URLs where content negotiation is in use.
@ale-rt @erral @jensens @tisto please take some time and read it carefully; my conclusion is now that the Accept
header is not going to solve the issue because as is not easy to implement and some CDN don't even support it (Cloudflare, for instance).
Drupal people ended up implementing a query parameter based content negotiation as alternative to extensions (_format
); the problem with this approach is it wouldn't be cached neither.
In Zope/Plone world we could add a new element into the traversal to solve the issue; for instance
http://localhost:8080/Plone/foo
returns HTML as usuallyhttp://localhost:8080/Plone/foo/json
could return JSONhttp://localhost:8080/Plone/foo/xml
could return XMLfinally, let's remember that content negotiation is not part of the Guiding Principles of REST.
another problem that will arrive with time is API versioning; that can be handled using prefixes like: /1.0
, /1.1
, /2.0
and so on.
@hvelarde thanks! Using /api is something we do at kitconcept anyways since quite a while. We could discuss if we use /api as an alternative way to hook into plone.rest(api). API versioning is something that we dismissed by purpose. Using version prefixes in REST APIs is an anti-pattern IMHO (but that's another discussion).
@hvelarde what if we would amend plone.rest to automatically hook into an API call when the URL starts with "/api"? Would that solve your problem?
This would also make debugging for devs a lot easier. Right now you always have to copy the REST call to the command line to investigate why a REST call failed. With that approach, the dev could test any REST call directly in the browser. This is something that Django-REST-Framework does as well for instance.
I think that may solve the issue as the URL would now only have one valid answer and no cache poisoning could happen.
There is a ++api++
traverser now. Does that mean this can be closed? Or do we still need to handle calls to /api
differently?
I think we can close it. Using /++api++ URLs rather than content negotiation to get the API avoids the problem.
On Plone 5.2 this can still be a problem. I am talking about ClassicUI here:
plone.restapi
.curl -v -H "Accept: application/json" http://localhost:11047/Plone2 > /dev/null
curl -v http://localhost:11047/Plone2 > /dev/null
. You will see Content-Type: application/json
in the headers (this is what -v
does).I see this in a project now. We have enabled the REST API because we added a few specific endpoints. No one should have any reason to actually request the application/json
version of the homepage, but once they do, the json version will be in the cache, breaking the homepage for future visitors. cc @fredvd
It's "funny":
For Volto on 5.2 it might be okay, but I have not checked. I am pretty sure you would need to override the backend versions to use plone.rest
2.0.0 so the ++api++
traverser is available. And all urls without ++api++
must be handled by Volto and not end up in the backend. Maybe use different versions of plone.restapi
and plone.app.caching
as well, not sure.
On Plone 6.0.0 it seems okay:
++api++
traverser is available (though for ClassicUI this seems not technically needed).plone.app.caching
has a "dynamic" ruleset with a "terse caching" policy that adds Vary: Accept
as header. Responses are stored 10 seconds in the browser and 60 in Varnish.plone.restapi
has cache:ruleset
in zcml to link the various services to the dynamic policy.text/html
as answer.I am trying to think of options to improve this in 5.2, either in core Plone or by configuration in a specific project, but this comment is long enough already, so let me post this part first.
So how to fix this in 5.2? Note: at this point I do not yet much mind whether this is fixed in core or in a project-specific patch or in a collective package.
In plone.restapi
in services/__init__.py
we could change a method of the Service
class:
def render(self):
self.check_permission()
content = self.reply()
if content is not _no_content_marker:
self.request.response.setHeader("Content-Type", self.content_type)
# THE NEXT LINE IS EXTRA:
self.request.response.setHeader("Vary", "Accept")
return json.dumps(
content, indent=2, sort_keys=True, separators=(", ", ": ")
)
This does the trick in my testing.
Copy the dynamic/terse caching code from plone.app.caching
and the cache:rulesets
from plone.restapi
to a new package. collective.dynamicrestapicaching
or something more readable.
Add plone.rest
2.0.0 to your project so the ++api++
traverse is available. I have not tried, but it looks like this should work fine on Plone 5.2.
Change nginx or your other favourite web server to conditionally rewrite requests with "Accept: application/json" to go to the ++api++
url in Plone. I think this should be possible, but I am not sure.
If an application/json
request comes in, send it directly to Plone without caching the result.
Or maybe when you get a result and it is application/json
, do not cache it.
Totally untested and possibly completely wrong, but something like this:
if (req.http.accept && req.http.accept ~ "application/json") {
return(pipe);
}
This might need an additional vmod.
Which solution seems best to you? Do you have another solution?
The extra vmod that is probably a good idea for Vary header sanitising in Varnish is part of the varnish-modules repository: https://github.com/varnish/varnish-modules. You can already install these with the compile-vmods option in the recipe, and these are not part of the commercial varnish plus subscription. If we make the vmods active by default in the recipe, we could also start using the cookie module to clean up. And there is also a very interesting xkey vmod in that bundle. We should use return(pass) there instead of return(pipe) there.
The problem we had last week when debugging our text/html Mimetype on css resources was that we didn't know how unique our setup was and if we had small things different in Plone or Varnish, or Nginx, or a 'rogue' request coming in that only combined triggered this issue.
I suggest we try to be defensive and not depend on plone.recipe.varnish and/or fix the issue in VCL, because we cannot depend on most set ups using the recipe and have their own custom VCL or varnish installation. Always returning the vary: Accept header on json responses from plone.restapi in 5.2 would solve the issue in the most minimal way, but it might need some more code not to override already present other incoming Vary attributes on the browser request.
Backporting the 6.0 caching improvements sounds tempting....
I tried out the first option, changing the restapi Service render method to set a Vary: Accept
header. Or only do this in content/services/get.py
to restrict the impact. The problem is that this reply then gets saved in Varnish forever. Well, probably not really forever, but we give no indication to Varnish how long it can keep it. So if you change the title of the homepage, you will not see it in Varnish, unless you have setup purging correctly, including purging the json responses for a content item.
So if we add such a header, we must add more headers. To do this properly, we would need a back port of the dynamic/terse caching from Plone 6.
Alternative would then be to add GET_application_json_
to the legacy template mappings on the Caching operations tab of the caching control panel. You can either add it to the (empty list) for Content folder view, or to Content item view. The only difference is in the details settings, where Content folder view has an extra copy
in the ETags, so that might be the safest option.
I don't know where GET_application_json_
exactly comes from, likely plone.rest
, but under the hood this is the name of the view that is used to GET the json of content.
This would be something to add to the plone.app.caching.interfaces.IPloneCacheSettings.templateRulesetMapping
registry setting in plone.app.caching
2.x. Plus an upgrade step in plone.app.upgrade
(if plone.app.caching
is activated). I can do both. The extra line would be this:
<element key="GET_application_json_">plone.content.folderView</element>
This seems low impact for 5.2. Does this sound like the best solution?
Just to quickly summarise the cache poisoning problem that this should solve:
application/json
.I have two small PRs ready:
https://github.com/plone/plone.app.caching/pull/111 https://github.com/plone/plone.app.upgrade/pull/309
What about closing this as we already a) imporved plone.app.caching and b) have the ++api++
traverser?
I think this is done yes.
As explained on a post in the community forum, intermediate proxies cause issues if we don't include a
Vary: Accept
header.AFAIK, we can fix this using two approaches:
Vary
header to allGET
andHEAD
responsesIMO, the later is harder and prone to errors.