plone / Products.CMFPlone

The core of the Plone content management system
https://plone.org
GNU General Public License v2.0
243 stars 186 forks source link

Enable resource caching per context to fix context aware expressions #3844

Closed petschki closed 6 months ago

petschki commented 11 months ago

fixes #3789

mister-roboto commented 11 months ago

@petschki thanks for creating this Pull Request and helping to improve Plone!

TL;DR: Finish pushing changes, pass all other checks, then paste a comment:

@jenkins-plone-org please run jobs

To ensure that these changes do not break other parts of Plone, the Plone test suite matrix needs to pass, but it takes 30-60 min. Other CI checks are usually much faster and the Plone Jenkins resources are limited, so when done pushing changes and all other checks pass either start all Jenkins PR jobs yourself, or simply add the comment above in this PR to start all the jobs automatically.

Happy hacking!

petschki commented 11 months ago

@jenkins-plone-org please run jobs

tlotze commented 11 months ago

@petschki Thanks for looking into this so quickly!

Your change looks good to me as far as it concerns fixing the case that the expression depends on context identity, which is probably a very common case and is what the expression that gave rise to the ticket is about.

I don't think, though, that it addresses the issue in the more fundamental way I tried to describe in my ticket comment. The expression could do more complex things than just look at basically immutable properties of the context (like portal type). Suppose, as an example, that the expression uses the context's workflow state which could change between requests; the new cache key wouldn't account for that.

What I'm trying to express is that the cache key shouldn't refer to parameters that might go into the expression but rather to the result of evaluating the expression. I'm not really sure whether evaluating the expression at this point might defeat caching to any notable extent, so working on this with someone who knows these parts of the code better would probably be a good idea. But to be correct, the code should indeed build the cache key from exactly those bundles that will be added to or removed from the theme and request after evaluating the expression, not just those that are statically enabled or disabled.

petschki commented 11 months ago

so working on this with someone who knows these parts of the code better would probably be a good idea.

👉🏼 @jensens

jensens commented 11 months ago

Yay. I will have a look, but it will take a while.

petschki commented 11 months ago

Yes, there is and was no invalidation at all. We could take the registry modification time into the cachekey https://github.com/plone/Products.CMFPlone/blob/master/Products/CMFPlone/resources/browser/resource.py#L323 ... and maybe the modification time of the context too, so that changes during the request would be updated too.

tlotze commented 11 months ago

But this is still a lot of guesswork. The expression can depend on basically anything, all it takes is a custom method on context that looks up the moon phase or whatever. I suggest I'll try to come up with a PR that uses the expression result and we discuss the performance implications of evaluating the expression at this point with @jensens.

petschki commented 11 months ago

I'm sorry, but I don't understand completely ... the expression result is either True or False depending on its logic.

I wonder how this was solved before the RR rewrite. Maybe there's some piece of code which we can reimplement here?

tlotze commented 11 months ago

Of course a single expression will tell whether a particular bundle should be added. What we need is the set of bundles whose expression evaluates to True. (And we need to make sure not to mix bundles to be added with those to be removed as the set union in line 61 currently does.)

petschki commented 11 months ago

Note: registry modification time is already in cachekey generator. https://github.com/plone/Products.CMFPlone/blob/master/Products/CMFPlone/resources/browser/resource.py#L66 ... this needs obviously be tested if it works correctly.

tlotze commented 11 months ago

Registry modification time looks strange as it is in the cache key only under a condition. Maybe that's not a problem, though. But in fact, I also consider including registry (or expression) modification time to be guesswork because neither registry nor context nor any other variable that goes in the evaluation context needs to be modified at all for the involved expressions to evaluate in unpredictable ways for any given request.

yurj commented 11 months ago

If the possible values are True or False, a couple of caches must be created. If there are 2 conditions, 4 caches must be saved. So just add the condition value to the cache key, it will get the cached value. The problem is that you also need 2 or 4 or whatever different bundles, depending on the result of the condition.

But I don't think the problem is cache related.

I find condition on bundles quite useless, it is application logic to load additional javascript or css on the fly. Bundle everything and live happy. Or add code in the page to load the additional css/js. Or tell webpack to use a different chunk for code with condition, and then webpack will load it when the condition is true (the result of the condition on the fly must be available to webpack).

Basically, we're mixing 2 different problems. Caching is for not rendering again the bundle, not for conditions.

What is the purpose to use caching when you've to calculate an expensive condition at every request? I don't get the picture, sorry.

yurj commented 11 months ago

https://github.com/plone/Products.CMFPlone/blob/3129403f3524a31a15dae152edefb9380486f1f8/Products/CMFPlone/resources/browser/resource.py#L288C1-L290C10

        self.renderer["js"] = webresource.ResourceRenderer(
            resolver_js, base_url=self.portal_state.portal_url()
        )

this does not depend on the condition, is this the problem?

yurj commented 11 months ago

I'm sorry, but I don't understand completely ... the expression result is either True or False depending on its logic.

I wonder how this was solved before the RR rewrite. Maybe there's some piece of code which we can reimplement here?

Before a different bundle was returned if the condition was True. But with webpack, if you want code in different file to talk (and module federation?) you've to use different chucks to be loaded conditionally based on the condition.

petschki commented 11 months ago

https://github.com/plone/Products.CMFPlone/blob/3129403f3524a31a15dae152edefb9380486f1f8/Products/CMFPlone/resources/browser/resource.py#L288C1-L290C10

        self.renderer["js"] = webresource.ResourceRenderer(
            resolver_js, base_url=self.portal_state.portal_url()
        )

this does not depend on the condition, is this the problem?

The expression is inside the PloneScriptResources which are registered to a ResourceGroup and then resolved with the ResourceResolver and rendered with the ResourceRenderer ... this is webresource logic. Not very explicit though IMHO.

yurj commented 11 months ago

https://github.com/plone/Products.CMFPlone/blob/3129403f3524a31a15dae152edefb9380486f1f8/Products/CMFPlone/resources/browser/resource.py#L288C1-L290C10

        self.renderer["js"] = webresource.ResourceRenderer(
            resolver_js, base_url=self.portal_state.portal_url()
        )

this does not depend on the condition, is this the problem?

The expression is inside the PloneScriptResources which are registered to a ResourceGroup and then resolved with the ResourceResolver and rendered with the ResourceRenderer ... this is webresource logic. Not very explicit though IMHO.

And how you can evaluate the expression? This result is cached, so you need the result of the condition in the cache, thus having 2^n (n=number of conditions) bundles.

tlotze commented 11 months ago

Yes, at least in theory. Maybe not all combinations actually occur, but if they do, caching would at least work correctly. But what's the problem with many combinations being cached? The only problem I see is that evaluating might be expensive. If that is really a reason not to cache with respect to the actual expression values, expressions don't work in all the generality they allow.

Based on that observation, we might as well merge this PR to solve the 80% case now and reconsider the whole thing afterwards.

petschki commented 11 months ago

@yurj thats the problem @tlotze adresses. The cache is calculated once per context depending on registry mtime and its expression conditions at that time. There are many more expression examples which could be discussed in order to deliver a sane OOTB solution for this. If there are special cases (and I really mean moonphases c'mon ;) we should deliver a solution to hook/adapt into this cache mechanism to make it easy for integrators to customize.

tlotze commented 11 months ago

Just for the record: The moon phases example was intentionally absurd to illustrate that you just cannot know what the expression might do, and that the result doesn't have to depend on any input that you know before calling the expression, i.e. expression text, context object, request properties, these things. The expression might call code that does things which the caching code cannot anticipate, and those things might make perfect sense for the application.

MrTango commented 11 months ago

I'm with @tlotze here. The right solution should eval the expression. But more important right now, would be the 80% fix, as this is biting us in many cases. So if we don't find a quick solution for the 100%, let's just merge what we have. And improve it later. I'm even fine with, resources with expressions are being ignored by caching. Just not cache them at all. On the other hand, the resource it self could be cached. The problem is merging them with other resources. That was avoided in the old registry. When ever you had a condition different to others the resource was delivered separate, but could be cached bu the browser.

petschki commented 11 months ago

I'll have a look at the registry mtime though. There is a problem with the condition.

tlotze commented 11 months ago

Given that we're going to merge the strategy of making the cache key depend on context, also adding context mtime would be easy and helpful IMO to really make this an 80% improvement.

petschki commented 11 months ago

Checking mtime in the cachekey was a simple indentation error. I've added a test and fixed it. From my point of view good to merge now.

@jenkins-plone-org please run jobs

petschki commented 11 months ago

@mauritsvanrees jenkins tests break because of the new 6019 version number https://jenkins.plone.org/job/pull-request-6.0-3.8/2908/ ... why?

petschki commented 11 months ago

@jenkins-plone-org please run jobs

petschki commented 11 months ago

Ah ... rebased the branch ... should be green now.

jensens commented 10 months ago

I think this is technically fine. Just for really large sites, will RAM-usage explode?

MrTango commented 8 months ago

@mauritsvanrees can we merge this and have it in a release soon?

mauritsvanrees commented 8 months ago

I made two change requests. But it does seem to work as intended, and fixes the problem. So in general +1.

You do get multiple cache keys per context now, instead of per site: 2 for js and css, times the number of different combinations of roles. The value that is cached is not too much, I saw 400 and 700 bytes locally. Let's say that you get combinations that result in total in 10 KB per context. If you then have a site with 10 thousand pages, and you visit them all, in all combinations, this gives a 100 MB increase in memory. Times the number of threads and zeoclients. So this change does have an impact on memory, but it seems reasonable to me.

I can imagine that in the future we want to be restrictive with the allowed expressions, allowing only anonymous versus authenticated, and allowing 1 (or maybe more) portal_types, and not anything else. That would make this code simpler, and would need less cache keys. For more advanced use cases, like indeed the phases of the moon, people can register an own viewlet apart from the resource registries, and do whatever they please. But that is not for the near future.

yurj commented 8 months ago

@mauritsvanrees or anyone involved

while you're on this patch, can you add **{"data-bundle": "diazo"},

at line 245 and 267?

This is for back compatibility with backend,xml (https://github.com/plone/plonetheme.barceloneta/blob/030535a1e14a321cf46d2280a51338a9a8cd99b8/plonetheme/barceloneta/theme/backend.xml#L90) in Barceloneta theme. This would help in removing the main theme if we want to use Barceloneta to style the backend (content editing, control panel).

This change just add the attribute data-bundle="diazo" like in the other scripts/css which already have the data-bundle attribute set to the bundle name and this can be used to eventually drop them when we want to use Barceloneta instead of the current theme.

See also https://github.com/plone/documentation/pull/1592#issuecomment-1855907291

jensens commented 8 months ago

For very large sites, which already have some varnish or alike in front, it would be a must have to turn off caching of it at all even in prod. If it is 100MB for 10k pages, then some sites I now with >500k pages would explode in RAM usage.

petschki commented 8 months ago

@mauritsvanrees or anyone involved

while you're on this patch, can you add **{"data-bundle": "diazo"},

at line 245 and 267?

@yurj I think this change would get merged faster if you open a separate PR. We're not ready here yet.

tlotze commented 7 months ago

For very large sites, which already have some varnish or alike in front, it would be a must have to turn off caching of it at all even in prod. If it is 100MB for 10k pages, then some sites I now with >500k pages would explode in RAM usage.

Sorry to reiterate, but RAM usage wouldn't be a function of the number of possible context objects if the cache key involved only the outcomes of evaluating conditions rather than a selection of possible inputs. It would depend exponentially on the number of conditions but even if they are many, that number only depends on code and configuration but is fixed with respect to the amount of content in the site.

jensens commented 6 months ago

Currently the hashtool.update(self.context.absolute_url().encode("utf8")) adds a cache item per context.

mauritsvanrees commented 6 months ago

I think the current state is: this fixes an irritating bug, but there are doubts beause it increases memory usage, potentially a lot for large sites.

Idea: put the new code behind a switch. Enable it when an environment variable is set. Then it can more easily be tested in practice.

yurj commented 6 months ago

What about dropping conditions at all? They cannot be cached and their purpose is not useful anyway, put the script or css code on an html head viewlet if you need to load it when there's a condition. If you've varnish or a cache in front of the site, the problem of condition evaluation will rise again because Varnish will not know about your application logic. Cache invalidation and conditions on context should be run on the client side if possible.

MrTango commented 6 months ago

@mauritsvanrees if we just merge it, it only is a problem when you have expressions. If this really becomes a problem in huge sites, one way would be to remove the expressions in those sites.

mauritsvanrees commented 6 months ago

@mauritsvanrees if we just merge it, it only is a problem when you have expressions. If this really becomes a problem in huge sites, one way would be to remove the expressions in those sites.

In standard Classic UI Plone, we have the eventedit bundle from plone.app.event, which has a condition: python: member is not None. So this will have an effect on all Plone sites. The javascript from this bundle is small, so if there are no side effects from including it for everyone, we could remove the expression in the plone.app.event code.

I checked a customer site, and it had the same expression for two other bundles: datagridfield-bundle and mosaic.

But still: even if there are no expressions, after merging this PR all Plone sites will use more memory and be a bit slower, right? This is because the hash will differ per context, so the resource cache is far less effective: there will be at least two resource caches per context, instead of per site.

This could be improved by checking if there are any expressions other than python: member is not None (or similar). If not, then the context does not need to be added to the hash.

Still, I think putting this behind an environment variable so we can opt in to try it out in practice, would be better.

petschki commented 6 months ago

After discussing this in yesterdays Classic-UI Meeting and sleeping over it I'm against my current implementation of generating the cachekey because it adds way too much attributes with the same stored value to the portal.

After some investigations and timing logs I came to the conclusion, that there's nearly no difference in calculating the cachekey vs. rendering the webresources uncached.

So I will come up with a new approach in a separate PR (#3900) which removes the volatile cachekey attributes completely and only caches the rendered viewlet content on the request in order to not calculate it twice (because we have 2 viewlets with the same update() method ... could also be cleaned up in another PR to render only one ResourcesViewlet)

jensens commented 6 months ago

I think you are right, calculating a complex cache key makes no sense. Lets close this one and go on with #3900