Closed timriley closed 1 year ago
Thanks again for checking this over, @solnic and @jodosha! I've just made a few tweaks, many based on your feedback (see the last 5 commits), along with some adjustments in hanami/hanami to make sure views continue to work there with the changes I've made to Context
here: https://github.com/hanami/hanami/pull/1292.
I'm merging this now!
hanami-view has been close to functionally complete for a while, but its turned out to be rather slow. We needed to tune its performance before we consider this ready for release. I've done that here in this PR, as part of an overall refactoring which also sees the internal architecture become simpler, and several user-facing parts become easier to use.
There will undoubtedly be other areas we can explore for further performance gains, but I'm happy enough with what we've achieved so far, and after this PR I'm ready to move back onto working on completing the rest of the integration with the full Hanami framework.
The numbers
Let's start with the performance numbers, since these were the prime motivating factor.
First, our baseline. On the
main
branch, runningbenchmarks/comparative_benchmark.rb
(which a calls a view taking a 2-element array as an exposure and then renders a template and partial within a layout) gives the following result, nearly 2x slower than Rails:On this branch, we now have this result, over 12k more i/s, making for a 2.4x overall rendering performance improvement, which also puts us in front of Rails:
This PR gets us to this point, but there are further gains to be found via a couple of extra measures. Firstly, if we add the
decorate: false
to the exposure used on the view class, we get another nearly 9k i/s. This reveals the cost of part decoration on a simple view (more on this later):Alternatively, if we use a branch of dry-configurable that defines singleton setting accessor methods on config objects when they're finalized, we see a 5k i/s improvement over the current state of this branch:
And if we combine both those measures, we're getting close to doubling the already improved rendering performance currently on this branch.
I'll leave discussion of these additional performance gains until the end, and in the meantime, let me take you through the code changes that have already yielded the original 2.4x improvement.
A note on the benchmarks
While I've focused on the performance of a single benchmark as the main measure for the work in this PR, it's important we remember to take these figures with a grain of salt.
Every view rendering workload is different and performance will vary accordingly.
For example, if I turned this into more of an "index" screen and changed the single exposure to be an array of 40 elements, we we numbers more like this:
And with
decorate: false
:See how that now is an improvement of only 1.15x (from 3396 to 3934 i/s), whereas with our smaller array it resulted in an improvement of 1.4x. This is because with the larger array we're spending comparatively more time inside Tilt rendering partials, which lessens the overall impact of the part decoration.
And if I run this 40-element benchmark against that dry-configurable branch, we see the following:
This is now an extremely marginal gain of just 1.01x versus the 1.2x gain that we saw when we used the smaller array.
Overview of changes
I've cleaned up the history of this branch so each commit makes sense on its own. I'd encourage reviews to look at this commit by commit. But before doing this, take note of which commits I've explained in more detail below so those sections can serve as your guide 😊
The commits fall into three categories:
Here are key commits along with their impact on the benchmark. Take all of these with a grain of salt, too; there are definitely variations in the numbers on each benchmark run, but at very least, this should hopefully show that each changes takes us in a positive direction.
9117
0
9239
+122
9331
+92
9500
+169
9566
+66
16041
+6475
19712
+3671
20428
+716
20673
+245
21161
+488
21582
+421
21398
-184
21503
+105
Most of these commits are fairly self-explanatory via their diff and the commit message I've left, but there are a few more significant ones that I'll go through in more detail below.
Replace render environments with single Rendering
Based on the numbers above this, this is one of two big performance boosts we get. The internal changes here are also significant, with user experience ramifications too, so I want to take the time to go through them here.
The first big change here is that we removed the ability of
Renderer
tochdir
itself to different directories. This was the main way we looked up templates to render before. We would create three different renderer objects for eachView#call
,chdir
’ed into different places: the root, the layouts dir, and a dir matching the template's name.To boot, we would also do that via also create three corresponding
RenderEnvironment
instances, which were themselves responsible for managing the renderer:With this change, we've replaced
RenderEnvironment
withRendering
. TheRendering
represents just that: the rendering that happens each time you#call
a given view.The
Rendering
is very much equivalent to our previousRenderEnvironment
: it provides#template
,#partial
,#part
and#scope
methods that render templates or build parts or scopes based on the view's configuration, and we pass thisrendering
object to any place that needs to ability to render templates or build parts, like the part objects themselves.However, what we've done is change the way the internal renderer works, which allows us to create just a single
Rendering
and a singleRenderer
, instead of the 6 different objects we were creating before.The way we achieved this was to change the way
Renderer
works when looking up templates.Previously, the renderer would start in its current dir (wherever it had been
chdir
’ed to, per the arrangement with theRenderEnvironment
instances above), then search for templates in that current dir well as in a (hard-coded)shared/
subdirectory, before recursing upwards through the directory tree until it either found a template or reached the root:This was both quirky (with the way that
shared/
subdirectory worked being quite inflexible as well as unusual for a template-based view system) as well as inefficient: this is why we had to make those 3 different render environments, and internally it required us to create newPath
instances for every directory traversal, as we can see withchdir("..").lookup(...)
the#lookup_in_parent_dir
method above.To improve this arrangement, and to support us having that single
Rendering
instance above, we've changedRenderer
so we can have just a single instance across the lifecycle of theRendering
. To do this, we've made the renderer stateful, changed how it looks up templates.Renderer
is now stateful insofar as it maintains an@prefixes
array, which are subdirectory paths that it adds to the configured paths when it searches for templates. It defaults to["."]
, and then wheneverRenderer#template
is called, it adds the dirname of the given template name onto the prefixes array. This means that any partials rendered within the template will first be looked up relative to that template's own directory before falling back to being looked up at subdirectories matching any of the previous prefixes.Appending to the prefixes array works also for nested partials, so given a call to
renderer.template("users/index")
, the@prefixes
will be set to[".", "users"]
. If inside that template we do arender("common/user_info_box")
, then the@prefixes
will be set to[".", "users", "users/common"]
before that partial is rendered, and so on.The code to do this in
Renderer
is pretty straightforward (and much more straightforward compared to what we had before):In effect, this all means that templates will now be looked up relative to the most recently rendered template's own directory, using ordinary relative path mechanics. This is a significant usability improvement, because I believe it's what users expect from a template-based view system; we've moved firmly back to the path of least surprise here, which is a good thing given we're already introducing so many other new concepts in hanami-view.
And all of this, it seems, makes for a decent improvement to overall rendering performance, AFAICT from the reduction in object creation.
It also makes our rendering code much easier to follow, so this feels like a win-win-win situation!
A note on naming
So the reason I introduced
Rendering
as a name rather than keepRenderEnvironment
is that I thinkRendering
feels like it better conveys what's happening — this is an object that represents the state and facilities available as of the current rendering — the -ing suffix makes it feel both more active and transient.I realise this does mean we have the following three names:
Rendering
, the new class just introduced hereRenderer
, concerned specifically with finding and rendering the templates themselvesRendered
, a#to_str
-convertible object holding the result of the renderingHowever, since
Renderer
andRendering
are mostly internal, andRendered
is never physically typed by our users, I feel reasonably comfortable with these names even given their overlap. I did at least want to bring this up in case any of you had thoughts 😄Make PartBuilder and ScopeBuilder stateless
One of the things that concerned my while I worked on the above refactoring is that we had to make new part and scope builder instances for every rendering:
This seemed wasteful given we could also just pass the namespace and
self
when we actually invokePartBuilder#call
andScopeBuilder#call
elsewhere insideRendering
.Also, requiring these builders to be instantiated makes it much harder to replace them with wholly alternative implementations if a user ever wanted to do so.
So now these are classes composed entirely of static methods, and we pass all the necessary arguments directly to their
.call
.This is a simpler interface for us to publicise for these builders. For
config.part_builder
, for example, all it needs to be is any callable with this params signature:Nice and simple.
Finalize class.config in View#initialize
This was the biggest win, and a lot of power in this three-line change!
The most pivotal of the lines is this one:
Here's why it helped: when we finalize a dry-configurable config object, it now also memoizes its internal
#hash
value. This makes the hash much faster to return, and for a finalized config object, that hash is guaranteed not to change, which is why we can memoize it.The reason we're discussion the config's hash here is because those hashes are used as the cache keys every time we
fetch_or_store
a value from hanami-view's own internal cache.Also in this commit I changed a couple of
fetch_or_store
calls inRenderer
to use the wholeconfig
object as part of the cache key, rather than one or more individual values from the config. For example, this:Becomes this:
Now that we've finalized the config and its own hash is memoized, determining these cache keys becomes much faster, and since these cache accesses are in fairly performance-sensitive areas of the renderer, this gives us some significant performance gains.
I also think that finalizing
config
as part of#initialize
is a helpful change from a reliability/correctness perspective. Because all of our view config is at the class-level only, there's no convenient “moment” at the class-level for us to know when the user is done with their configuration. The best we could do is ask the user to callconfig.finalize!
themselves, which is not ergonomic.However, I feel that it's pretty cut and dry that by the time the user initializes their first view, they're done with all configuring. So here we achieve that moment, and from that point forward we can both get the performance gains from the finalized config, but also guarantee that the config will not change after the user has begun creating instances.
Simplify Context and streamline usage
This is the second of the two UX improvements in this PR, and the last detailed technical summary in this PR description, I promise!
This change could have been a separate PR, but I felt it was easier just to include it here since I'm already doing a big overhaul, and the way our
Context
is handled is an intrinsic part ofRendering
.So what we're dealing with here is that the view's provided
context
object needs to be given the currentrendering
so that itsDecoratedAttributes
support can work. What this means is this:With that
decorate
call,some_dep
will be returned wrapped a matching part class, allowing the user to enhancesome_dep
with extra view-specific behaviour that doesn't necessarily belong onsome_dep
's own class.For all of this to work, each
Rendering
needs to make and store a new copy of the context object with itself passed to it.The way we handled it previously was like so:
That is, we provide our own implementation of
Context#initialize
that tries to keep track of all arguments passed to it, so that in#for_rendering
we can provide those same arguments to a new copy of the object along with the givenrendering
.This has worked well enough so far, but it means that our users need to be careful when they provide their own
#initialize
. Our docs already go out of their way to try and make this clear, e.g.However, having to remember to call
super
like this is awkward at best, and an invitation for subtle bugs at worst.In addition, I also realised that the current implementation would completely overlook any state set by users in their
#initialize
that it not provided as an argument, e.g.This made me realise that what we really want to be doing when
Rendering
sets up its own copy of the context is more like adup
than anything else.So now, that's what we do! I've made it so that users no longer have to call
super
in their#initialize
, and introduced a#dup_for_rendering(rendering)
method that dups the context object before providing the rendering.This is probably easiest to understand by looking at the whole class:
So we're doing a few things here:
.new
method, still make it possible for a rendering to be directly injected as arendering:
argument given to the initializer. This is helpful for tests.RenderingMissing
as the default value for rendering, which we were not doing before; now this will give users more immediate feedback in their unit tests if their context object needs a realRendering
provided.#initialize
now does nothing, except make it clear that we expect this to receive keyword arguments only.#initialize_copy
implementation that takes care of duping and assigning all internal state from the source object, which means we'll get a complete copy no matter what has been done prior. This is also a method that users can reasonably override if they need specific behaviour for their context to be duped.#dup_for_rendering
will#dup
as expected, and then directly set the rendering.All of this means our users can now fully own their
#initialize
methods without worrying aboutsuper
, and we get much more reliably copying of the context object.Questions
After all of this, I've got a couple of outstanding questions.
Part decoration; default to on or off?
This PR does not change the default part decoration for view exposures, and my feeling is to leave this on by default, since it's a core aspect of what we offer with hanami-view. In addition, those variations I made on the benchmarks show how the cost of part decoration becomes relatively smaller when views have more work to do in other areas, such as rendering partials.
Defining methods for dry-configurable config accessors?
This was the one other change that gave a performance boost to my initial benchmark, though that became less pronounced
Here's how that change looked: https://github.com/dry-rb/dry-configurable/compare/main...define-config-accessors
My feeling is that this isn't worth doing now, given we've already got a healthy performance boost here. Extra work on dry-configurable would be a distraction at this point, getting in the way of us focusing on finishing the view integration.
This might be a nice area to explore in the future, however :)