Refactor, simplify and improve rendering performance

hanami-view has been close to functionally complete for a while, but its turned out to be rather slow. We needed to tune its performance before we consider this ready for release. I've done that here in this PR, as part of an overall refactoring which also sees the internal architecture become simpler, and several user-facing parts become easier to use.

There will undoubtedly be other areas we can explore for further performance gains, but I'm happy enough with what we've achieved so far, and after this PR I'm ready to move back onto working on completing the rest of the integration with the full Hanami framework.

The numbers

Let's start with the performance numbers, since these were the prime motivating factor.

First, our baseline. On the main branch, running benchmarks/comparative_benchmark.rb (which a calls a view taking a 2-element array as an exposure and then renders a template and partial within a layout) gives the following result, nearly 2x slower than Rails:

               rails:    17368.4 i/s
         hanami/view:     9117.3 i/s - 1.90x  (± 0.00) slower

On this branch, we now have this result, over 12k more i/s, making for a 2.4x overall rendering performance improvement, which also puts us in front of Rails:

         hanami/view:    21783.7 i/s
               rails:    17234.2 i/s - 1.26x  (± 0.00) slower

This PR gets us to this point, but there are further gains to be found via a couple of extra measures. Firstly, if we add the decorate: false to the exposure used on the view class, we get another nearly 9k i/s. This reveals the cost of part decoration on a simple view (more on this later):

         hanami/view:    30543.3 i/s
               rails:    17668.7 i/s - 1.73x  (± 0.00) slower

Alternatively, if we use a branch of dry-configurable that defines singleton setting accessor methods on config objects when they're finalized, we see a 5k i/s improvement over the current state of this branch:

         hanami/view:    26427.8 i/s
               rails:    17605.6 i/s - 1.50x  (± 0.00) slower

And if we combine both those measures, we're getting close to doubling the already improved rendering performance currently on this branch.

         hanami/view:    39037.8 i/s
               rails:    17429.7 i/s - 2.24x  (± 0.00) slower

I'll leave discussion of these additional performance gains until the end, and in the meantime, let me take you through the code changes that have already yielded the original 2.4x improvement.

A note on the benchmarks

While I've focused on the performance of a single benchmark as the main measure for the work in this PR, it's important we remember to take these figures with a grain of salt.

Every view rendering workload is different and performance will vary accordingly.

For example, if I turned this into more of an "index" screen and changed the single exposure to be an array of 40 elements, we we numbers more like this:

         hanami/view:     3396.2 i/s
               rails:     2258.0 i/s - 1.50x  (± 0.00) slower

And with decorate: false:

         hanami/view:     3934.1 i/s
               rails:     2322.6 i/s - 1.69x  (± 0.00) slower

See how that now is an improvement of only 1.15x (from 3396 to 3934 i/s), whereas with our smaller array it resulted in an improvement of 1.4x. This is because with the larger array we're spending comparatively more time inside Tilt rendering partials, which lessens the overall impact of the part decoration.

And if I run this 40-element benchmark against that dry-configurable branch, we see the following:

         hanami/view:     3432.5 i/s
               rails:     2298.8 i/s - 1.49x  (± 0.00) slower

This is now an extremely marginal gain of just 1.01x versus the 1.2x gain that we saw when we used the smaller array.

Overview of changes

I've cleaned up the history of this branch so each commit makes sense on its own. I'd encourage reviews to look at this commit by commit. But before doing this, take note of which commits I've explained in more detail below so those sections can serve as your guide 😊

The commits fall into three categories:

Minor performance improvements 🤏
Major performance improvements 🖐️
User experience improvements 🙌

Here are key commits along with their impact on the benchmark. Take all of these with a grain of salt, too; there are definitely variations in the numbers on each benchmark run, but at very least, this should hopefully show that each changes takes us in a positive direction.

Commit	i/s	Delta
(main)	`9117`	`0`
Simplify name_for_partial 🤏	`9239`	`+122`
Move ensure_config to initialize 🤏	`9331`	`+92`
Only tsort when we need to 🤏	`9500`	`+169`
Mutate hash when removing private exposures 🤏	`9566`	`+66`
Replace render environments with single Rendering 🖐️ 🙌	`16041`	`+6475`
Finalize class.config in View#initialize 🖐️	`19712`	`+3671`
Use single cache 🤏	`20428`	`+716`
Avoid allocating extra hashes for Tilt options 🤏	`20673`	`+245`
Stop accepting arbitrary options in PartBuilder 🤏	`21161`	`+488`
Remove Dry::Equalizer where it's not unnecessary 🤏	`21582`	`+421`
Make PartBuilder and ScopeBuilder stateless 🙌	`21398`	`-184`
Simplify Context and streamline usage 🙌	`21503`	`+105`

Most of these commits are fairly self-explanatory via their diff and the commit message I've left, but there are a few more significant ones that I'll go through in more detail below.

Replace render environments with single Rendering

Based on the numbers above this, this is one of two big performance boosts we get. The internal changes here are also significant, with user experience ramifications too, so I want to take the time to go through them here.

The first big change here is that we removed the ability of Renderer to chdir itself to different directories. This was the main way we looked up templates to render before. We would create three different renderer objects for each View#call, chdir’ed into different places: the root, the layouts dir, and a dir matching the template's name.

To boot, we would also do that via also create three corresponding RenderEnvironment instances, which were themselves responsible for managing the renderer:

def call(format: config.default_format, context: config.default_context, **input)
  env = self.class.render_env(format: format, context: context) # <================= 1️⃣
  template_env = self.class.template_env(format: format, context: context) # <====== 2️⃣

  locals = locals(template_env, input)
  output = env.template(config.template, template_env.scope(config.scope, locals))

  if layout?
    layout_env = self.class.layout_env(format: format, context: context) # <======== 3️⃣
    begin
      output = env.template(
        self.class.layout_path,
        layout_env.scope(config.scope, layout_locals(locals))
      ) { output }
    rescue TemplateNotFoundError
      raise LayoutNotFoundError.new(config.layout, config.paths)
    end
  end

  Rendered.new(output: output, locals: locals)
end

With this change, we've replaced RenderEnvironment with Rendering. The Rendering represents just that: the rendering that happens each time you #call a given view.

The Rendering is very much equivalent to our previous RenderEnvironment: it provides #template, #partial, #part and #scope methods that render templates or build parts or scopes based on the view's configuration, and we pass this rendering object to any place that needs to ability to render templates or build parts, like the part objects themselves.

However, what we've done is change the way the internal renderer works, which allows us to create just a single Rendering and a single Renderer, instead of the 6 different objects we were creating before.

def call(format: config.default_format, context: config.default_context, **input)
  rendering = self.rendering(format: format, context: context) # <================= 1️⃣ and 1️⃣ only!

  locals = locals(rendering, input)
  output = rendering.template(config.template, rendering.scope(config.scope, locals))

  if layout?
    begin
      output = rendering.template(
        self.class.layout_path,
        rendering.scope(config.scope, layout_locals(locals))
      ) { output }
    rescue TemplateNotFoundError
      raise LayoutNotFoundError.new(config.layout, config.paths)
    end
  end

  Rendered.new(output: output, locals: locals)
end

The way we achieved this was to change the way Renderer works when looking up templates.

Previously, the renderer would start in its current dir (wherever it had been chdir’ed to, per the arrangement with the RenderEnvironment instances above), then search for templates in that current dir well as in a (hard-coded) shared/ subdirectory, before recursing upwards through the directory tree until it either found a template or reached the root:

# in Renderer
def lookup(name, **options)
  paths.inject(nil) { |_, path|
    result = path.lookup(name, format, **options)
    break result if result
  }
end

# in Path
def lookup(name, format, child_dirs: [], parent_dir: false)
  fetch_or_store(dir, root, name, format, child_dirs, parent_dir) do
    lookup_template(name, format) ||
      lookup_in_child_dirs(name, format, child_dirs: child_dirs) ||
      parent_dir && lookup_in_parent_dir(name, format, child_dirs: child_dirs)
  end
end

def lookup_in_parent_dir(name, format, child_dirs:)
  !root? && chdir("..").lookup(name, format, child_dirs: child_dirs, parent_dir: true)
end

This was both quirky (with the way that shared/ subdirectory worked being quite inflexible as well as unusual for a template-based view system) as well as inefficient: this is why we had to make those 3 different render environments, and internally it required us to create new Path instances for every directory traversal, as we can see with chdir("..").lookup(...) the #lookup_in_parent_dir method above.

To improve this arrangement, and to support us having that single Rendering instance above, we've changed Renderer so we can have just a single instance across the lifecycle of the Rendering. To do this, we've made the renderer stateful, changed how it looks up templates.

Renderer is now stateful insofar as it maintains an @prefixes array, which are subdirectory paths that it adds to the configured paths when it searches for templates. It defaults to ["."], and then whenever Renderer#template is called, it adds the dirname of the given template name onto the prefixes array. This means that any partials rendered within the template will first be looked up relative to that template's own directory before falling back to being looked up at subdirectories matching any of the previous prefixes.

Appending to the prefixes array works also for nested partials, so given a call to renderer.template("users/index"), the @prefixes will be set to [".", "users"]. If inside that template we do a render("common/user_info_box"), then the @prefixes will be set to [".", "users", "users/common"] before that partial is rendered, and so on.

The code to do this in Renderer is pretty straightforward (and much more straightforward compared to what we had before):

def template(name, format, scope, &block)
  old_prefixes = @prefixes.dup

  template_path = lookup(name, format)

  raise TemplateNotFoundError.new(name, format, config.paths) unless template_path

  new_prefix = File.dirname(name)
  @prefixes << new_prefix unless @prefixes.include?(new_prefix)

  render(template_path, scope, &block)
ensure
  @prefixes = old_prefixes
end

In effect, this all means that templates will now be looked up relative to the most recently rendered template's own directory, using ordinary relative path mechanics. This is a significant usability improvement, because I believe it's what users expect from a template-based view system; we've moved firmly back to the path of least surprise here, which is a good thing given we're already introducing so many other new concepts in hanami-view.

And all of this, it seems, makes for a decent improvement to overall rendering performance, AFAICT from the reduction in object creation.

It also makes our rendering code much easier to follow, so this feels like a win-win-win situation!

A note on naming

So the reason I introduced Rendering as a name rather than keep RenderEnvironment is that I think Rendering feels like it better conveys what's happening — this is an object that represents the state and facilities available as of the current rendering — the -ing suffix makes it feel both more active and transient.

I realise this does mean we have the following three names:

Rendering, the new class just introduced here
Renderer, concerned specifically with finding and rendering the templates themselves
Rendered, a #to_str-convertible object holding the result of the rendering

However, since Renderer and Rendering are mostly internal, and Rendered is never physically typed by our users, I feel reasonably comfortable with these names even given their overlap. I did at least want to bring this up in case any of you had thoughts 😄

Make PartBuilder and ScopeBuilder stateless

One of the things that concerned my while I worked on the above refactoring is that we had to make new part and scope builder instances for every rendering:

@part_builder = config.part_builder.new(namespace: config.part_namespace, rendering: self)
@scope_builder = config.scope_builder.new(namespace: config.scope_namespace, rendering: self)

This seemed wasteful given we could also just pass the namespace and self when we actually invoke PartBuilder#call and ScopeBuilder#call elsewhere inside Rendering.

Also, requiring these builders to be instantiated makes it much harder to replace them with wholly alternative implementations if a user ever wanted to do so.

So now these are classes composed entirely of static methods, and we pass all the necessary arguments directly to their .call.

This is a simpler interface for us to publicise for these builders. For config.part_builder, for example, all it needs to be is any callable with this params signature:

call(name, value, as: nil, rendering:)

Nice and simple.

Finalize class.config in View#initialize

This was the biggest win, and a lot of power in this three-line change!

The most pivotal of the lines is this one:

--- a/lib/hanami/view.rb
+++ b/lib/hanami/view.rb
@@ -471,6 +471,7 @@ module Hanami
     #
     # @api public
     def initialize
+      self.class.config.finalize!
       ensure_config

       @exposures = self.class.exposures.bind(self)

Here's why it helped: when we finalize a dry-configurable config object, it now also memoizes its internal #hash value. This makes the hash much faster to return, and for a finalized config object, that hash is guaranteed not to change, which is why we can memoize it.

The reason we're discussion the config's hash here is because those hashes are used as the cache keys every time we fetch_or_store a value from hanami-view's own internal cache.

Also in this commit I changed a couple of fetch_or_store calls in Renderer to use the whole config object as part of the cache key, rather than one or more individual values from the config. For example, this:

fetch_or_store(:engine, path, config.renderer_engine_mapping, config.renderer_options) {

Becomes this:

fetch_or_store(:engine, path, config) {

Now that we've finalized the config and its own hash is memoized, determining these cache keys becomes much faster, and since these cache accesses are in fairly performance-sensitive areas of the renderer, this gives us some significant performance gains.

I also think that finalizing config as part of #initialize is a helpful change from a reliability/correctness perspective. Because all of our view config is at the class-level only, there's no convenient “moment” at the class-level for us to know when the user is done with their configuration. The best we could do is ask the user to call config.finalize! themselves, which is not ergonomic.

However, I feel that it's pretty cut and dry that by the time the user initializes their first view, they're done with all configuring. So here we achieve that moment, and from that point forward we can both get the performance gains from the finalized config, but also guarantee that the config will not change after the user has begun creating instances.

Simplify Context and streamline usage

This is the second of the two UX improvements in this PR, and the last detailed technical summary in this PR description, I promise!

This change could have been a separate PR, but I felt it was easier just to include it here since I'm already doing a big overhaul, and the way our Context is handled is an intrinsic part of Rendering.

So what we're dealing with here is that the view's provided context object needs to be given the current rendering so that its DecoratedAttributes support can work. What this means is this:

class MyContext < Hanami::View::Context
  attr_reader :some_dep

  decorate :some_dep
end

With that decorate call, some_dep will be returned wrapped a matching part class, allowing the user to enhance some_dep with extra view-specific behaviour that doesn't necessarily belong on some_dep's own class.

For all of this to work, each Rendering needs to make and store a new copy of the context object with itself passed to it.

The way we handled it previously was like so:

def initialize(rendering: nil, **options)
  @_rendering = rendering
  @_options = options
end

# @api private
def for_rendering(rendering)
  return self if rendering == _rendering

  self.class.new(**_options.merge(rendering: rendering))
end

That is, we provide our own implementation of Context#initialize that tries to keep track of all arguments passed to it, so that in #for_rendering we can provide those same arguments to a new copy of the object along with the given rendering.

This has worked well enough so far, but it means that our users need to be careful when they provide their own #initialize. Our docs already go out of their way to try and make this clear, e.g.

# In subclasses, you should include an `**options` parameter and pass _all
# arguments_ to `super`. This allows Context to make copies of itself
# while preserving your dependencies.
#
# @example
#   class MyContext < Hanami::View::Context
#     # Injected dependency
#     attr_reader :assets
#
#     def initialize(assets:, **options)
#       @assets = assets
#       super
#     end
#   end

However, having to remember to call super like this is awkward at best, and an invitation for subtle bugs at worst.

In addition, I also realised that the current implementation would completely overlook any state set by users in their #initialize that it not provided as an argument, e.g.

class MyContext < Hanami::View::Context
  def initialize
    @foo = "foo" 
  end

  def something_that_updates_foo
    # If this new value is assigned, it will be be completely 
    # overlooked when #for_rendering makes a new object
    @foo = "bar"
  end
end

This made me realise that what we really want to be doing when Rendering sets up its own copy of the context is more like a dup than anything else.

So now, that's what we do! I've made it so that users no longer have to call super in their #initialize, and introduced a #dup_for_rendering(rendering) method that dups the context object before providing the rendering.

This is probably easiest to understand by looking at the whole class:

class Context
  include DecoratedAttributes

  # @api private
  attr_reader :_rendering

  # @api private
  def self.new(rendering: RenderingMissing.new, **args)
    allocate.tap do |obj|
      obj.instance_variable_set(:@_rendering, rendering)
      obj.send(:initialize, **args)
    end
  end

  # Returns a new instance of Context
  #
  # @api public
  def initialize(**)
  end

  # @api private
  private def initialize_copy(source)
    ivars = source.instance_variables - [:@_rendering]
    ivars.each do |ivar|
      instance_variable_set(ivar, source.instance_variable_get(ivar).dup)
    end
  end

  # @api private
  def dup_for_rendering(rendering)
    dup.tap do |obj|
      obj.instance_variable_set(:@_rendering, rendering)
    end
  end
end

So we're doing a few things here:

With .new method, still make it possible for a rendering to be directly injected as a rendering: argument given to the initializer. This is helpful for tests.
Provide our RenderingMissing as the default value for rendering, which we were not doing before; now this will give users more immediate feedback in their unit tests if their context object needs a real Rendering provided.
Our default #initialize now does nothing, except make it clear that we expect this to receive keyword arguments only.
Provide a default #initialize_copy implementation that takes care of duping and assigning all internal state from the source object, which means we'll get a complete copy no matter what has been done prior. This is also a method that users can reasonably override if they need specific behaviour for their context to be duped.
Lastly, #dup_for_rendering will #dup as expected, and then directly set the rendering.

All of this means our users can now fully own their #initialize methods without worrying about super, and we get much more reliably copying of the context object.

Questions

After all of this, I've got a couple of outstanding questions.

Part decoration; default to on or off?

This PR does not change the default part decoration for view exposures, and my feeling is to leave this on by default, since it's a core aspect of what we offer with hanami-view. In addition, those variations I made on the benchmarks show how the cost of part decoration becomes relatively smaller when views have more work to do in other areas, such as rendering partials.

Defining methods for dry-configurable config accessors?

This was the one other change that gave a performance boost to my initial benchmark, though that became less pronounced

Here's how that change looked: https://github.com/dry-rb/dry-configurable/compare/main...define-config-accessors

My feeling is that this isn't worth doing now, given we've already got a healthy performance boost here. Extra work on dry-configurable would be a distraction at this point, getting in the way of us focusing on finishing the view integration.

This might be a nice area to explore in the future, however :)

hanami / view