k0kubun / hamlit

High Performance Haml Implementation
https://rubygems.org/gems/hamlit
Other
981 stars 59 forks source link

Dynamic Merger Slowing Down Processing #197

Closed AlessandroMinali closed 1 year ago

AlessandroMinali commented 1 year ago

I'm trying to understand the purpose of this code: https://github.com/k0kubun/hamlit/blob/cf7908a0b3621e8a4fef26d1de43ae8ea875f23a/lib/hamlit/dynamic_merger.rb#L17-L19

On pages where the branch content is not executed all, time is still wasted calling count_newline with no useful work done: https://github.com/k0kubun/hamlit/blob/cf7908a0b3621e8a4fef26d1de43ae8ea875f23a/lib/hamlit/dynamic_merger.rb#L55-L66

If I simply comment out this code my page output content does not change.

I am trying to provide an example but I cannot get the benchmark gem to run on my local: https://github.com/benchmark-driver/benchmark-driver/issues/77

I am trying to run this old benchmark that was removed: https://github.com/k0kubun/hamlit/blob/6821f4fe9bc0f7a9312a0cf5008e7abe4911f6d7/benchmark/dynamic_merger/benchmark.rb

Here are "large" files that should showcase the issue and where I noticed the issue(massive slowdown and no useful work done): test.haml.txt test.string.txt

Here are stackprof firegraph traces which can be viewed @ https://www.speedscope.app: current_code.json commented_out_code.json

sinatra (3.1.0)
hamlit (3.0.3)
RUBY VERSION
   ruby 3.0.0p0
k0kubun commented 1 year ago

I am trying to provide an example but I cannot get the benchmark gem to run on my local

Here's an example script that measures the performance of compiling your template and the result:

require 'benchmark_driver'

Benchmark.driver do |x|
  x.prelude <<~RUBY
    require 'haml'
    engine = Haml::Engine.new
    template = File.read("test.haml.txt")
  RUBY

  x.report 'compile', %{ engine.call(template) }
end
Warming up --------------------------------------
             compile     99.781 i/s -     100.000 times in 1.002193s (10.02ms/i)
Calculating -------------------------------------
             compile     99.404 i/s -     299.000 times in 3.007918s (10.06ms/i)

Here are "large" files that should showcase the issue and where I noticed the issue(massive slowdown

Does it matter though? The compilation should happen only once for every template. It might take an extra 10ms in the first request, but it should take 0ms in all subsequent requests. Most template engines optimize all such subsequent requests at the cost of spending time optimizing the code once, so it's all expected.

no useful work done

Different filters serve different templates. It's natural that some filters do nothing for some templates.

So here's an example case that this filter matters:

$ cat /tmp/a.haml
foo#{"bar".freeze}baz

$ hamlit compile --no-escape-html /tmp/a.haml
_buf = (%Q^@foo#{"bar".freeze}baz
^@).to_s

Here, I used --no-escape-html to emphasize the impact of the filter. Note that Hamlit uses a null character for string literals and it's printed as ^@ here. I'll use ! instead in a later benchmark.

Now, let's look at the code that is generated by Haml without DynamicMerger.

$ hamlit compile --no-escape-html /tmp/a.haml
_buf = ''.dup; _buf << ("foo".freeze); _buf << (("bar".freeze).to_s); _buf << ("baz\n".freeze);
; _buf

Let's benchmark them together:

require 'benchmark_driver'

Benchmark.driver do |x|
  x.report 'disabled', <<~'RUBY'
    _buf = ''.dup; _buf << ("foo".freeze); _buf << (("bar".freeze).to_s); _buf << ("baz\n".freeze);
    ; _buf
  RUBY
  x.report 'enabled', <<~'RUBY'
    _buf = (%Q!foo#{"bar".freeze}baz
    !).to_s
  RUBY
end
ruby 3.2.2 (2023-03-30 revision e51014f9c0) [x86_64-linux]
Warming up --------------------------------------
            disabled     8.214M i/s -      8.393M times in 1.021838s (121.75ns/i, 120clocks/i)
             enabled    19.461M i/s -     19.813M times in 1.018098s (51.39ns/i, 48clocks/i)
Calculating -------------------------------------
            disabled     8.639M i/s -     24.641M times in 2.852253s (115.75ns/i, 133clocks/i)
             enabled    21.897M i/s -     58.382M times in 2.666269s (45.67ns/i, 63clocks/i)

Comparison:
             enabled:  21896651.0 i/s
            disabled:   8639034.7 i/s - 2.53x  slower

I hope now you understand why this exists and your "slowdown" is an intended, one-time behavior for achieving this speedup in all future requests.

AlessandroMinali commented 1 year ago

Thank you for the explanation.