Performance optimizations

Panthevm commented 1 year ago

Introduction

When I set out to measure the number of requests per second from my server—specifically, those outputting a small HTML string based on hiccup — I was surprised.

Despite the hiccup content being very small, the response using the hiccup->html conversion was discernibly slower than the response with the identical HTML string, showing a disparity of 24k req/sec versus 40k req/sec.

Upon delving into the hiccup source code, I identified several factors that significantly hampered the processing speed.

Benchmarking & Profiling

Initially, I decided to estimate the speed of execution with criterium. On my computer, I obtained the following results:

Case №1

``` clj (def hiccup [:span "hello"]) (criterium.core/bench (hiccup2.core/html hiccup)) ;; Evaluation count : 25789500 in 60 samples of 429825 calls. ;; Execution time mean : 2.330993 µs ;; Execution time std-deviation : 41.090772 ns ;; Execution time lower quantile : 2.303320 µs ( 2.5%) ;; Execution time upper quantile : 2.373499 µs (97.5%) ;; Overhead used : 7.614148 ns ;; Found 6 outliers in 60 samples (10.0000 %) ;; low-severe 1 (1.6667 %) ;; low-mild 4 (6.6667 %) ;; high-mild 1 (1.6667 %) ;; Variance from outliers : 6.2942 % Variance is slightly inflated by outliers ```

Case №2

``` clj (def hiccup [:span#foo.bar.baz {:id "foo" :class "qux" :styles {:a 1 :b 2}} "hello"]) (criterium.core/bench (hiccup2.core/html hiccup)) ;; Evaluation count : 8239680 in 60 samples of 137328 calls. ;; Execution time mean : 7.269213 µs ;; Execution time std-deviation : 145.634319 ns ;; Execution time lower quantile : 7.167641 µs ( 2.5%) ;; Execution time upper quantile : 7.541402 µs (97.5%) ;; Overhead used : 7.614148 ns ;; Found 5 outliers in 60 samples (8.3333 %) ;; low-severe 2 (3.3333 %) ;; low-mild 3 (5.0000 %) ;; Variance from outliers : 9.3512 % Variance is slightly inflated by outliers ```

Case №3

``` clj (def hiccup [:html [:head [:title "Title"]] [:body [:span "foo"] [:span#id "foo"] [:span#id.class1.class2 "foo"] [:span#id.class1.class2 {:id "id"} "foo"] [:span#id.class1.class2 {:class "class3"} "foo"] [:span#id.class1.class2 {:styles {:a 1 :b 2}} "foo"] (for [i [1 2 3 4]] [:span i])]]) (criterium.core/bench (hiccup2.core/html hiccup)) ;; Evaluation count : 1288140 in 60 samples of 21469 calls. ;; Execution time mean : 46.760425 µs ;; Execution time std-deviation : 837.581889 ns ;; Execution time lower quantile : 46.220047 µs ( 2.5%) ;; Execution time upper quantile : 47.416694 µs (97.5%) ;; Overhead used : 7.614148 ns ;; Found 2 outliers in 60 samples (3.3333 %) ;; low-severe 1 (1.6667 %) ;; low-mild 1 (1.6667 %) ;; Variance from outliers : 7.7697 % Variance is slightly inflated by outliers ```

After that I decided to do the profiling using clj-async-profiler. I discovered numerous calls where there was a cast to clojure.lang.RT.LazySeq.seq and a call to clojure.lang.RestFn.invoke. These calls are highlighted in dark pink in the picture.

Profiling report

Optimization Changes

Removed unnecessary LazySeq.seq and RestFn.invoke calls
- hiccup.util/as-str [& xs] -> hiccup.util/to-str [x]
- clojure.core/str [x & ys] -> hiccup.util/build-string [& strs] (macro)
- map + apply str -> StringBuilder + hiccup.util/fast-run
- hiccup.compiler#re-matches -> hiccup.compiler/parse-tag
Optimized node tag parsing
- hiccup.compiler#re-matches -> hiccup.compiler/parse-tag
Skip attribute processing when absent

Results

Case №1 (2.4µs -> 600ns)

``` clj (def hiccup [:span "hello"]) (criterium.core/bench (hiccup2.core/html hiccup)) ;; Evaluation count : 100645620 in 60 samples of 1677427 calls. ;; Execution time mean : 598.744611 ns ;; Execution time std-deviation : 11.053241 ns ;; Execution time lower quantile : 566.805045 ns ( 2.5%) ;; Execution time upper quantile : 614.233694 ns (97.5%) ;; Overhead used : 8.220649 ns ;; Found 8 outliers in 60 samples (13.3333 %) ;; low-severe 4 (6.6667 %) ;; low-mild 2 (3.3333 %) ;; high-mild 2 (3.3333 %) ;; Variance from outliers : 7.8027 % Variance is slightly inflated by outliers ```

Case №2 (7.3 µs -> 2.9 µs)

``` clj (def hiccup [:span#foo.bar.baz {:id "foo" :class "qux" :styles {:a 1 :b 2}} "hello"]) (criterium.core/bench (hiccup2.core/html hiccup)) ;; Evaluation count : 20125980 in 60 samples of 335433 calls. ;; Execution time mean : 2.942334 µs ;; Execution time std-deviation : 26.560566 ns ;; Execution time lower quantile : 2.911728 µs ( 2.5%) ;; Execution time upper quantile : 3.007485 µs (97.5%) ;; Overhead used : 8.220649 ns ;; Found 5 outliers in 60 samples (8.3333 %) ;; low-severe 5 (8.3333 %) ;; Variance from outliers : 1.6389 % Variance is slightly inflated by outliers ```

Case №3 (46.8µs -> 18 µs)

``` clj (def hiccup [:html [:head [:title "Title"]] [:body [:span "foo"] [:span#id "foo"] [:span#id.class1.class2 "foo"] [:span#id.class1.class2 {:id "id"} "foo"] [:span#id.class1.class2 {:class "class3"} "foo"] [:span#id.class1.class2 {:styles {:a 1 :b 2}} "foo"] (for [i [1 2 3 4]] [:span i])]]) (criterium.core/bench (hiccup2.core/html hiccup)) ;; Evaluation count : 3370440 in 60 samples of 56174 calls. ;; Execution time mean : 18.236171 µs ;; Execution time std-deviation : 383.229484 ns ;; Execution time lower quantile : 17.635168 µs ( 2.5%) ;; Execution time upper quantile : 18.784032 µs (97.5%) ;; Overhead used : 8.220649 ns ;; Found 1 outliers in 60 samples (1.6667 %) ;; low-severe 1 (1.6667 %) ;; Variance from outliers : 9.4103 % Variance is slightly inflated by outliers ```

Profiling report

Questions

Why are we sorting the attributes and the style attribute? Is this just for test output?
Why do we escaping HTML text/attributes by default? Isn't this a complex operation?

weavejester commented 1 year ago

Thanks for the PR. Those are some very helpful optimizations. To answer your questions:

Why are we sorting the attributes and the style attribute? Is this just for test output?

It's sorted to produce a consistent output, so a particular input will always produce the same output. If this weren't the case, the attribute order could change, as map keys are unordered.

Why do we escaping HTML text/attributes by default? Isn't this a complex operation?

For safety. The consequences of client data being unescaped is significant enough that it's worth erring on the side of security over speed in this case.

Panthevm commented 1 year ago

Thanks for the answers! @weavejester

For safety. The consequences of client data being unescaped is significant enough that it's worth erring on the side of security over speed in this case.

There is an option called escape-strings?, but it isn't considered when generating HTML attribute values. Was this intentional?

I think there should be a call to the hiccup.compiler/escape-html function there

https://github.com/weavejester/hiccup/blob/5318b5ea73bffa76f03a4b3add184f50f9b9dd8a/src/hiccup/compiler.clj#L40-L41

weavejester commented 1 year ago

There is an option called escape-strings?, but it isn't considered when generating HTML attribute values. Was this intentional?

Yes. This is because there's no reason not to escape attribute values, whereas strings in tags may potentially contain other elements.

For example:

[:div "<em>Hello</em>"]

Might be something you wouldn't want to escape. But:

[:div {:data-example "<em>Hello</em>"}]

In this case, there's no instance where the attribute value should not be escaped.

Panthevm commented 1 year ago

Yes. This is because there's no reason not to escape attribute values, whereas strings in tags may potentially contain other elements.

Escape strings are time-consuming. I'd like the ability to disable them and manually escape where needed

It's possible to turn them off everywhere, except in attribute values, which seems odd

I expect *escape-strings?* option completely disable escaping

 (def value "SSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSS")
 (criterium.core/quick-bench (hiccup.util/escape-html value))

;;  Variance from outliers : 13.8889 % Variance is moderately inflated by outliers
;; Evaluation count : 5124636 in 6 samples of 854106 calls.
;;              Execution time mean : 110.759692 ns
;;     Execution time std-deviation : 1.323253 ns
;;    Execution time lower quantile : 109.373272 ns ( 2.5%)
;;    Execution time upper quantile : 112.215336 ns (97.5%)
;;                    Overhead used : 7.985989 ns

weavejester commented 1 year ago

We could add a new option, something like disable-all-escaping?, that would also disable escaping in attribute values. We can't change the existing behavior because existing users may be relying on it.

However, that would be a separate PR.

Panthevm commented 1 year ago

I found another unnecessary seq call in hiccup.compiler/normalize-element

Profiling report:

1247716c3b72eef282e847b901baf228715df597

Panthevm commented 1 year ago

It's sorted to produce a consistent output, so a particular input will always produce the same output. If this weren't the case, the attribute order could change, as map keys are unordered.

What about this optimization? Sort only when order is not guaranteed

diff --git a/src/hiccup/compiler.clj b/src/hiccup/compiler.clj
index 43adcfc..f265bbe 100644
--- a/src/hiccup/compiler.clj
+++ b/src/hiccup/compiler.clj
@@ -43,7 +43,9 @@
        (.append sb ":")
        (.append sb (util/to-str v))
        (.append sb ";"))
-     (sort-by #(name (key %)) value))
+     (if (instance? clojure.lang.PersistentArrayMap value)
+       value
+       (sort-by #(name (key %)) value)))
     (.toString sb)))

 (defn- render-attr-value [value]
@@ -77,7 +79,9 @@
     ""
     (let [^StringBuilder sb (StringBuilder.)]
       (iterate! #(.append sb (render-attribute %))
-                (sort-by #(name (key %)) attrs))
+                (if (instance? clojure.lang.PersistentArrayMap attrs)
+                  attrs
+                  (sort-by #(name (key %)) attrs)))
       (.toString sb))))

 (def ^{:doc "A list of elements that must be rendered without a closing tag."

Benchmarks

Before

``` clj (def hiccup [:html [:head [:title "Title"]] [:body [:span "foo"] [:span#id "foo"] [:span#id.class1.class2 "foo"] [:span#id.class1.class2 {:id "id"} "foo"] [:span#id.class1.class2 {:class "class3"} "foo"] [:span#id.class1.class2 {:styles {:a 1 :b 2}} "foo"] (for [i [1 2 3 4]] [:span i])]]) (criterium.core/bench (hiccup2.core/html hiccup)) ;; Evaluation count : 3157980 in 60 samples of 52633 calls. ;; Execution time mean : 18.995492 µs ;; Execution time std-deviation : 434.056573 ns ;; Execution time lower quantile : 18.788194 µs ( 2.5%) ;; Execution time upper quantile : 19.519848 µs (97.5%) ;; Overhead used : 8.242091 ns ;; Found 4 outliers in 60 samples (6.6667 %) ;; low-severe 1 (1.6667 %) ;; low-mild 3 (5.0000 %) ;; Variance from outliers : 10.9878 % Variance is moderately inflated by outliers ```

After

``` clj (def hiccup [:html [:head [:title "Title"]] [:body [:span "foo"] [:span#id "foo"] [:span#id.class1.class2 "foo"] [:span#id.class1.class2 {:id "id"} "foo"] [:span#id.class1.class2 {:class "class3"} "foo"] [:span#id.class1.class2 {:styles {:a 1 :b 2}} "foo"] (for [i [1 2 3 4]] [:span i])]]) (criterium.core/bench (hiccup2.core/html hiccup)) ;; Evaluation count : 3908700 in 60 samples of 65145 calls. ;; Execution time mean : 15.321174 µs ;; Execution time std-deviation : 116.528715 ns ;; Execution time lower quantile : 15.154524 µs ( 2.5%) ;; Execution time upper quantile : 15.545923 µs (97.5%) ;; Overhead used : 8.242091 ns ;; Found 2 outliers in 60 samples (3.3333 %) ;; low-severe 1 (1.6667 %) ;; low-mild 1 (1.6667 %) ;; Variance from outliers : 1.6389 % Variance is slightly inflated by outliers ```

weavejester commented 1 year ago

The order is never guaranteed for a map. {:a 1, :b 2} and {:b 2, :a 1} are array maps with different key orderings internally, but are considered equal, i.e. (= {:a 1, :b 2} {:b 2, :a 1}).

It's important to ensure that (= (html x) (html y)) if (= x y).

Panthevm commented 1 year ago

It's important to ensure that (= (html x) (html y)) if (= x y).

In that case, it would be logical to provide an option to disable the sorting feature, as it isn't critical for the browser.

weavejester commented 1 year ago

Maybe; that would be another PR though. I'm cautious of any option that makes functions less secure or predictable, even if it's off by default. Checking that option may also incur some small performance hit.

weavejester commented 1 year ago

Otherwise this looks fine to merge once the commits have been squashed down.

Panthevm commented 1 year ago

We can still use new functions when compile-form. This will give a performance gain. But this is already micro-optimization

(def value "hello")

(criterium.core/bench
  (hiccup2.core/html
    [:html
     [:head
      [:title "Title"]]
     [:body
      [:span "foo"]
      [:span#id "foo"]
      value
      [:span#id.class1.class2 "foo"]
      [:span#id.class1.class2 {:id "id"} "foo"]
      [:span#id.class1.class2 {:class "class3"} "foo"]
      [:span#id.class1.class2 {:styles {:a 1 :b 2}} "foo"]]]))

;;;; Before
;;
;; Evaluation count : 86879760 in 60 samples of 1447996 calls.
;;              Execution time mean : 683.640477 ns
;;     Execution time std-deviation : 11.681363 ns
;;    Execution time lower quantile : 674.248617 ns ( 2.5%)
;;    Execution time upper quantile : 700.726863 ns (97.5%)
;;                    Overhead used : 8.340812 ns

;; Found 3 outliers in 60 samples (5.0000 %)
;;  low-severe   2 (3.3333 %)
;;  low-mild     1 (1.6667 %)
;;  Variance from outliers : 6.2702 % Variance is slightly inflated by outliers

;;;; After patch
;;
;; Evaluation count : 185077020 in 60 samples of 3084617 calls.
;;              Execution time mean : 312.203067 ns
;;     Execution time std-deviation : 7.959083 ns
;;    Execution time lower quantile : 306.728104 ns ( 2.5%)
;;    Execution time upper quantile : 325.430932 ns (97.5%)
;;                    Overhead used : 8.340812 ns

;; Found 5 outliers in 60 samples (8.3333 %)
;;  low-severe   3 (5.0000 %)
;;  low-mild     2 (3.3333 %)
;;  Variance from outliers : 12.6210 % Variance is moderately inflated by outliers

Patch

``` patch diff --git a/src/hiccup/compiler.clj b/src/hiccup/compiler.clj index d8bf187..1367209 100644 --- a/src/hiccup/compiler.clj +++ b/src/hiccup/compiler.clj @@ -23,13 +23,13 @@ (defn- end-tag [] (if (xml-mode?) " />" ">")) -(defn- iterate! [callback coll] +(defn iterate! [callback coll] (when coll (let [^Iterator iterator (.iterator ^Iterable coll)] (while (.hasNext iterator) (callback (.next iterator)))))) -(defmacro ^:private build-string [& strs] +(defmacro build-string [& strs] (let [w (gensym)] `(let [~w (StringBuilder.)] ~@(map (fn [arg] `(.append ~w ~arg)) strs) @@ -222,7 +222,12 @@ (defmethod compile-form "for" [[_ bindings body]] - `(apply str (for ~bindings ~(compile-html body)))) + `(let [^StringBuilder sb# (StringBuilder.)] + (iterate! #(.append sb# %) + (for ~bindings ~(compile-html body))) + (.toString sb#)) + ;;`(apply str (for ~bindings ~(compile-html body))) + ) (defmethod compile-form "if" [[_ condition & body]] @@ -301,10 +306,10 @@ [[tag attrs & content]] (let [[tag attrs _] (normalize-element-form [tag attrs])] (if (container-tag? tag content) - `(str ~(str "<" tag) ~(compile-attr-map attrs) ">" - ~@(compile-seq content) - ~(str "")) - `(str "<" ~tag ~(compile-attr-map attrs) ~(end-tag))))) + `(build-string ~(build-string "<" tag) ~(compile-attr-map attrs) ">" + ~@(compile-seq content) + ~(build-string "")) + `(build-string "<" ~tag ~(compile-attr-map attrs) ~(end-tag))))) (defmethod compile-element ::literal-tag-and-no-attributes [[tag & content]] @@ -317,18 +322,18 @@ `(let [~attrs-sym ~attrs] (if (map? ~attrs-sym) ~(if (container-tag? tag content) - `(str ~(str "<" tag) - (render-attr-map (merge ~tag-attrs ~attrs-sym)) ">" - ~@(compile-seq content) - ~(str "")) - `(str ~(str "<" tag) - (render-attr-map (merge ~tag-attrs ~attrs-sym)) - ~(end-tag))) + `(build-string ~(build-string "<" tag) + (render-attr-map (merge ~tag-attrs ~attrs-sym)) ">" + ~@(compile-seq content) + ~(build-string "")) + `(build-string ~(build-string "<" tag) + (render-attr-map (merge ~tag-attrs ~attrs-sym)) + ~(end-tag))) ~(if (container-tag? tag attrs) - `(str ~(str "<" tag (render-attr-map tag-attrs) ">") - ~@(compile-seq (cons attrs-sym content)) - ~(str "")) - (str "<" tag (render-attr-map tag-attrs) (end-tag))))))) + `(build-string ~(build-string "<" tag (render-attr-map tag-attrs) ">") + ~@(compile-seq (cons attrs-sym content)) + ~(build-string "")) + (build-string "<" tag (render-attr-map tag-attrs) (end-tag))))))) (defmethod compile-element :default [element] @@ -361,7 +366,9 @@ (cons (first expr) (mapcat - #(if (and (seq? %) (symbol? (first %)) (= (first %) (first expr) `str)) + #(if (and (seq? %) (symbol? (first %)) + (or (= (first %) (first expr) `str) + (= (first %) (first expr) `build-string))) (rest (collapse-strs %)) (list (collapse-strs %))) (rest expr))) @@ -370,7 +377,7 @@ (defn compile-html "Pre-compile data structures into HTML where possible." [& content] - (collapse-strs `(str ~@(compile-seq content)))) + (collapse-strs `(build-string ~@(compile-seq content)))) (defn- binding* [var val func] (push-thread-bindings {var val}) ```

weavejester commented 1 year ago

Thanks. Can you remove the extra lines in the commit message? They no longer have relevance. i.e. the commit message should just be:

Improve compiler performance

Panthevm commented 1 year ago

Thanks. Can you remove the extra lines in the commit message? They no longer have relevance. i.e. the commit message should just be:

Done 👌

weavejester commented 1 year ago

I see you snuck in a few more changes! Let me know when you're done and I can review.

Panthevm commented 1 year ago

We can still use new functions when compile-form. This will give a performance gain. But this is already micro-optimization

I see you snuck in a few more changes! Let me know when you're done and I can review

Realized it. Should I squash it?

We may not add them

Panthevm commented 1 year ago

I apologize if I'm doing something wrong. This is my first pull request 👽

weavejester commented 1 year ago

You're doing fine! I just need to know when you're ready. I'll do a quick review, then once everything's fine the commits can be squashed down. I prefer PRs to have a readable commit history, as often they rack up commits with lots of minor changes that make the master commit log harder to read if they're merged without squashing them a little.

Panthevm commented 1 year ago

Done 💅

weavejester commented 1 year ago

Merged! Thanks for the patch. I'll release 2.0.0-RC2 shortly.

Panthevm commented 1 year ago

Merged! Thanks for the patch. I'll release 2.0.0-RC2 shortly.

Awesome! 🎉

Thank you for the help. We've made the internet a tad faster ;)

Final performance comparison: https://github.com/Panthevm/hiccup-benchmark

weavejester / hiccup