metosin / sieppari

Small, fast, and complete interceptor library for Clojure/Script
Eclipse Public License 2.0
207 stars 21 forks source link

Going faster on the JVM #51

Open bsless opened 3 years ago

bsless commented 3 years ago

I looked into sieppari's performance on the JVM and ran some profiling to get a sense of what exactly needs to be optimized to get on-par with middleware The measurements and results can be found here Generally, CPU utilization can be broken down to:

With native queue and stack and eliminating all keyword access (very ugly code) I was able to get a speed up of 2-3x. The remaining CPU went to the queue and stack mathods, MethodImplCache lookup, Context instantiation (immutable record) and -try. Still very far from just composing 10 functions.

To get that performance we need a "compiled" model which among other things gets rid of the queue and stack and perhaps creates a mutable context for each execution

If I understand the execution model correctly (sans the reified queue) the execution flow for three interceptors would look something like:

flow dot

This can be implemented in two different ways, besides the queue/stack implementation:

(def fs [:errf1 :errf2 :errf3]) (def escapes (into [] (reductions -comp-error fs)))

(defn -comp-leave [[g-l g-e] [f-l f-e]] [(fn leave [ctx] (try (g-l (f-l ctx)) (catch Exception e (g-e (assoc ctx :error e))))) f-e])

(def fs [:leave1 :leave2 :leave3]) (def leaves (into [] (map first) (reductions -comp-leave (map vector fs escapes))))

(defn -comp-enter [tot [f err]] (fn [ctx] (try (f (tot ctx)) (catch Exception e (err (assoc ctx :error e))))))

(def es [:enter1 :enter2 :enter3]) (def doors (map vector es escapes)) (def enters (reduce -comp-enter identity doors))

(let [farewell (last leaves)] (def fin (fn [ctx] (farewell (enters ctx)))))



Regarding async, #9 suggests it might be handled incorrectly at the moment. An option to consider is choosing a unified execution model and doing everything in it. Or should it be determined by the executor? 

We should also figure out which parts of the context can be mutable and are the execution environment and which are data. If we forgo the option of exposing the runtime environment (queue and stack) we can go much faster.
There could be a hierarchy of Runner -> ExecutionContext -> Context

There are a lot of considerations and degrees of freedom which can lead to widely different solutions and I'd like to open them for discussion and hopefully experiment with different solutions.

Would love to hear your thoughts and to take a swing at it myself.
zerg000000 commented 3 years ago

queue/stack is needed, since the interceptor could modify the execution queue to provide dynamic behaviour. e.g. routing interceptor is base on the ability of modifying execution queue.

bsless commented 3 years ago

@zerg000000 since the queue/interceptors keys are known at compile time, couldn't the interceptor chains be precompiled just as the router is precompiled? The only case where you'd need a queue is if you injected it programatically at run time. How common is that pattern?

zerg000000 commented 3 years ago

https://github.com/metosin/reitit/blob/1ab075bd353966636f154ac36ae9b7990efeb008/modules/reitit-http/src/reitit/http.cljc#L99

bsless commented 3 years ago

Yes I read that implementation before posting my reply. To reiterate the idea - the chain of interceptors could be derived for each path in advance. This case does not show why it's needed to be modified dynamically. If an endpoint has interceptors associated with it, just "fork" the chain and run them when you match (or after?)

tvaisanen commented 3 years ago

Doing the composition sounds like a good idea to improve the performance. Maybe it could be decoupled from the asynchronous implementation by providing a :async true flag etc.

What comes to the async aspect, reactive streams compliance could be something worthwhile looking into.