ring-clojure / ring-json

Ring middleware for handling JSON

[performance] Remove slurp from read-json #65

Closed bsless closed 3 years ago

bsless commented 3 years ago

slurp can have quite a detrimental effect on performance. This example uses jsonista, but since both it and Cheshire use Jackson, I expect an effect of about the same order of magnitude:

(require '[criterium.core :as cc]
         '[jsonista.core :as json])
(def m {:a 1 :b 2 :c "3" :d ["5"]})
(def mapper (json/object-mapper {:decode-key-fn true}))
(def bs (json/write-value-as-bytes m))
(type bs) ;; => [B
(defn bytes->str [^"[B" bs] (String. bs))
(bytes->str bs) ;; => "{\"a\":1,\"b\":2,\"c\":\"3\",\"d\":[\"5\"]}"
(cc/bench (json/read-value (slurp bs) mapper))
;;; Execution time mean : 8.653311 µs
(cc/bench (json/read-value (bytes->str bs) mapper))
;;; Execution time mean : 2.068923 µs
(cc/bench (json/read-value bs mapper))
;;; Execution time mean : 1.912441 µs

I don't mind submitting a PR with the fix.

weavejester commented 3 years ago

Sure, as long as it doesn't affect anything else, a performance improvement would be welcome.

bsless commented 3 years ago

I have a working solution, but before I go forward with it there's an assumption I need to verify regarding the tests: will the body always be a ByteArrayInputStream? If so, the solution is straightforward; otherwise I'll need to give it some more thought.

weavejester commented 3 years ago

The body will be an InputStream. It's rarely a ByteArrayInputStream specifically outside of testing.

bsless commented 3 years ago

So using a java.io.InputStreamReader will be appropriate, in your opinion?
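For concreteness, a minimal sketch of that approach (hypothetical; read-json-body is an illustrative name, and jsonista is used here only because the earlier benchmark did — ring-json itself parses with Cheshire):

```clojure
(require '[jsonista.core :as json])
(import '(java.io InputStream InputStreamReader ByteArrayInputStream))

;; Hypothetical sketch: hand the body InputStream to the JSON parser
;; through an InputStreamReader with the request's charset, instead of
;; materializing the whole body as a String with slurp.
(defn read-json-body [^InputStream body ^String charset mapper]
  (json/read-value (InputStreamReader. body charset) mapper))

;; Usage with an in-memory body, as in the tests:
(read-json-body
 (ByteArrayInputStream. (.getBytes "{\"a\":1}" "UTF-8"))
 "UTF-8"
 (json/object-mapper {:decode-key-fn true}))
;; => {:a 1}
```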

bsless commented 3 years ago

Besides getting rid of slurp, there's significant overhead in using a regex to determine the content type and encoding, and also in binding *use-bigdecimals?*.

I've attached the profiling results (SVG) as a text file; you can view them properly by changing the extension. GitHub doesn't seem to support SVGs. 02-cpu-flamegraph-2020-09-26-18-50-50.svg.txt

weavejester commented 3 years ago

there's a significant overhead when using regex to determine the content type and encoding

A significant overhead? The flame graph doesn't seem to have a scale, so could you tell me how significant that is?

Also, what do you propose as the solution? Or was it just an observation?

bsless commented 3 years ago

Right, I should have specified how I created the flame graph:

(require '[ring.middleware.json :refer [wrap-json-body]]
         '[ring.util.io :refer [string-input-stream]]
         '[clj-async-profiler.core :as prof])

(let [handler (wrap-json-body identity)
      request {:headers {"content-type" "application/json; charset=GBK"}
               :body (string-input-stream (String. (.getBytes "{\"foo\": \"你好\"}")) "GBK")}]
  (prof/profile
   (dotimes [_ 1e7]
     (handler request))))

I just ran it on a very minimal example, burning CPU running (wrap-json-body identity). The scale is absolute % of CPU time. As for a suggestion, I went to see how Metosin parses the content type in muuntaja:

;; extract-charset is a helper defined elsewhere in muuntaja
(defn parse-content-type [^String s]
  (let [i (.indexOf s ";")]
    (if (neg? i)
      [s nil]
      [(.substring s 0 i)
       (extract-charset (.toLowerCase (.trim (.substring s (inc i)))))])))

I don't know if it's as complete as using a regex. Do you think it's worth pursuing?
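For anyone trying the snippet above: extract-charset lives elsewhere in muuntaja. A minimal stand-in (hypothetical, illustration only; define it before evaluating parse-content-type) could look like:

```clojure
;; Hypothetical stand-in for muuntaja's extract-charset, for illustration
;; only: return the value of a leading "charset=..." parameter, if any.
(defn extract-charset [^String s]
  (when (.startsWith s "charset=")
    (.substring s (count "charset="))))

(parse-content-type "application/json; charset=GBK")
;; => ["application/json" "gbk"]
(parse-content-type "application/json")
;; => ["application/json" nil]
```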

EDIT: I ran a little benchmark with parse-content-type, same inputs as in all the examples so far: a 2x speedup.

bsless commented 3 years ago

Regarding dynamically binding *use-bigdecimals?*: (edit) I was probably wrong; that's what we get for it being dynamic. It might be possible to dispatch to a happy path without binding, though.
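One possible shape for that dispatch (a hypothetical sketch, not ring-json's actual code: it only binds Cheshire's *use-bigdecimals?* when the option is actually set, so the common case skips the var push/pop entirely):

```clojure
(require '[cheshire.core :as cheshire]
         '[cheshire.parse :as parse])

;; Hypothetical sketch: decide once per call whether the bigdecimal
;; option is on, and only pay for the dynamic binding when it is.
(defn parse-json [^String s key-fn bigdecimals?]
  (if bigdecimals?
    (binding [parse/*use-bigdecimals?* true]
      (cheshire/parse-string s key-fn))
    ;; happy path: no binding overhead
    (cheshire/parse-string s key-fn)))

(parse-json "{\"a\": 1.5}" true false) ;; => {:a 1.5}
(parse-json "{\"a\": 1.5}" true true)  ;; => {:a 1.5M}
```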