Closed bsless closed 1 year ago
If you can come up with an alternative implementation that's faster in most real time cases, and not significantly slower in edge cases, then sure. I'll need benchmarks, though!
Reading carefully the implementation of URLDecoder/decode
looks like it only uses the encoding when it encounters hex characters, so the naive solution should be correct in all cases.
I'll upload detailed benchmarks tomorrow
Please find below the rationale, implementation and benchmark results
(defn- to-charset
^Charset [s]
(if (string? s)
(Charset/forName s)
s))
(def ^Charset utf-8 (to-charset "UTF-8"))
(defn form-decode-str'
([encoded]
(form-decode-str' encoded utf-8))
([^String encoded encoding]
(try
(URLDecoder/decode encoded (to-charset encoding))
(catch Exception _ nil))))
(defn form-decode-str''
([encoded]
(form-decode-str'' encoded utf-8))
([^String encoded encoding]
(try
(if (or
(.contains encoded "+")
(.contains encoded "%"))
(URLDecoder/decode encoded (to-charset encoding))
encoded)
(catch Exception _ nil))))
(def results
(into
[]
(for [n [1 2 4 8 16 32]
:let [s (apply str (repeat n "a"))
s' (str s "+")]]
{:size n
:no-decode0 (cc/quick-benchmark (form-decode-str s) nil)
:no-decode1 (cc/quick-benchmark (form-decode-str' s) nil)
:no-decode2 (cc/quick-benchmark (form-decode-str'' s) nil)
:decode0 (cc/quick-benchmark (form-decode-str s') nil)
:decode1 (cc/quick-benchmark (form-decode-str' s') nil)
:decode2 (cc/quick-benchmark (form-decode-str'' s') nil)})))
Timing is measured in ns:
size | no-decode0 | no-decode1 | no-decode2 | decode0 | decode1 | decode2 |
---|---|---|---|---|---|---|
1 | 10.89 | 9.39 | 5.06 | 18.11 | 16.48 | 18.26 |
2 | 14.13 | 12.69 | 6.41 | 21.44 | 19.84 | 22.76 |
4 | 18.90 | 17.86 | 6.35 | 25.66 | 24.34 | 26.46 |
8 | 28.89 | 27.38 | 6.34 | 38.42 | 33.96 | 40.15 |
16 | 48.19 | 46.60 | 5.94 | 55.97 | 54.17 | 57.04 |
32 | 93.64 | 89.69 | 8.06 | 110.20 | 101.04 | 98.86 |
Overhead between decode1 and decode2 should be only scanning the string before decoding it:
Based on these results, the full implementation (2) looks good to me. What do you think?
pngs rendered with plotly and clerk
Thank you for the detailed analysis. This looks like a good improvement to the existing implementation. Would you be able to open a PR, and link back to this issue?
Absolutely
While we're at it, would you like me to replace all instances where encoding is provided as String to Charset, or would you like that to be a different PR?
In a separate PR, please.
Since the form-decode-str PR could build on the charset PR, would you like the Charset PR first? I can close #38 and split it
Hello, I think there's an opportunity for optimization in
ring.util.codec/form-decode-str
: In cases where a string does not contain the chars+%
it is faster to check if the string contains them than to callform-decode-str
for every string size. The tricky part is thatURLDecoder.decode
can also change the string's encoding, but if there was a way to guarantee the string is already encoded correctly. A middle ground I'm less keen on is to just re-encode the string in the case where it doesn't need to be decoded. It's faster than callingURLDecoder.decode
but not as fast as possible.What do you think?