greglook / clj-cbor

Native Clojure CBOR codec implementation.
The Unlicense

Jump table decoder #11

Closed: aria42 closed this 5 years ago

aria42 commented 5 years ago

Initial Jump table decoder implementation and tests.

The out-of-the-box implementation is pretty fast (46% faster decoding on the benchmark), but there's a lot of room for improvement. I wanted to get feedback before iterating on this. Some tidbits:
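For readers unfamiliar with the approach, here's a minimal sketch of the jump-table idea (the names `jump-entries` and `jump-decode` are illustrative, not the PR's actual API): instead of branching on the major type with nested conditionals, the decoder indexes a 256-entry array by the initial byte and calls the handler stored there.

```clojure
(import 'java.io.DataInputStream 'java.io.ByteArrayInputStream)

;; 256-entry dispatch table keyed by the CBOR initial byte. Only the
;; directly-encoded small integers (initial bytes 0x00-0x17, i.e. the
;; values 0-23) are filled in here; a real table covers all 256 slots.
(def ^"[Ljava.lang.Object;" jump-entries
  (let [entries (object-array 256)]
    (doseq [i (range 0x00 0x18)]
      (aset entries i (fn [_input] i)))
    entries))

(defn jump-decode
  "Read one initial byte and dispatch through the jump table."
  [^DataInputStream input]
  (let [initial (.readUnsignedByte input)]
    (if-let [entry (aget jump-entries initial)]
      (entry input)
      (throw (ex-info "Unhandled initial byte" {:byte initial})))))
```

The win comes from replacing a chain of comparisons per value with a single array lookup, at the cost of building (and covering) the table up front.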

codecov-io commented 5 years ago

Codecov Report

Merging #11 into develop will decrease coverage by 1.57%. The diff coverage is 92.3%.

Impacted file tree graph

@@             Coverage Diff             @@
##           develop      #11      +/-   ##
===========================================
- Coverage    99.86%   98.28%   -1.58%     
===========================================
  Files           13       14       +1     
  Lines          748      818      +70     
  Branches         1        5       +4     
===========================================
+ Hits           747      804      +57     
- Misses           0        9       +9     
- Partials         1        5       +4
| Impacted Files | Coverage Δ |
|---|---|
| src/clj_cbor/data/core.clj | 100% <ø> (ø) :arrow_up: |
| src/clj_cbor/tags/clojure.clj | 100% <100%> (ø) :arrow_up: |
| src/clj_cbor/header.clj | 100% <100%> (ø) :arrow_up: |
| src/clj_cbor/bytes.clj | 100% <100%> (ø) |
| src/clj_cbor/core.clj | 98.86% <50%> (-1.14%) :arrow_down: |
| src/clj_cbor/codec.clj | 96.54% <90.9%> (-3.14%) :arrow_down: |

Continue to review full report at Codecov.

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data. Powered by Codecov. Last update 84c8714...c5e8770.

aria42 commented 5 years ago

Ok, I'll address this; sorry, this was a little sloppy and thrown together too quickly. I think the right way to address everything is

aria42 commented 5 years ago

Ok, I did the optimization and rearranged things so the jump table is used recursively, which meant moving some code out into the bytes and decoder namespaces. I'm not thrilled about that choice, but it seemed like the least bad option.

Surprisingly, making it recursive didn't change performance much (decoding now takes half the time; see the spreadsheet). I also fixed some performance issues across the board, so the baseline moved a bit as well, which is why the jump table alone doesn't show as big a leap as you'd expect. Still, 2x faster is pretty nice, and the jump code itself is small (though it triggers some other changes that are a bit messy).

My todo:

aria42 commented 5 years ago

Ok, I give up on the coverage check. If you look at the coverage on jump.clj, it claims the doseq that populates the jump entries is untested, because that form looks disconnected from the result expression; and since the file is under 100 lines, 2-3 "untested" lines are enough to fail the check. I could write this in a way that looks covered, but that seems wrong. Any suggestions?
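To illustrate the coverage complaint (hypothetical names, not the PR's actual jump.clj), here is the side-effecting shape the tool flags, next to an expression-style rewrite that would register as covered:

```clojure
;; A doseq that fills the table is a side effect inside the let, so a
;; coverage tool can see its body as disconnected from the returned
;; array and report those lines as untested.
(def entries-via-doseq
  (let [entries (object-array 24)]
    (doseq [i (range 24)]
      (aset entries i (fn [] i)))
    entries))

;; The same table built as a single expression, which coverage
;; attributes to the def. Semantically identical, arguably less direct.
(def entries-via-expr
  (object-array (map (fn [i] (fn [] i)) (range 24))))
```

Both produce the same 24-entry table; the difference is purely in how the coverage tool attributes the forms.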

aria42 commented 5 years ago

@greglook Wanted to draw your attention to b631671. I believe the current unsigned long coercion might be wrong (or at least it's not clear to me), since it appears to use the bytes in little-endian order instead of the big-endian order the CBOR spec requires for all values. You can see the byte-order reversal if you compare the binary string representations:

I updated the function and the newer function is also 2x faster.

clj-cbor.bytes> (Long/toBinaryString -10)
"1111111111111111111111111111111111111111111111111111111111110110"
;; this is the old function, which moves "11110110" to the beginning
clj-cbor.bytes> (.toString (.toBigInteger (old-to-unsigned-long -10)) 2)
"1111011011111111111111111111111111111111111111111111111111111111"
;; the newer function preserves the binary representation, which is what
;; you expect since all that's changing is how you interpret the bytes
clj-cbor.bytes> (.toString (.toBigInteger (to-unsigned-long -10)) 2)
"1111111111111111111111111111111111111111111111111111111111110110"
aria42 commented 5 years ago

Also, an update on end-to-end performance. This is based on my 84 samples, looking at the sum of the average decoding times in microseconds (each average comes from a criterium benchmark of decoding the random sample).

Prior to my changes:

aria42 commented 5 years ago

Ok, I took @greglook's suggestion to inline the jump stuff, so the total lines added specifically for the jump table are lower. Bad news: my prior benchmark was wrong, since I accidentally left the handlers off the jump decoder. I decided to use Reddit API data (see EDN here) instead, since the random data has some odd attributes (e.g., very long keywords).

So, how's performance on this data?

;; First one is 84c8714
clj-cbor.core> (c/bench (decode default-codec bs))
Evaluation count : 16020 in 60 samples of 267 calls.
             Execution time mean : 3.809419 ms
    Execution time std-deviation : 151.597668 µs
   Execution time lower quantile : 3.615037 ms ( 2.5%)
   Execution time upper quantile : 4.241055 ms (97.5%)
                   Overhead used : 6.791016 ns
nil
;; Next two use this PR b631671
clj-cbor.core> (c/bench (decode default-codec bs))
Evaluation count : 26280 in 60 samples of 438 calls.
             Execution time mean : 2.400332 ms
    Execution time std-deviation : 142.585442 µs
   Execution time lower quantile : 2.265757 ms ( 2.5%)
   Execution time upper quantile : 2.690157 ms (97.5%)
                   Overhead used : 6.791016 ns
nil
clj-cbor.core> (def jump-decode (with-jump-table default-codec))
clj-cbor.core> (c/bench (decode jump-decode bs))
Evaluation count : 29880 in 60 samples of 498 calls.
             Execution time mean : 2.080512 ms
    Execution time std-deviation : 142.985792 µs
   Execution time lower quantile : 1.941332 ms ( 2.5%)
   Execution time upper quantile : 2.373102 ms (97.5%)
                   Overhead used : 6.791016 ns

Which is a statistically significant difference, but a lot smaller than the 2x reported earlier.