favila commented 6 years ago

Start decoding maps into a TransientArrayMap initially and allow its assoc method to auto-promote it to a TransientHashMap if its size gets large enough. This is consistent with how (into {} xs) behaves.

favila commented 6 years ago

That will teach me to make a patch using github's editor...

Note I had to force the JVM timezone to UTC to get the tests to pass. I can make a PR for that if interested (I added -Duser.timezone=UTC to :jvm-opts in the project.clj).

favila commented 6 years ago

Quick decoding speed perf comparison using lein perf repl and (jsonista.json-perf-test/decode-perf)

Master:

######################
## decode: jsonista ##
######################

Evaluation count : 1498638 in 6 samples of 249773 calls.
             Execution time mean : 405.190155 ns
    Execution time std-deviation : 12.287989 ns
   Execution time lower quantile : 392.939313 ns ( 2.5%)
   Execution time upper quantile : 417.862655 ns (97.5%)
                   Overhead used : 1.496308 ns

With PR:

######################
## decode: jsonista ##
######################

Evaluation count : 1740828 in 6 samples of 290138 calls.
             Execution time mean : 347.886539 ns
    Execution time std-deviation : 7.886340 ns
   Execution time lower quantile : 338.432356 ns ( 2.5%)
   Execution time upper quantile : 356.021850 ns (97.5%)
                   Overhead used : 1.498982 ns

So it's no worse. On larger maps there should be no discernible difference. I expect the worst case would be a very large JSON with lots of maps exactly 9 keys large.

ikitommi commented 6 years ago

Thanks! I'll run the tests to get new numbers. Curious anoit the timezone setting, haven't seen it myself.

ikitommi commented 6 years ago

Ran full tests. It's slower on all other but the smallest, +25% on largest dataset. Hmm.

user=> (pt/decode-perf-different-sizes)

################################
## dev-resources/json10b.json ##
################################

######################
## decode: cheshire ##
######################

Evaluation count : 696924 in 6 samples of 116154 calls.
             Execution time mean : 871.801921 ns
    Execution time std-deviation : 14.851398 ns
   Execution time lower quantile : 852.682120 ns ( 2.5%)
   Execution time upper quantile : 889.601589 ns (97.5%)
                   Overhead used : 1.663097 ns

######################
## decode: jsonista ##
######################

Evaluation count : 1623582 in 6 samples of 270597 calls.
             Execution time mean : 368.641380 ns
    Execution time std-deviation : 5.489577 ns
   Execution time lower quantile : 364.305299 ns ( 2.5%)
   Execution time upper quantile : 377.397293 ns (97.5%)
                   Overhead used : 1.663097 ns

Found 1 outliers in 6 samples (16.6667 %)
  low-severe   1 (16.6667 %)
 Variance from outliers : 13.8889 % Variance is moderately inflated by outliers

#################################
## dev-resources/json100b.json ##
#################################

######################
## decode: cheshire ##
######################

Evaluation count : 326754 in 6 samples of 54459 calls.
             Execution time mean : 1.920729 µs
    Execution time std-deviation : 142.483898 ns
   Execution time lower quantile : 1.797980 µs ( 2.5%)
   Execution time upper quantile : 2.084143 µs (97.5%)
                   Overhead used : 1.663097 ns

######################
## decode: jsonista ##
######################

Evaluation count : 420270 in 6 samples of 70045 calls.
             Execution time mean : 1.419845 µs
    Execution time std-deviation : 1.383122 ns
   Execution time lower quantile : 1.417836 µs ( 2.5%)
   Execution time upper quantile : 1.421053 µs (97.5%)
                   Overhead used : 1.663097 ns

###############################
## dev-resources/json1k.json ##
###############################

######################
## decode: cheshire ##
######################

Evaluation count : 62778 in 6 samples of 10463 calls.
             Execution time mean : 9.582913 µs
    Execution time std-deviation : 208.022648 ns
   Execution time lower quantile : 9.433753 µs ( 2.5%)
   Execution time upper quantile : 9.934920 µs (97.5%)
                   Overhead used : 1.663097 ns

Found 1 outliers in 6 samples (16.6667 %)
  low-severe   1 (16.6667 %)
 Variance from outliers : 13.8889 % Variance is moderately inflated by outliers

######################
## decode: jsonista ##
######################

Evaluation count : 87834 in 6 samples of 14639 calls.
             Execution time mean : 6.833222 µs
    Execution time std-deviation : 15.637431 ns
   Execution time lower quantile : 6.819147 µs ( 2.5%)
   Execution time upper quantile : 6.855965 µs (97.5%)
                   Overhead used : 1.663097 ns

################################
## dev-resources/json10k.json ##
################################

######################
## decode: cheshire ##
######################

Evaluation count : 5592 in 6 samples of 932 calls.
             Execution time mean : 108.961129 µs
    Execution time std-deviation : 1.832814 µs
   Execution time lower quantile : 107.076700 µs ( 2.5%)
   Execution time upper quantile : 111.168429 µs (97.5%)
                   Overhead used : 1.663097 ns

######################
## decode: jsonista ##
######################

Evaluation count : 7674 in 6 samples of 1279 calls.
             Execution time mean : 79.148681 µs
    Execution time std-deviation : 1.571824 µs
   Execution time lower quantile : 77.560891 µs ( 2.5%)
   Execution time upper quantile : 81.411635 µs (97.5%)
                   Overhead used : 1.663097 ns

#################################
## dev-resources/json100k.json ##
#################################

######################
## decode: cheshire ##
######################

Evaluation count : 594 in 6 samples of 99 calls.
             Execution time mean : 1.022577 ms
    Execution time std-deviation : 7.869831 µs
   Execution time lower quantile : 1.009443 ms ( 2.5%)
   Execution time upper quantile : 1.029869 ms (97.5%)
                   Overhead used : 1.663097 ns

######################
## decode: jsonista ##
######################

Evaluation count : 804 in 6 samples of 134 calls.
             Execution time mean : 760.640896 µs
    Execution time std-deviation : 9.409679 µs
   Execution time lower quantile : 748.581418 µs ( 2.5%)
   Execution time upper quantile : 771.395521 µs (97.5%)
                   Overhead used : 1.663097 ns

PR

user=> (pt/decode-perf-different-sizes)

################################
## dev-resources/json10b.json ##
################################

######################
## decode: cheshire ##
######################

Evaluation count : 702180 in 6 samples of 117030 calls.
             Execution time mean : 857.311817 ns
    Execution time std-deviation : 20.993995 ns
   Execution time lower quantile : 834.640639 ns ( 2.5%)
   Execution time upper quantile : 876.872016 ns (97.5%)
                   Overhead used : 1.455435 ns

######################
## decode: jsonista ##
######################

Evaluation count : 2033982 in 6 samples of 338997 calls.
             Execution time mean : 313.583024 ns
    Execution time std-deviation : 4.336184 ns
   Execution time lower quantile : 306.860306 ns ( 2.5%)
   Execution time upper quantile : 317.723690 ns (97.5%)
                   Overhead used : 1.455435 ns

#################################
## dev-resources/json100b.json ##
#################################

######################
## decode: cheshire ##
######################

Evaluation count : 319752 in 6 samples of 53292 calls.
             Execution time mean : 1.851876 µs
    Execution time std-deviation : 46.474996 ns
   Execution time lower quantile : 1.797110 µs ( 2.5%)
   Execution time upper quantile : 1.895828 µs (97.5%)
                   Overhead used : 1.455435 ns

######################
## decode: jsonista ##
######################

Evaluation count : 411918 in 6 samples of 68653 calls.
             Execution time mean : 1.468725 µs
    Execution time std-deviation : 26.319111 ns
   Execution time lower quantile : 1.443435 µs ( 2.5%)
   Execution time upper quantile : 1.498551 µs (97.5%)
                   Overhead used : 1.455435 ns

###############################
## dev-resources/json1k.json ##
###############################

######################
## decode: cheshire ##
######################

Evaluation count : 65568 in 6 samples of 10928 calls.
             Execution time mean : 9.347651 µs
    Execution time std-deviation : 198.907095 ns
   Execution time lower quantile : 9.112054 µs ( 2.5%)
   Execution time upper quantile : 9.556516 µs (97.5%)
                   Overhead used : 1.455435 ns

######################
## decode: jsonista ##
######################

Evaluation count : 66426 in 6 samples of 11071 calls.
             Execution time mean : 9.240849 µs
    Execution time std-deviation : 220.139022 ns
   Execution time lower quantile : 9.036446 µs ( 2.5%)
   Execution time upper quantile : 9.497906 µs (97.5%)
                   Overhead used : 1.455435 ns

################################
## dev-resources/json10k.json ##
################################

######################
## decode: cheshire ##
######################

Evaluation count : 5976 in 6 samples of 996 calls.
             Execution time mean : 100.865470 µs
    Execution time std-deviation : 1.236078 µs
   Execution time lower quantile : 99.748574 µs ( 2.5%)
   Execution time upper quantile : 102.322479 µs (97.5%)
                   Overhead used : 1.455435 ns

######################
## decode: jsonista ##
######################

Evaluation count : 5712 in 6 samples of 952 calls.
             Execution time mean : 106.656607 µs
    Execution time std-deviation : 2.079502 µs
   Execution time lower quantile : 104.561112 µs ( 2.5%)
   Execution time upper quantile : 109.591392 µs (97.5%)
                   Overhead used : 1.455435 ns

#################################
## dev-resources/json100k.json ##
#################################

######################
## decode: cheshire ##
######################

Evaluation count : 618 in 6 samples of 103 calls.
             Execution time mean : 983.667031 µs
    Execution time std-deviation : 10.972101 µs
   Execution time lower quantile : 970.468612 µs ( 2.5%)
   Execution time upper quantile : 996.102557 µs (97.5%)
                   Overhead used : 1.455435 ns

######################
## decode: jsonista ##
######################

Evaluation count : 594 in 6 samples of 99 calls.
             Execution time mean : 1.014018 ms
    Execution time std-deviation : 11.668386 µs
   Execution time lower quantile : 997.346263 µs ( 2.5%)
   Execution time upper quantile : 1.027687 ms (97.5%)
                   Overhead used : 1.455435 ns

favila commented 6 years ago

Huh, I didn't expect that big a slowdown. There is a tradeoff between faster map construction at size <= 8, then once you hit size 9 assoc has to spend a little extra time converting to the TransientHashMap. I don't know how many assoc-es worth of time is lost during that conversion; but the larger the map the less that extra constant time matters.

json100k is pretty close to a worst-case: "results" is a giant array of maps of size 12, so each one starts out as an array-map then is converted to hash-map.

I don't know the precise original motivation for Clojure doing it this way. I think they were optimizing for lookup speed on smaller maps.

ikitommi commented 6 years ago

I ran the tests with both PAM and PHM for the test data that was stripped out of 7 keys, e.g. only 5 keys in the user. Here are the results (PHM time compared to PAM time, e.g. smaller is better):

original data	5 keys	12 keys
10b	109 %	120 %
100b	92 %	107 %
1k	97 %	75 %
10k	95 %	72 %
100k	94 %	72 %

This seems unintuitive, but rerun the tests many times. I think the 100b & 1k are the common cases and base on those tests, PHM behaves better.

favila commented 6 years ago

Yeah that is very puzzling. Oh well.

ikitommi commented 6 years ago

Did a reverse commit on this. But thanks anyway, was interesting.

metosin / jsonista

Decode small maps as PAM, auto-grow to PHM #17

previous

PR