sile / jsone

Erlang JSON library
MIT License

Improve string encoding performance #8

Closed: pichi closed this 8 years ago

pichi commented 8 years ago

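This patch significantly improves the performance of string encoding. In the comparisons below (x = old, + = new), the measured values drop by about 35% (escape) and 9% (UTF-8) on BEAM, and by about 38% (escape) and 20% (UTF-8) with HiPE: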

x old BEAM escape
+ new BEAM escape
+--------------------------------------------------------------------------+
| ++  +                                                       xx xxxx x   x|
| +   +                                                       xx   xx x    |
| +   +                                                       xx   x       |
| +   +                                                       xx           |
| +   +                                                       xx           |
| +   +                                                       xx           |
| +                                                           xx           |
| +                                                           xx           |
| +                                                           xx           |
| +                                                            x           |
| +                                                            x           |
| +                                                                        |
| +                                                                        |
| +                                                                        |
| +                                                                        |
| +                                                                        |
| +                                                                        |
| +                                                                        |
| +                                                                        |
| +                                                                        |
| +                                                                        |
| +                                                                        |
| +                                                                        |
|                                                            |_M_A__|      |
||MA|                                                                      |
+--------------------------------------------------------------------------+
Dataset: x N=30 CI=95.0000
Statistic     Value     [         Bias] (Bootstrapped LB‥UB)
Min:            930.000
1st Qu.         933.000
Median:         938.000
3rd Qu.         958.000
Max:            996.000
Average:        945.167 [   1.34567e-2] (      940.100 ‥       952.267)
Std. Dev:       16.5614 [    -0.525046] (      12.3988 ‥       23.4927)

Outliers: 0/1 = 1 (μ=945.180, σ=16.0364)
        Outlier variance:    3.22222e-2 (slight)

------

Dataset: + N=30 CI=95.0000
Statistic     Value     [         Bias] (Bootstrapped LB‥UB)
Min:            608.000
1st Qu.         609.000
Median:         609.000
3rd Qu.         612.000
Max:            633.000
Average:        613.567 [   5.74000e-3] (      610.833 ‥       617.467)
Std. Dev:       9.19401 [    -0.253030] (      6.14387 ‥       11.1649)

Outliers: 0/6 = 6 (μ=613.572, σ=8.94098)
        Outlier variance:    3.22222e-2 (slight)

Difference at 95.0% confidence
        -331.600 ± 6.92367
        -35.0838% ± 0.732535%
        (Student's t, pooled s = 13.3942)
------

x old BEAM utf-8
+ new BEAM utf-8
+--------------------------------------------------------------------------+
|  ++++      +     +     x *xxxx          x                               x|
|  ++++                  x xxx x                                           |
|  ++++                  x xx                                              |
|  ++++                  x xx                                              |
|   + +                  x xx                                              |
|   + +                  x  x                                              |
|   +                    x  x                                              |
|   +                    x                                                 |
|   +                    x                                                 |
|   +                    x                                                 |
|   +                    x                                                 |
|   +                                                                      |
|   +                                                                      |
|                   |______M_A________|                                    |
||__M_A____|                                                               |
+--------------------------------------------------------------------------+
Dataset: x N=30 CI=95.0000
Statistic     Value     [         Bias] (Bootstrapped LB‥UB)
Min:            541.000
1st Qu.         542.000
Median:         547.000
3rd Qu.         549.000
Max:            650.000
Average:        551.000 [   3.90667e-3] (      546.633 ‥       564.667)
Std. Dev:       20.1255 [     -2.42533] (      4.82200 ‥       39.9254)

Outliers: 0/2 = 2 (μ=551.004, σ=17.7001)
        Outlier variance:      0.126393 (moderate)

------

Dataset: + N=30 CI=95.0000
Statistic     Value     [         Bias] (Bootstrapped LB‥UB)
Min:            494.000
1st Qu.         496.000
Median:         496.000
3rd Qu.         500.000
Max:            546.000
Average:        500.067 [  -8.15000e-3] (      497.367 ‥       506.233)
Std. Dev:       11.0327 [    -0.802224] (      5.06055 ‥       19.3617)

Outliers: 0/3 = 3 (μ=500.059, σ=10.2304)
        Outlier variance:    3.22222e-2 (slight)

Difference at 95.0% confidence
        -50.9333 ± 8.38895
        -9.24380% ± 1.52249%
        (Student's t, pooled s = 16.2289)
------

x old HiPE escape
+ new HiPE escape
+--------------------------------------------------------------------------+
|++++++++                                                              xxxx|
|+++++                                                                 xx  |
|++++                                                                  xx  |
| +++                                                                  xx  |
| +++                                                                  xx  |
| +++                                                                  xx  |
|  ++                                                                  xx  |
|   +                                                                  xx  |
|   +                                                                  xx  |
|                                                                      |A  |
| |MA|                                                                     |
+--------------------------------------------------------------------------+
Dataset: x N=30 CI=95.0000
Statistic     Value     [         Bias] (Bootstrapped LB‥UB)
Min:            436.000
1st Qu.         437.000
Median:         438.000
3rd Qu.         438.000
Max:            444.000
Average:        437.967 [  -5.73333e-3] (      437.533 ‥       438.700)
Std. Dev:       1.54213 [  -9.01925e-2] (     0.932183 ‥       2.73525)

Outliers: 0/3 = 3 (μ=437.961, σ=1.45194)
        Outlier variance:    3.22222e-2 (slight)

------

Dataset: + N=30 CI=95.0000
Statistic     Value     [         Bias] (Bootstrapped LB‥UB)
Min:            266.000
1st Qu.         269.000
Median:         272.000
3rd Qu.         274.000
Max:            283.000
Average:        272.200 [  -2.08333e-3] (      270.933 ‥       273.633)
Std. Dev:       3.89872 [    -0.109426] (      3.00287 ‥       5.35423)

Outliers: 0/1 = 1 (μ=272.198, σ=3.78929)
        Outlier variance:    3.22222e-2 (slight)

Difference at 95.0% confidence
        -165.767 ± 1.53246
        -37.8492% ± 0.349904%
        (Student's t, pooled s = 2.96464)
------

x old HiPE utf-8
+ new HiPE utf-8
+--------------------------------------------------------------------------+
|+++ + +++++                                                          xxxxx|
|++    ++++                                                           xxx x|
|++    +++                                                            xxx  |
|++     ++                                                             xx  |
|++     +                                                              xx  |
| +     +                                                              xx  |
| +                                                                    xx  |
|                                                                     |A|  |
| |__A_M_|                                                                 |
+--------------------------------------------------------------------------+
Dataset: x N=30 CI=95.0000
Statistic     Value     [         Bias] (Bootstrapped LB‥UB)
Min:            294.000
1st Qu.         295.000
Median:         295.000
3rd Qu.         296.000
Max:            298.000
Average:        295.433 [  -1.76667e-3] (      295.100 ‥       295.800)
Std. Dev:      0.971431 [  -2.93599e-2] (     0.691492 ‥       1.33089)

Outliers: 0/2 = 2 (μ=295.432, σ=0.942071)
        Outlier variance:    3.22222e-2 (slight)

------

Dataset: + N=30 CI=95.0000
Statistic     Value     [         Bias] (Bootstrapped LB‥UB)
Min:            232.000
1st Qu.         233.000
Median:         237.000
3rd Qu.         238.000
Max:            241.000
Average:        235.900 [  -3.64333e-3] (      234.800 ‥       236.933)
Std. Dev:       3.03258 [  -5.51622e-2] (      2.70992 ‥       3.45496)

Outliers: 0/0 = 0 (μ=235.896, σ=2.97742)
        Outlier variance:    3.22222e-2 (slight)

Difference at 95.0% confidence
        -59.5333 ± 1.16393
        -20.1512% ± 0.393974%
        (Student's t, pooled s = 2.25169)
------
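
The "Difference at 95.0% confidence" lines above can be reproduced from the summary statistics alone. The following is a minimal Python sketch, not the benchmarking tool's own code: it assumes a standard pooled two-sample Student's t interval, and the critical value `T_CRIT_DF58` (about 2.002 for 58 degrees of freedom) is a hardcoded assumption rather than something printed in the output. The function name `pooled_difference` is hypothetical. The inputs are the "BEAM escape" averages and standard deviations.

```python
import math


def pooled_difference(n1, mean1, sd1, n2, mean2, sd2, t_crit):
    """Return (difference, half_width, pooled_sd) for mean2 - mean1.

    Uses the pooled standard deviation of the two samples and a
    two-sided Student's t critical value supplied by the caller.
    """
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df
    pooled_sd = math.sqrt(pooled_var)
    half_width = t_crit * pooled_sd * math.sqrt(1 / n1 + 1 / n2)
    return mean2 - mean1, half_width, pooled_sd


# Assumption: two-sided 95% critical value of Student's t for df = 58.
T_CRIT_DF58 = 2.002

# "old BEAM escape" (x) versus "new BEAM escape" (+), N = 30 each.
old_mean, old_sd = 945.167, 16.5614
new_mean, new_sd = 613.567, 9.19401

diff, half_width, pooled_sd = pooled_difference(
    30, old_mean, old_sd, 30, new_mean, new_sd, T_CRIT_DF58
)

# The percentage line is the same interval expressed relative to the old mean.
rel_diff = 100 * diff / old_mean
rel_half_width = 100 * half_width / old_mean

print(f"{diff:.3f} ± {half_width:.5f}")
print(f"{rel_diff:.4f}% ± {rel_half_width:.6f}%")
print(f"pooled s = {pooled_sd:.4f}")
```

Under these assumptions the pooled standard deviation comes out at about 13.394 and the difference at about -331.6, in line with the first comparison above; small deviations in the last digits come from the rounded critical value.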

sile commented 8 years ago

Oh, this is a great patch. Thank you for your contribution!

sile commented 8 years ago

FYI.

I ran poison's benchmark against the new jsone.

RESULT: https://github.com/sile/jsone/blob/master/BENCHMARK.md

SUMMARY and COMPARISON:

|    encode type         |      old      |      new      |
|------------------------|---------------|---------------|
| string escaping        |  934.83 μs/op |  697.40 μs/op |
| string escaping (HiPE) |  481.52 μs/op |  343.38 μs/op |
| large json             | 1379.04 μs/op | 1508.22 μs/op |
| large json (HiPE)      |  634.06 μs/op |  945.08 μs/op |
| pretty print           | 1734.20 μs/op | 2024.11 μs/op |
| pretty print (HiPE)    |  956.29 μs/op | 1359.26 μs/op |

INPUT STRING: https://github.com/devinus/poison/blob/2.2.0/bench/data/UTF-8-demo.txt
INPUT JSON: https://github.com/devinus/poison/blob/2.2.0/bench/data/generated.json

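The performance of "string escaping" has improved significantly: it now takes about 25% less time per operation (about 29% less with HiPE). On the other hand, encoding a large object and pretty printing have both become slower, most noticeably with HiPE.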
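
The relative changes behind this summary can be recomputed from the table. This is a minimal Python sketch with the numbers copied from the table; the helper name `percent_change` is hypothetical, and a negative value means less time per operation (faster).

```python
# (encode type, old μs/op, new μs/op), copied from the comparison table.
rows = [
    ("string escaping", 934.83, 697.40),
    ("string escaping (HiPE)", 481.52, 343.38),
    ("large json", 1379.04, 1508.22),
    ("large json (HiPE)", 634.06, 945.08),
    ("pretty print", 1734.20, 2024.11),
    ("pretty print (HiPE)", 956.29, 1359.26),
]


def percent_change(old, new):
    """Relative change of `new` against `old`, in percent."""
    return 100.0 * (new - old) / old


changes = {name: percent_change(old, new) for name, old, new in rows}

for name, pct in changes.items():
    # Negative = faster than before, positive = slower than before.
    print(f"{name:<24}{pct:+7.1f}%")
```

Only the two "string escaping" rows come out negative; the other four rows are regressions, and the HiPE rows show the largest slowdowns.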

sile commented 8 years ago

FYI.

As a countermeasure to the above problem, I applied the following patch: