sile / jsone

Erlang JSON library
MIT License

Streaming decode #74

Open zuiderkwast opened 2 years ago

zuiderkwast commented 2 years ago

A new option, stream:

Decode the input in multiple chunks. Instead of a result or an error, {incomplete, fun()} is returned. The returned fun takes a single argument and should be called to continue decoding. When all the input has been provided, the fun should be called with end_stream or end_json to signal the end of input. It then returns a result or an error.
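The sketch below shows what the calling convention could look like from the caller's side. It is not jsone's decoder. It assumes only the protocol described above:

- A call returns either a final result or {incomplete, Fun}.
- Fun is called with the next chunk, or with end_stream once the input is exhausted (end_json is not modeled).

The module stream_sketch and its functions are hypothetical. The stand-in decoder simply buffers chunks and parses an integer at the end, so that the driver loop has something to call.

```erlang
-module(stream_sketch).
-export([decode/1, decode_chunks/2]).

%% Hypothetical stand-in for a streaming decoder, following the protocol
%% described above: it returns {incomplete, Fun} until Fun is called with
%% end_stream, which yields {ok, Value, Rest} or {error, Reason}.
%% It does not parse JSON: it buffers the chunks and parses an integer at the end.
decode(FirstChunk) ->
    feed(FirstChunk, <<>>).

feed(end_stream, Buf) ->
    try
        {ok, binary_to_integer(Buf), <<>>}
    catch
        error:badarg -> {error, {badarg, Buf}}
    end;
feed(Chunk, Buf) when is_binary(Chunk) ->
    NewBuf = <<Buf/binary, Chunk/binary>>,
    {incomplete, fun(Next) -> feed(Next, NewBuf) end}.

%% Caller side: pass each remaining chunk to the continuation, then signal
%% the end of input. Any other result (e.g. an early error) is returned as is.
decode_chunks({incomplete, Fun}, [Chunk | Rest]) ->
    decode_chunks(Fun(Chunk), Rest);
decode_chunks({incomplete, Fun}, []) ->
    Fun(end_stream);
decode_chunks(Result, _Chunks) ->
    Result.
```

For example, stream_sketch:decode_chunks(stream_sketch:decode(<<"12">>), [<<"3">>]) returns {ok, 123, <<>>}: the input is only interpreted once end_stream arrives.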

This is a first working implementation. We have yet to run the benchmark.

I made minimal changes to make it work. Perhaps some refactoring could make it less ugly.

If this makes the core decode implementation slower, we could consider putting the stream decode code in a separate module.

Fixes #73.

sile commented 2 years ago

Sorry for the delayed response, but what is the status of this PR? It's still marked as a draft, so are there any TODOs left before it is ready for review? (Maybe benchmarking?)

zuiderkwast commented 2 years ago

I think it is ready for review. Only benchmarking is missing. I will mark it ready for review.

Is there a way to run the benchmark of a branch and compare it with master?

sile commented 2 years ago

I see, thanks. I'll start reviewing this PR.

> Is there a way to run the benchmark of a branch and compare it with master?

There is the benchmark script I used at https://github.com/sile/jsone/tree/master/benchmark/run.sh. However, it's not well maintained, so feel free to use another benchmark, or your own, if you prefer.

sile commented 2 years ago

The CI failures should be fixed if you run rebar3 efmt -w to format the source code.

sile commented 2 years ago

I came up with an idea: it might be possible to implement this feature without modifying the jsone_decode module at all. The following code sketches it.

%% in jsone.erl file
try_decode_stream(Json, Options) ->
  case jsone_decode:decode(Json, Options) of
    {ok, Value, Remainings} ->
      {ok, Value, Remainings};
    %% Add handling of incomplete cases here.
    %% incomplete/2 is a helper still to be written (not part of jsone).
    {error, {badarg, [{jsone_decode, array_next, Args = [<<>>, _Values, _Nexts, _Buf, _Opt]}]}} ->
      incomplete(fun jsone_decode:array_next/5, Args);
    %% ... other clauses ...
    {error, Reason} ->
      {error, Reason}
  end.

I'm not 100% sure this approach is actually feasible, but it has the obvious merit of adding no performance overhead when the feature isn't used.
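The key property of this idea is that the streaming logic lives entirely outside the decoder. The sketch below is a simpler relative of it that keeps the same property. It assumes the underlying decoder reports "ran out of input" as a recognizable error. Instead of resuming from the failing function's arguments, as the snippet above does, it buffers the input and re-runs the decode from the start on every new chunk.

The module naive_stream, its toy decoder for arrays of integers, and wrap/1 are all hypothetical. None of them is jsone code.

```erlang
-module(naive_stream).
-export([decode_int_list/1, wrap/1]).

%% Toy, non-streaming decoder for arrays of non-negative integers such as
%% <<"[1,22,3]">>. It knows nothing about streaming; it just reports
%% running out of input as {error, unexpected_end}.
decode_int_list(<<"[", Rest/binary>>) -> elements(Rest);
decode_int_list(<<>>) -> {error, unexpected_end};
decode_int_list(Bin) -> {error, {badarg, Bin}}.

elements(<<"]", Rest/binary>>) -> {ok, [], Rest};
elements(Bin) -> integer(Bin, <<>>, []).

%% Accumulate the digits of the current element.
integer(<<C, Rest/binary>>, Digits, Acc) when C >= $0, C =< $9 ->
    integer(Rest, <<Digits/binary, C>>, Acc);
integer(<<>>, _Digits, _Acc) ->
    {error, unexpected_end};
integer(Bin, <<>>, _Acc) ->
    {error, {badarg, Bin}};
integer(Bin, Digits, Acc) ->
    separator(Bin, [binary_to_integer(Digits) | Acc]).

separator(<<",", Rest/binary>>, Acc) -> integer(Rest, <<>>, Acc);
separator(<<"]", Rest/binary>>, Acc) -> {ok, lists:reverse(Acc), Rest};
separator(<<>>, _Acc) -> {error, unexpected_end};
separator(Bin, _Acc) -> {error, {badarg, Bin}}.

%% The streaming layer sits outside the decoder: only the "ran out of
%% input" result becomes {incomplete, Fun}. Unlike the snippet above, this
%% naive version re-parses the whole buffer on every chunk.
wrap(Bin) ->
    case decode_int_list(Bin) of
        {error, unexpected_end} ->
            {incomplete, fun(end_stream) -> {error, unexpected_end};
                            (Chunk) -> wrap(<<Bin/binary, Chunk/binary>>)
                         end};
        Other ->
            Other
    end.
```

Because the whole buffer is re-parsed on each chunk, input split into many small chunks costs quadratic time overall. Resuming from the failing function's arguments, as proposed above, avoids that.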

zuiderkwast commented 2 years ago

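That's a very interesting idea. It keeps jsone_decode simple. The case I'm not sure about is when the input is a number split between its digits. In that case, the truncated input is not an error, because the first part may already be a valid number on its own. With the chunks <<"1">>, <<".23">>, <<"e45">>, the first chunk decodes successfully as 1, so there is no badarg to intercept.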
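One way a streaming layer could deal with this is sketched below; it is an assumption, not what the PR does. The idea is to hold back any number that touches the end of the buffered input until either the next chunk or end_stream arrives, because only then is it known whether the number is complete.

The module number_holdback and split_trailing_number/1 are hypothetical names.

```erlang
-module(number_holdback).
-export([split_trailing_number/1]).

%% Characters that can occur inside a JSON number.
is_number_char(C) ->
    (C >= $0 andalso C =< $9) orelse
        C =:= $- orelse C =:= $+ orelse C =:= $. orelse
        C =:= $e orelse C =:= $E.

%% Split Buf into {Ready, Pending}. Pending is a trailing number that touches
%% the end of the buffer (it starts with a digit or '-'); it may continue in
%% the next chunk, so it is held back until more input or end_stream arrives.
%% This is conservative: digits at the end of an unfinished string are held
%% back too, which only delays them.
split_trailing_number(Buf) ->
    {Ready, Pending} = split_binary(Buf, run_start(Buf, byte_size(Buf))),
    case Pending of
        <<C, _/binary>> when (C >= $0 andalso C =< $9) orelse C =:= $- ->
            {Ready, Pending};
        _ ->
            %% E.g. the trailing "e" of "true" is not the start of a number.
            {Buf, <<>>}
    end.

%% Byte offset at which the trailing run of number characters begins.
run_start(Buf, N) when N > 0 ->
    case is_number_char(binary:at(Buf, N - 1)) of
        true -> run_start(Buf, N - 1);
        false -> N
    end;
run_start(_Buf, 0) ->
    0.
```

With the chunks from the example, the layer first holds back <<"1">>, then <<"1.23">>. Only at end_stream is <<"1.23e45">> handed to the decoder.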

I will benchmark with the current implementation first. Then, I might try the badarg-to-incomplete version.

zuiderkwast commented 2 years ago

Benchmarks of the current version and of this PR (decode only).

Results with jsone = current version:
##### With input Blockchain #####
Name             ips        average  deviation         median         99th %
jiffy         3.61 K      277.26 μs    ±25.91%      248.63 μs      506.51 μs
Jason         2.55 K      392.71 μs    ±10.81%      388.40 μs      524.35 μs
jsone         1.87 K      534.87 μs    ±15.29%      523.80 μs      852.57 μs
Tiny          1.46 K      683.48 μs    ±10.97%      668.14 μs      932.35 μs
Poison        1.37 K      729.87 μs    ±17.99%      706.65 μs     1167.41 μs
JSX           1.24 K      809.64 μs    ±11.73%      796.71 μs     1091.97 μs
JSON          0.46 K     2161.80 μs     ±9.99%     2130.79 μs     2827.24 μs

Comparison: 
jiffy         3.61 K
Jason         2.55 K - 1.42x slower +115.45 μs
jsone         1.87 K - 1.93x slower +257.61 μs
Tiny          1.46 K - 2.47x slower +406.22 μs
Poison        1.37 K - 2.63x slower +452.61 μs
JSX           1.24 K - 2.92x slower +532.38 μs
JSON          0.46 K - 7.80x slower +1884.54 μs

##### With input Giphy #####
Name             ips        average  deviation         median         99th %
jiffy         364.12        2.75 ms    ±19.57%        2.66 ms        4.32 ms
Jason         245.39        4.08 ms     ±9.34%        4.00 ms        5.71 ms
Tiny          137.24        7.29 ms     ±3.41%        7.23 ms        8.23 ms
Poison        123.35        8.11 ms    ±14.41%        7.93 ms       15.52 ms
jsone         122.55        8.16 ms     ±7.08%        8.04 ms        9.91 ms
JSX           101.85        9.82 ms     ±5.67%        9.73 ms       11.53 ms
JSON           52.04       19.22 ms     ±4.86%       19.06 ms       22.83 ms

Comparison: 
jiffy         364.12
Jason         245.39 - 1.48x slower +1.33 ms
Tiny          137.24 - 2.65x slower +4.54 ms
Poison        123.35 - 2.95x slower +5.36 ms
jsone         122.55 - 2.97x slower +5.41 ms
JSX           101.85 - 3.57x slower +7.07 ms
JSON           52.04 - 7.00x slower +16.47 ms

##### With input GitHub #####
Name             ips        average  deviation         median         99th %
jiffy        1427.95        0.70 ms    ±15.47%        0.68 ms        1.01 ms
Jason         816.06        1.23 ms     ±9.34%        1.20 ms        1.76 ms
jsone         575.18        1.74 ms    ±14.45%        1.70 ms        2.43 ms
Tiny          535.76        1.87 ms     ±6.55%        1.84 ms        2.37 ms
Poison        495.11        2.02 ms    ±10.91%        1.99 ms        2.76 ms
JSX           324.99        3.08 ms     ±7.06%        3.03 ms        3.76 ms
JSON          183.14        5.46 ms     ±5.84%        5.39 ms        6.47 ms

Comparison: 
jiffy        1427.95
Jason         816.06 - 1.75x slower +0.53 ms
jsone         575.18 - 2.48x slower +1.04 ms
Tiny          535.76 - 2.67x slower +1.17 ms
Poison        495.11 - 2.88x slower +1.32 ms
JSX           324.99 - 4.39x slower +2.38 ms
JSON          183.14 - 7.80x slower +4.76 ms

##### With input GovTrack #####
Name             ips        average  deviation         median         99th %
jiffy           9.80      102.08 ms     ±9.94%      101.31 ms      136.90 ms
Jason           8.57      116.63 ms     ±6.05%      115.18 ms      144.48 ms
jsone           5.09      196.53 ms     ±5.87%      195.59 ms      224.68 ms
Tiny            4.31      232.03 ms     ±4.59%      232.55 ms      267.62 ms
Poison          3.60      277.70 ms     ±9.39%      271.36 ms      401.83 ms
JSX             2.98      336.05 ms     ±5.65%      335.72 ms      369.61 ms
JSON            1.13      882.46 ms     ±3.27%      871.20 ms      955.92 ms

Comparison: 
jiffy           9.80
Jason           8.57 - 1.14x slower +14.55 ms
jsone           5.09 - 1.93x slower +94.45 ms
Tiny            4.31 - 2.27x slower +129.95 ms
Poison          3.60 - 2.72x slower +175.62 ms
JSX             2.98 - 3.29x slower +233.97 ms
JSON            1.13 - 8.64x slower +780.38 ms

##### With input Issue 90 #####
Name             ips        average  deviation         median         99th %
jiffy          35.68       28.03 ms     ±6.40%       27.76 ms       32.63 ms
Jason           8.48      117.98 ms     ±3.88%      115.47 ms      134.42 ms
Poison          7.98      125.29 ms     ±2.47%      124.59 ms      145.71 ms
Tiny            7.29      137.25 ms     ±3.03%      135.69 ms      159.29 ms
JSX             6.82      146.73 ms     ±3.05%      146.59 ms      166.66 ms
jsone           6.66      150.14 ms     ±1.79%      149.39 ms      165.53 ms
JSON            1.34      743.58 ms     ±3.69%      736.49 ms      793.92 ms

Comparison: 
jiffy          35.68
Jason           8.48 - 4.21x slower +89.96 ms
Poison          7.98 - 4.47x slower +97.26 ms
Tiny            7.29 - 4.90x slower +109.23 ms
JSX             6.82 - 5.24x slower +118.70 ms
jsone           6.66 - 5.36x slower +122.12 ms
JSON            1.34 - 26.53x slower +715.55 ms

##### With input JSON Generator #####
Name             ips        average  deviation         median         99th %
jiffy         334.43        2.99 ms    ±27.92%        2.89 ms        4.82 ms
Jason         328.87        3.04 ms     ±5.19%        3.00 ms        3.52 ms
jsone         192.21        5.20 ms    ±15.92%        5.03 ms        9.20 ms
Tiny          170.86        5.85 ms     ±3.11%        5.83 ms        6.53 ms
Poison        155.75        6.42 ms     ±5.09%        6.39 ms        7.35 ms
JSX           128.88        7.76 ms     ±4.45%        7.71 ms        8.71 ms
JSON           40.39       24.76 ms     ±4.76%       24.70 ms       29.14 ms

Comparison: 
jiffy         334.43
Jason         328.87 - 1.02x slower +0.0505 ms
jsone         192.21 - 1.74x slower +2.21 ms
Tiny          170.86 - 1.96x slower +2.86 ms
Poison        155.75 - 2.15x slower +3.43 ms
JSX           128.88 - 2.59x slower +4.77 ms
JSON           40.39 - 8.28x slower +21.77 ms

##### With input JSON Generator (Pretty) #####
Name             ips        average  deviation         median         99th %
Jason         274.83        3.64 ms     ±6.18%        3.58 ms        4.25 ms
jiffy         258.07        3.87 ms    ±24.39%        3.64 ms        6.30 ms
jsone         181.26        5.52 ms     ±9.94%        5.37 ms        7.81 ms
Tiny          156.34        6.40 ms     ±6.45%        6.35 ms        7.09 ms
Poison        145.53        6.87 ms     ±8.41%        6.79 ms        7.96 ms
JSX           117.76        8.49 ms     ±5.45%        8.41 ms        9.85 ms
JSON           37.86       26.41 ms    ±11.13%       25.97 ms       42.08 ms

Comparison: 
Jason         274.83
jiffy         258.07 - 1.06x slower +0.24 ms
jsone         181.26 - 1.52x slower +1.88 ms
Tiny          156.34 - 1.76x slower +2.76 ms
Poison        145.53 - 1.89x slower +3.23 ms
JSX           117.76 - 2.33x slower +4.85 ms
JSON           37.86 - 7.26x slower +22.78 ms

##### With input Pokedex #####
Name             ips        average  deviation         median         99th %
Jason         530.81        1.88 ms    ±10.16%        1.85 ms        2.42 ms
jiffy         397.71        2.51 ms    ±24.47%        2.28 ms        4.01 ms
jsone         285.58        3.50 ms    ±10.17%        3.40 ms        4.95 ms
Poison        217.10        4.61 ms     ±3.69%        4.58 ms        5.31 ms
Tiny          206.99        4.83 ms     ±7.63%        4.78 ms        5.68 ms
JSX           164.06        6.10 ms     ±6.38%        6.00 ms        8.29 ms
JSON           53.07       18.84 ms     ±4.63%       18.73 ms       22.39 ms

Comparison: 
Jason         530.81
jiffy         397.71 - 1.33x slower +0.63 ms
jsone         285.58 - 1.86x slower +1.62 ms
Poison        217.10 - 2.45x slower +2.72 ms
Tiny          206.99 - 2.56x slower +2.95 ms
JSX           164.06 - 3.24x slower +4.21 ms
JSON           53.07 - 10.00x slower +16.96 ms

##### With input UTF-8 escaped #####
Name             ips        average  deviation         median         99th %
jiffy        8841.86       0.113 ms    ±29.39%       0.109 ms       0.186 ms
Poison       1212.80        0.82 ms    ±21.63%        0.76 ms        1.38 ms
Jason        1090.21        0.92 ms    ±16.01%        0.92 ms        1.34 ms
Tiny          802.81        1.25 ms    ±12.17%        1.25 ms        1.74 ms
jsone         703.02        1.42 ms    ±18.66%        1.40 ms        2.24 ms
JSX           683.53        1.46 ms    ±16.22%        1.42 ms        2.41 ms
JSON          550.49        1.82 ms    ±13.80%        1.89 ms        2.59 ms

Comparison: 
jiffy        8841.86
Poison       1212.80 - 7.29x slower +0.71 ms
Jason        1090.21 - 8.11x slower +0.80 ms
Tiny          802.81 - 11.01x slower +1.13 ms
jsone         703.02 - 12.58x slower +1.31 ms
JSX           683.53 - 12.94x slower +1.35 ms
JSON          550.49 - 16.06x slower +1.70 ms

##### With input UTF-8 unescaped #####
Name             ips        average  deviation         median         99th %
jiffy        13.66 K       73.22 μs    ±36.58%       70.58 μs      125.90 μs
Jason         5.36 K      186.41 μs    ±15.58%      177.30 μs      310.95 μs
Poison        4.58 K      218.35 μs    ±14.81%      204.15 μs      327.11 μs
JSX           3.71 K      269.18 μs    ±12.76%      257.38 μs      367.96 μs
jsone         3.24 K      308.71 μs    ±21.72%      294.44 μs      686.19 μs
JSON          3.05 K      327.74 μs    ±24.76%      308.11 μs      535.51 μs
Tiny          1.95 K      511.66 μs    ±14.44%      510.70 μs      698.80 μs

Comparison: 
jiffy        13.66 K
Jason         5.36 K - 2.55x slower +113.19 μs
Poison        4.58 K - 2.98x slower +145.12 μs
JSX           3.71 K - 3.68x slower +195.96 μs
jsone         3.24 K - 4.22x slower +235.48 μs
JSON          3.05 K - 4.48x slower +254.52 μs
Tiny          1.95 K - 6.99x slower +438.43 μs

Results with jsone = this PR:

##### With input Blockchain #####
Name             ips        average  deviation         median         99th %
jiffy         4.17 K      240.01 μs    ±23.01%      220.79 μs      380.98 μs
Jason         2.65 K      376.88 μs    ±10.31%      374.17 μs      488.45 μs
jsone         1.74 K      573.40 μs    ±10.16%      566.48 μs      768.11 μs
Poison        1.53 K      655.55 μs    ±12.55%      647.27 μs      920.13 μs
Tiny          1.48 K      677.49 μs    ±10.19%      666.94 μs      890.66 μs
JSX           1.28 K      784.19 μs    ±19.22%      756.03 μs     1599.32 μs
JSON          0.62 K     1618.36 μs     ±6.32%     1605.90 μs     2011.57 μs

Comparison: 
jiffy         4.17 K
Jason         2.65 K - 1.57x slower +136.87 μs
jsone         1.74 K - 2.39x slower +333.39 μs
Poison        1.53 K - 2.73x slower +415.54 μs
Tiny          1.48 K - 2.82x slower +437.48 μs
JSX           1.28 K - 3.27x slower +544.18 μs
JSON          0.62 K - 6.74x slower +1378.35 μs

##### With input Giphy #####
Name             ips        average  deviation         median         99th %
jiffy         411.06        2.43 ms    ±23.55%        2.23 ms        4.30 ms
Jason         273.37        3.66 ms     ±5.97%        3.65 ms        4.29 ms
Tiny          140.17        7.13 ms     ±4.34%        7.07 ms        8.09 ms
Poison        130.80        7.65 ms    ±10.39%        7.54 ms        9.31 ms
jsone         119.24        8.39 ms    ±13.71%        8.23 ms       13.73 ms
JSX           105.93        9.44 ms     ±8.54%        9.32 ms       12.97 ms
JSON           62.33       16.04 ms     ±7.67%       15.86 ms       18.90 ms

Comparison: 
jiffy         411.06
Jason         273.37 - 1.50x slower +1.23 ms
Tiny          140.17 - 2.93x slower +4.70 ms
Poison        130.80 - 3.14x slower +5.21 ms
jsone         119.24 - 3.45x slower +5.95 ms
JSX           105.93 - 3.88x slower +7.01 ms
JSON           62.33 - 6.59x slower +13.61 ms

##### With input GitHub #####
Name             ips        average  deviation         median         99th %
jiffy        1444.91        0.69 ms    ±17.07%        0.68 ms        1.13 ms
Jason         945.08        1.06 ms     ±5.21%        1.05 ms        1.23 ms
Tiny          549.44        1.82 ms     ±6.03%        1.80 ms        2.20 ms
jsone         530.12        1.89 ms     ±8.04%        1.87 ms        2.48 ms
Poison        508.65        1.97 ms     ±8.08%        1.96 ms        2.50 ms
JSX           354.20        2.82 ms     ±8.15%        2.79 ms        3.62 ms
JSON          228.48        4.38 ms     ±6.75%        4.34 ms        5.15 ms

Comparison: 
jiffy        1444.91
Jason         945.08 - 1.53x slower +0.37 ms
Tiny          549.44 - 2.63x slower +1.13 ms
jsone         530.12 - 2.73x slower +1.19 ms
Poison        508.65 - 2.84x slower +1.27 ms
JSX           354.20 - 4.08x slower +2.13 ms
JSON          228.48 - 6.32x slower +3.68 ms

##### With input GovTrack #####
Name             ips        average  deviation         median         99th %
jiffy          10.22       97.83 ms     ±9.66%       97.34 ms      128.47 ms
Jason           9.09      110.01 ms     ±5.28%      108.90 ms      132.37 ms
jsone           4.77      209.76 ms     ±7.49%      208.33 ms      258.01 ms
Tiny            4.26      234.93 ms     ±5.11%      234.71 ms      270.62 ms
Poison          3.87      258.50 ms     ±4.69%      257.95 ms      288.59 ms
JSX             3.05      327.43 ms     ±4.79%      327.82 ms      366.02 ms
JSON            1.30      771.14 ms     ±7.65%      753.15 ms      923.03 ms

Comparison: 
jiffy          10.22
Jason           9.09 - 1.12x slower +12.18 ms
jsone           4.77 - 2.14x slower +111.94 ms
Tiny            4.26 - 2.40x slower +137.10 ms
Poison          3.87 - 2.64x slower +160.67 ms
JSX             3.05 - 3.35x slower +229.61 ms
JSON            1.30 - 7.88x slower +673.32 ms

##### With input Issue 90 #####
Name             ips        average  deviation         median         99th %
jiffy          37.05       26.99 ms     ±1.95%       26.94 ms       28.45 ms
Jason           9.19      108.86 ms     ±0.69%      108.78 ms      111.33 ms
Poison          8.66      115.50 ms     ±1.99%      116.18 ms      121.62 ms
Tiny            7.49      133.58 ms     ±4.40%      132.69 ms      175.59 ms
JSX             7.10      140.85 ms     ±2.34%      140.14 ms      161.88 ms
jsone           5.82      171.69 ms     ±4.38%      168.20 ms      191.10 ms
JSON            1.39      721.93 ms     ±7.52%      740.32 ms      811.20 ms

Comparison: 
jiffy          37.05
Jason           9.19 - 4.03x slower +81.87 ms
Poison          8.66 - 4.28x slower +88.51 ms
Tiny            7.49 - 4.95x slower +106.59 ms
JSX             7.10 - 5.22x slower +113.86 ms
jsone           5.82 - 6.36x slower +144.70 ms
JSON            1.39 - 26.75x slower +694.93 ms

##### With input JSON Generator #####
Name             ips        average  deviation         median         99th %
jiffy         367.58        2.72 ms    ±27.95%        2.61 ms        4.46 ms
Jason         353.64        2.83 ms     ±4.67%        2.80 ms        3.28 ms
Tiny          176.42        5.67 ms     ±5.46%        5.60 ms        6.55 ms
jsone         161.45        6.19 ms    ±21.25%        5.87 ms       12.04 ms
Poison        155.22        6.44 ms    ±17.62%        6.21 ms       14.41 ms
JSX           129.81        7.70 ms     ±7.34%        7.61 ms       10.12 ms
JSON           45.34       22.06 ms     ±4.21%       21.91 ms       25.52 ms

Comparison: 
jiffy         367.58
Jason         353.64 - 1.04x slower +0.107 ms
Tiny          176.42 - 2.08x slower +2.95 ms
jsone         161.45 - 2.28x slower +3.47 ms
Poison        155.22 - 2.37x slower +3.72 ms
JSX           129.81 - 2.83x slower +4.98 ms
JSON           45.34 - 8.11x slower +19.34 ms

##### With input JSON Generator (Pretty) #####
Name             ips        average  deviation         median         99th %
Jason         289.85        3.45 ms     ±5.49%        3.41 ms        3.97 ms
jiffy         274.64        3.64 ms    ±26.63%        3.40 ms        6.52 ms
Tiny          169.27        5.91 ms     ±2.85%        5.89 ms        6.56 ms
Poison        148.18        6.75 ms    ±19.28%        6.49 ms       15.94 ms
jsone         145.53        6.87 ms    ±24.00%        6.56 ms       15.73 ms
JSX           119.58        8.36 ms     ±8.23%        8.23 ms       10.93 ms
JSON           43.42       23.03 ms    ±11.39%       22.63 ms       37.01 ms

Comparison: 
Jason         289.85
jiffy         274.64 - 1.06x slower +0.191 ms
Tiny          169.27 - 1.71x slower +2.46 ms
Poison        148.18 - 1.96x slower +3.30 ms
jsone         145.53 - 1.99x slower +3.42 ms
JSX           119.58 - 2.42x slower +4.91 ms
JSON           43.42 - 6.68x slower +19.58 ms

##### With input Pokedex #####
Name             ips        average  deviation         median         99th %
Jason         529.39        1.89 ms     ±9.23%        1.87 ms        2.38 ms
jiffy         374.99        2.67 ms    ±29.17%        2.45 ms        4.87 ms
jsone         230.71        4.33 ms    ±12.27%        4.28 ms        6.57 ms
Tiny          224.30        4.46 ms     ±3.69%        4.43 ms        5.08 ms
Poison        220.93        4.53 ms    ±10.88%        4.41 ms        5.51 ms
JSX           173.83        5.75 ms     ±4.76%        5.70 ms        6.58 ms
JSON           57.09       17.52 ms    ±10.26%       17.18 ms       25.97 ms

Comparison: 
Jason         529.39
jiffy         374.99 - 1.41x slower +0.78 ms
jsone         230.71 - 2.29x slower +2.45 ms
Tiny          224.30 - 2.36x slower +2.57 ms
Poison        220.93 - 2.40x slower +2.64 ms
JSX           173.83 - 3.05x slower +3.86 ms
JSON           57.09 - 9.27x slower +15.63 ms

##### With input UTF-8 escaped #####
Name             ips        average  deviation         median         99th %
jiffy        8892.62       0.112 ms    ±16.90%       0.110 ms       0.171 ms
Poison       1255.05        0.80 ms    ±21.48%        0.73 ms        1.38 ms
Jason        1162.73        0.86 ms    ±14.01%        0.88 ms        1.16 ms
Tiny          848.27        1.18 ms    ±13.39%        1.19 ms        1.69 ms
jsone         727.05        1.38 ms    ±17.55%        1.35 ms        2.17 ms
JSX           616.71        1.62 ms    ±18.07%        1.69 ms        2.37 ms
JSON          568.82        1.76 ms    ±23.80%        1.71 ms        3.54 ms

Comparison: 
jiffy        8892.62
Poison       1255.05 - 7.09x slower +0.68 ms
Jason        1162.73 - 7.65x slower +0.75 ms
Tiny          848.27 - 10.48x slower +1.07 ms
jsone         727.05 - 12.23x slower +1.26 ms
JSX           616.71 - 14.42x slower +1.51 ms
JSON          568.82 - 15.63x slower +1.65 ms

##### With input UTF-8 unescaped #####
Name             ips        average  deviation         median         99th %
jiffy        13.86 K       72.17 μs    ±45.29%       68.61 μs      121.52 μs
Jason         6.09 K      164.28 μs    ±16.88%      156.68 μs      288.11 μs
Poison        4.56 K      219.20 μs    ±16.17%      204.66 μs      346.56 μs
JSX           3.95 K      253.37 μs    ±14.08%      241.00 μs      376.08 μs
JSON          3.15 K      317.15 μs    ±27.12%      297.77 μs      569.24 μs
jsone         3.01 K      332.44 μs    ±17.28%      321.25 μs      701.15 μs
Tiny          2.10 K      475.99 μs    ±14.16%      473.84 μs      622.28 μs

Comparison: 
jiffy        13.86 K
Jason         6.09 K - 2.28x slower +92.11 μs
Poison        4.56 K - 3.04x slower +147.03 μs
JSX           3.95 K - 3.51x slower +181.20 μs
JSON          3.15 K - 4.39x slower +244.98 μs
jsone         3.01 K - 4.61x slower +260.27 μs
Tiny          2.10 K - 6.60x slower +403.82 μs

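This was done on a laptop, which is why there are big differences between the runs. For example, jiffy, which this PR does not touch, went from 3.61 K to 4.17 K ips on Blockchain. Still, the PR visibly has a slightly negative impact on performance: jsone is slower in the second run on every input except UTF-8 escaped.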
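To put a number on "slightly negative", the jsone ips figures from the two runs can be compared per input. The sketch below assumes the first set of tables is the current version and the second is this PR.

The other libraries also moved between runs even though they did not change. To account for that, the sketch additionally divides jsone's ips by Jason's ips from the same run, as a rough way to cancel machine-wide noise. That normalization is an illustrative choice, not part of the original analysis. The module bench_delta is hypothetical; the ips values are copied from the tables.

```erlang
-module(bench_delta).
-export([relative_change/2, normalized_change/4, report/0]).

%% Relative change in ips from Before to After; negative means slower.
relative_change(Before, After) ->
    (After - Before) / Before.

%% Relative change of jsone's ips after dividing each figure by a
%% reference library's ips from the same run (Jason here), to factor out
%% machine-wide speed differences between the two runs.
normalized_change(JsoneBefore, RefBefore, JsoneAfter, RefAfter) ->
    relative_change(JsoneBefore / RefBefore, JsoneAfter / RefAfter).

%% {Input, jsone current, Jason current, jsone PR, Jason PR}, ips values
%% copied from the tables above ("K" expanded to thousands).
data() ->
    [{"Blockchain",              1870,    2550,    1740,    2650},
     {"Giphy",                   122.55,  245.39,  119.24,  273.37},
     {"GitHub",                  575.18,  816.06,  530.12,  945.08},
     {"GovTrack",                5.09,    8.57,    4.77,    9.09},
     {"Issue 90",                6.66,    8.48,    5.82,    9.19},
     {"JSON Generator",          192.21,  328.87,  161.45,  353.64},
     {"JSON Generator (Pretty)", 181.26,  274.83,  145.53,  289.85},
     {"Pokedex",                 285.58,  530.81,  230.71,  529.39},
     {"UTF-8 escaped",           703.02,  1090.21, 727.05,  1162.73},
     {"UTF-8 unescaped",         3240,    5360,    3010,    6090}].

%% Print the raw and Jason-normalized change for each input, in percent.
report() ->
    lists:foreach(
      fun({Name, JC, RC, JP, RP}) ->
              io:format("~-24s raw ~6.1f%  normalized ~6.1f%~n",
                        [Name, 100 * relative_change(JC, JP),
                         100 * normalized_change(JC, RC, JP, RP)])
      end,
      data()).
```

Each run is a single noisy sample, so the per-input figures printed by bench_delta:report() are rough indications only.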

zuiderkwast commented 2 years ago

Note: I did not run rebar3 efmt -w because it causes a lot of changes, including to code I didn't touch, which would make this PR harder to review. I can do it in a separate commit later.

sile commented 2 years ago

Thank you for sharing the benchmark result! It's interesting.

> The case where I'm not sure is when the input is a number split between the digits. In this case, an incomplete input is not an error. E.g. <<"1">>, <<".23">>, <<"e45">>.

You're right. That could be a tricky point.

I think the benchmark result is not too bad, but this change does seem to have a negative impact on decoding performance. So I'd like to explore the approach above further. (It is undecided whether to do it, but I would like to optimize jsone to make it faster someday, so I want to avoid performance degradation as much as possible.)

sile commented 2 years ago

This is also just an idea, but it might be possible to retry the number decoding as follows:

%% in jsone.erl (the logic could get complicated, so it might be better to create a new module such as jsone_stream.erl)
%% incomplete/2 is still to be written, and the jsone_decode functions referenced below would need to be exported.
try_decode_stream(Json, Options) ->
  case jsone_decode:decode(Json, Options) of
    {ok, Value, Remainings} ->
      {ok, Value, Remainings};
    {error, {badarg, [{jsone_decode, array_next, Args = [<<>>, Values, Nexts, Buf, Opt]}]}} ->
      case Nexts of
        %% If the head element of `Nexts` is a number, retry the number decoding when the next stream input is given.
        [N | Nexts1] when is_number(N) ->
          incomplete(fun jsone_decode:number_integer_part/5, [jsone:encode(N), Values, Nexts1, Buf, Opt]);
        _ ->
          incomplete(fun jsone_decode:array_next/5, Args)
      end;
    %% ... other clauses ...
    {error, Reason} ->
      {error, Reason}
  end.
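One thing worth checking with this retry is that re-encoding a decoded number does not always reproduce the text that was actually received. Appending the next chunk to the re-encoded text can therefore change the result.

The sketch below compares retrying from the re-encoded value with retrying from the raw bytes. It uses OTP's integer_to_binary/1, float_to_binary/1 and binary_to_float/1 as stand-ins for the encoder and decoder; jsone's own float formatting depends on its encode options. The module number_retry and its functions are hypothetical.

```erlang
-module(number_retry).
-export([retry_from_value/2, retry_from_raw/2]).

%% Demo-only number parser: integers, or floats written with a decimal point
%% (binary_to_float/1 rejects forms like <<"1e5">>, which JSON allows).
parse(Bin) ->
    try
        binary_to_integer(Bin)
    catch
        error:badarg -> binary_to_float(Bin)
    end.

%% Retry from the decoded value: re-encode the number that was decoded from
%% the first chunk, append the next chunk, and parse again.
retry_from_value(FirstChunk, NextChunk) ->
    Text = case parse(FirstChunk) of
               I when is_integer(I) -> integer_to_binary(I);
               F -> float_to_binary(F)  % default format: {scientific, 20}
           end,
    parse(<<Text/binary, NextChunk/binary>>).

%% Retry from the raw bytes that were actually received.
retry_from_raw(FirstChunk, NextChunk) ->
    parse(<<FirstChunk/binary, NextChunk/binary>>).
```

The two approaches diverge in a couple of simple cases:

- "-0" followed by ".5" should give -0.5. In this demo, "-0" decodes to the integer 0 and re-encodes as <<"0">>, so the retry yields 0.5.
- "1.2" followed by "5" should give 1.25. The float 1.2 re-encodes in scientific notation ending in e+00, so appending "5" turns the exponent into e+005.

Keeping the raw bytes of a number that touches the end of a chunk avoids both problems. The cost is holding that text until the next chunk or end_stream arrives.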