jddurand / c-marpaESLIF

Extended perl's Marpa::R2 SLIF grammar written in C
MIT License

ESLIF time is non-linear as a function of input size. #9

Closed jeffreykegler closed 2 years ago

jeffreykegler commented 3 years ago

After a couple of benchmarks, it seems that there is a non-linearity in the speed of the ESLIF built-in JSON parser. In the benchmarks included, note the relative speeds of SLIF JSON, JSON::PP and JSON::XS remain roughly the same, while the performance of ESLIF JSON plummets from better than JSON::PP to worse than the SLIF JSON.

Small file benchmark: https://gist.github.com/jeffreykegler/cdf4f8eceb07fcedb9d5dc52bd69a56a

Large file benchmark: https://gist.github.com/jeffreykegler/895e4136578ec9563468e6c103d1b9ae

jddurand commented 3 years ago

Yes, and this is not because of libmarpa - there is an internal method in ESLIF that is "flattening" all the pointers (it is called _marpaESLIF_flatten_pointer). I have to review this dependency. Thanks for the report.

jddurand commented 3 years ago

FYI current situation is now much better, here are my benchmarks on my PC (Pentium(R) Dual-Core CPU E5300 @ 2.60GHz):

Using /usr/share/iso-codes/json/iso_639-3.json
Using Marpa::R2 4
            (warning: too few iterations for a reliable count)
            (warning: too few iterations for a reliable count)
            (warning: too few iterations for a reliable count)
             s/iter  SLIF JSON ESLIF JSON   JSON::PP   JSON::XS
SLIF JSON      4.14         --       -61%       -63%      -100%
ESLIF JSON     1.61       157%         --        -4%       -99%
JSON::PP       1.55       167%         4%         --       -99%
JSON::XS   1.96e-02     20983%      8099%      7794%         --

And the benefit is clearly general: on the simple test.json it outperforms JSON::PP:

Using /home/jdurand/git/Marpa--R2/blog/json/test.json
Using Marpa::R2 4
            (warning: too few iterations for a reliable count)
             Rate  SLIF JSON   JSON::PP ESLIF JSON   JSON::XS
SLIF JSON  1.72/s         --       -56%       -70%      -100%
JSON::PP   3.96/s       130%         --       -31%       -99%
ESLIF JSON 5.71/s       231%        44%         --       -99%
JSON::XS    555/s     32064%     13902%      9605%         --

jeffreykegler commented 3 years ago

Even more impressive timings! But a lot of non-linearity remains. As the size increases, ESLIF JSON loses ground to all the others.
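
The "loses ground" observation can be checked with a little arithmetic on the two tables above. This is only a sketch: the figures are copied from the benchmark output in this thread, and the function name is made up for illustration.

```python
# Figures copied from the two benchmark tables above (ESLIF JSON vs JSON::PP).
small_rate = {"JSON::PP": 3.96, "ESLIF JSON": 5.71}   # iterations per second, test.json
large_secs = {"JSON::PP": 1.55, "ESLIF JSON": 1.61}   # seconds per iteration, iso_639-3.json

def eslif_time_relative_to_pp(small_rate, large_secs):
    """Return (small, large): ESLIF time per parse divided by JSON::PP time per parse."""
    # A rate is the inverse of a time, so the time ratio is the inverse of the rate ratio.
    small = small_rate["JSON::PP"] / small_rate["ESLIF JSON"]
    large = large_secs["ESLIF JSON"] / large_secs["JSON::PP"]
    return small, large

small, large = eslif_time_relative_to_pp(small_rate, large_secs)
print(f"small file: ESLIF takes {small:.2f}x the JSON::PP time")
print(f"large file: ESLIF takes {large:.2f}x the JSON::PP time")
```

If both parsers scaled the same way, the two ratios would be roughly equal; here the ratio grows from about 0.69 to about 1.04 as the input grows.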

jddurand commented 3 years ago

You're right. My previous commits were not addressing the JSON parser specifically, but a general inner method that decides what to free or not. That is why the fix was general, not specific to JSON. The next round of optimization will be the JSON parser itself ;)

jddurand commented 3 years ago

It is progressing. My latest measurements (just to say that it is not really a question of input size; it has to do with container sizes, i.e. the sizes of JSON arrays and objects):

Using /home/jdurand/git/Marpa--R2/blog/json/test.json
Using Marpa::R2 4
            (warning: too few iterations for a reliable count)
             Rate  SLIF JSON   JSON::PP ESLIF JSON   JSON::XS
SLIF JSON  1.72/s         --       -60%       -71%      -100%
JSON::PP   4.31/s       150%         --       -29%       -99%
ESLIF JSON 6.03/s       250%        40%         --       -99%
JSON::XS    540/s     31210%     12424%      8846%         --

Using /usr/share/iso-codes/json/iso_639-3.json
Using Marpa::R2 4
            (warning: too few iterations for a reliable count)
            (warning: too few iterations for a reliable count)
            (warning: too few iterations for a reliable count)
             s/iter  SLIF JSON   JSON::PP ESLIF JSON   JSON::XS
SLIF JSON      4.18         --       -64%       -64%      -100%
JSON::PP       1.52       175%         --        -0%       -99%
ESLIF JSON     1.52       175%         0%         --       -99%
JSON::XS   2.04e-02     20406%      7357%      7357%         --

jddurand commented 2 years ago

FYI here is a proof of concept in perl of another algorithm to parse JSON in ESLIF, bypassing valuation, and filling result on-the-fly using events: https://gist.github.com/jddurand/21d9e9877786c62514bb0abf6ff05561
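
As a rough illustration of "filling the result on-the-fly using events" (this is not the code from the gist), the idea can be sketched in Python with a stack of open containers. The event names and the `build_from_events` function are hypothetical; a real ESLIF recognizer would supply its own events.

```python
def build_from_events(events):
    """Build a Python value from a flat stream of (kind, payload) parse events."""
    stack = []      # containers that are currently open
    keys = []       # pending object key for each open container (None for arrays)
    result = None

    def attach(value):
        # Put a finished value where it belongs: top level, array, or object slot.
        nonlocal result
        if not stack:
            result = value
        elif isinstance(stack[-1], list):
            stack[-1].append(value)
        else:
            stack[-1][keys[-1]] = value
            keys[-1] = None

    for kind, payload in events:
        if kind == "start_array":
            stack.append([])
            keys.append(None)
        elif kind == "start_object":
            stack.append({})
            keys.append(None)
        elif kind == "key":
            keys[-1] = payload
        elif kind == "value":
            attach(payload)
        elif kind in ("end_array", "end_object"):
            finished = stack.pop()
            keys.pop()
            attach(finished)
        else:
            raise ValueError(f"unknown event kind: {kind!r}")
    return result

# Events that a recognizer might emit for {"a": [1, 2], "b": null}
events = [
    ("start_object", None), ("key", "a"), ("start_array", None),
    ("value", 1), ("value", 2), ("end_array", None),
    ("key", "b"), ("value", None), ("end_object", None),
]
print(build_from_events(events))
```

No parse tree is built and no valuation pass runs afterwards; the value is complete when the last event has been consumed.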

jeffreykegler commented 2 years ago

I read through this. It looks like something I was thinking of attempting and I'm glad you did it. Is there a timing for it?

Also, doesn't the ESLIF allow actions to be written in Lua? Could a version of this be done with Lua actions?

jddurand commented 2 years ago

I'll do a POC with Lua after I have finished the C version of the revisited JSON implementation. The latter is done and is currently in the test phase.

jeffreykegler commented 2 years ago

Great! Thanks for all the hard work. I very much look forward to this.

jddurand commented 2 years ago

First tests are disappointing: it seems that using events to trigger logic brings no benefit compared with standard valuation. I'm continuing the investigation. I have another idea on the main branch. To be continued.

jddurand commented 2 years ago

Without an algorithm change, I am putting ESLIF even higher, by putting terminals in the interface. I still have to understand where I lose time in the algorithm and why it is not linear (though the situation is better, a side effect of the latest commit). So this is not finished.

Using /home/jdurand/git/Marpa--R2/blog/json/test.json
Using Marpa::R2 4
            (warning: too few iterations for a reliable count)
             Rate  SLIF JSON   JSON::PP ESLIF JSON   JSON::XS
SLIF JSON  1.72/s         --       -59%       -77%      -100%
JSON::PP   4.24/s       146%         --       -43%       -99%
ESLIF JSON 7.48/s       334%        76%         --       -99%
JSON::XS    545/s     31489%     12754%      7185%         --

Using /usr/share/iso-codes/json/iso_639-3.json
Using Marpa::R2 4
            (warning: too few iterations for a reliable count)
            (warning: too few iterations for a reliable count)
            (warning: too few iterations for a reliable count)
             s/iter  SLIF JSON   JSON::PP ESLIF JSON   JSON::XS
SLIF JSON      4.20         --       -64%       -70%      -100%
JSON::PP       1.52       176%         --       -18%       -99%
ESLIF JSON     1.24       239%        23%         --       -98%
JSON::XS   2.02e-02     20714%      7433%      6045%         --

jddurand commented 2 years ago

With regular expression callbacks instead of grammar events, unsurprisingly, it is better again; the gap between small and large files seems to have decreased:

Using /home/jdurand/git/Marpa--R2/blog/json/test.json
Using Marpa::R2 4
            (warning: too few iterations for a reliable count)
             Rate  SLIF JSON   JSON::PP ESLIF JSON   JSON::XS
SLIF JSON  1.69/s         --       -60%       -87%      -100%
JSON::PP   4.27/s       152%         --       -66%       -99%
ESLIF JSON 12.6/s       645%       195%         --       -98%
JSON::XS    555/s     32618%     12876%      4294%         --

Using /usr/share/iso-codes/json/iso_639-3.json
Using Marpa::R2 4
            (warning: too few iterations for a reliable count)
            (warning: too few iterations for a reliable count)
            (warning: too few iterations for a reliable count)
             s/iter  SLIF JSON   JSON::PP ESLIF JSON   JSON::XS
SLIF JSON      4.17         --       -63%       -85%      -100%
JSON::PP       1.56       167%         --       -59%       -99%
ESLIF JSON    0.635       557%       146%         --       -97%
JSON::XS   1.96e-02     21136%      7844%      3134%         --

jddurand commented 2 years ago

@jeffreykegler you might want to try MarpaX-ESLIF-6.0.1-TRIAL. Almost good; at least performance has improved a great deal. I broke backward compatibility, and will fix that in the next official release ;)

jeffreykegler commented 2 years ago

I'd like to look at the change to regular expressions in the code. Where can I find it?

jddurand commented 2 years ago

Note that I can also do full external lexing... In other words, this is how JSON::XS works ;) I'd use marpa to validate the token found (I guess that asking for expected terminals is not that useful when a grammar is unambiguous)
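
A minimal Python sketch of that division of labour follows: a hand-written lexer finds the tokens, and a recognizer only says whether each token is acceptable at that point. `ToyRecognizer` is a hypothetical stand-in written for this example; it is not Marpa or the ESLIF API, and the token names are assumptions.

```python
import re

TOKEN_RE = re.compile(r"""
    (?P<WS>[ \t\r\n]+)
  | (?P<STRING>"(?:[^"\\]|\\.)*")
  | (?P<NUMBER>-?(?:0|[1-9][0-9]*)(?:\.[0-9]+)?(?:[eE][+-]?[0-9]+)?)
  | (?P<LITERAL>true|false|null)
  | (?P<PUNCT>[\[\]{},:])
""", re.VERBOSE)

def lex(text):
    """External lexer: yield token names; punctuation is its own name."""
    pos = 0
    while pos < len(text):
        m = TOKEN_RE.match(text, pos)
        if m is None:
            raise ValueError(f"unexpected character at offset {pos}")
        pos = m.end()
        if m.lastgroup != "WS":
            yield m.group() if m.lastgroup == "PUNCT" else m.lastgroup

VALUE_START = {"STRING", "NUMBER", "LITERAL", "[", "{"}
EXPECTED = {
    "value": VALUE_START,
    "array_first": VALUE_START | {"]"},
    "array_next": {",", "]"},
    "object_first": {"STRING", "}"},
    "object_key": {"STRING"},
    "object_colon": {":"},
    "object_next": {",", "}"},
    "done": set(),
}

class ToyRecognizer:
    """Validates a JSON token sequence; builds no value at all."""

    def __init__(self):
        self.mode = "value"
        self.open = []          # stack of "[" and "{" still waiting to be closed

    def expected(self):
        return EXPECTED[self.mode]

    def read(self, token):
        if token not in self.expected():
            raise ValueError(f"got {token!r}, expected one of {sorted(self.expected())}")
        if token in ("[", "{"):
            self.open.append(token)
            self.mode = "array_first" if token == "[" else "object_first"
        elif token in ("]", "}"):
            self.open.pop()
            self._value_done()
        elif token == ",":
            self.mode = "value" if self.open[-1] == "[" else "object_key"
        elif token == ":":
            self.mode = "value"
        elif self.mode in ("object_first", "object_key"):
            self.mode = "object_colon"      # the STRING just read was a key
        else:
            self._value_done()              # a scalar value

    def _value_done(self):
        if not self.open:
            self.mode = "done"
        else:
            self.mode = "array_next" if self.open[-1] == "[" else "object_next"

def validate(text):
    recognizer = ToyRecognizer()
    for token in lex(text):
        recognizer.read(token)
    return recognizer.mode == "done"

print(validate('{"a": [1, 2, {"b": null}], "c": "x"}'))
```

The lexer never asks the recognizer what it expects; it simply proposes the token it found, which matches the remark that expected-terminal queries add little for an unambiguous grammar.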

jddurand commented 2 years ago

This is at https://github.com/jddurand/c-marpaESLIF/blob/master/src/json_decode_strict_grammar.c . It is quite like "(?{ code })" perl feature except that it is provided by PCRE2 callouts and is not experimental. A very powerful thingy.

jddurand commented 2 years ago

Let me do a ruby slipper technique - I am quite confident this cannot be anything but helpful and/or interesting

jeffreykegler commented 2 years ago

JSON is in a sense a bad choice for doing timings, because it is a simple grammar designed specifically to be parsed by weaker but faster parsers.

But it is well-defined and well-known and we have two excellent parsers we can use for comparison metrics: JSON::XS and JSON::PP.

We won't match JSON::XS, and its inclusion in benchmarks from my POV is just as a measuring stick. Anyone who wants to 1) limit their grammar to LL(1); and 2) incur the development and maintenance cost of hand-written C code; is going to beat Marpa in terms of speed. That's fine, because that's a very restricted subset of the "parsing market".

More interesting are comparisons with JSON::PP. With JSON::PP you still have 1) -- you have to limit the grammar. But JSON::PP is in a scripting language, and the development and maintenance cost is much more widely acceptable.

However, ESLIF-based solutions have even lower development and maintenance costs than hand-written recursive descent written in Perl, and the ESLIF is far more powerful. If we can approach JSON::PP in speed, this is very good for us. If we can actually beat JSON::PP, I think much of the "parsing market" will find this extremely impressive indeed.

I see a good benchmarking as including SLIF, JSON::PP, JSON::XS, and the ESLIF built-in parser as comparison metrics. (IIRC the built-in ESLIF JSON parser uses techniques not available to ordinary users of ESLIF. Is that right?) Against these metrics I think it might be interesting to set a Lua-driven ESLIF parser, a Perl-driven ESLIF parser, and whatever other variants you think pose good combinations of fast and easy-to-write.

jddurand commented 2 years ago

"IIRC the built-in ESLIF JSON parser uses techniques not available to ordinary users of ESLIF. Is that right?"

Everything is available in Perl userspace. The only drawback of my latest technique is that I propagate the full "subject" pattern to Perl (and other bound languages), which incurs a big cost that does not exist in C. I will have to think about that (I am thinking of an opaque pointer on which the "" (perl), toString (java), __tostring (lua) methods are overridden).

jddurand commented 2 years ago

Back to your previous comment I agree. At most I'd probably use (E)SLIF to have convenient :discard and marpa to validate that a token is valid. I am not far from doing a pure C version in fact, then it will not be marpa oriented though ;) I'm still interested to see what happens with the next implementation that will be:

This will still be marpa oriented and very probably much more efficient again, because of the first two steps, which do something that both Marpa::R[1-3] and ESLIF cannot (or currently do not) do: branch prediction based on the current character.
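
For JSON, prediction based on the current character could look roughly like the sketch below. It is only an illustration of the idea: the function and the terminal names are hypothetical and are not taken from ESLIF.

```python
def predict_terminals(ch):
    """Map the next input character to the JSON terminals that can start with it."""
    if len(ch) != 1:
        raise ValueError("expected exactly one character")
    if ch == '"':
        return {"STRING"}
    if ch == "-" or ch in "0123456789":
        return {"NUMBER"}
    if ch in "tfn":
        return {{"t": "TRUE", "f": "FALSE", "n": "NULL"}[ch]}
    if ch in "[]{},:":
        return {ch}
    if ch in " \t\r\n":
        return {"WHITESPACE"}
    return set()    # no JSON terminal starts with this character

for ch in ('"', "7", "t", "{", "x"):
    print(repr(ch), sorted(predict_terminals(ch)))
```

With such a table, only the predicted terminal needs to be matched at each position, instead of trying every terminal the grammar currently expects.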

And at the end, for the fun, maybe, a pure C version... who knows. The fact that ESLIF has a general model for exporting data to any language makes that interesting.

jddurand commented 2 years ago

The implementation with marpa used to only validate the tokens has very promising benchmarks... stay tuned.

jddurand commented 2 years ago

Using /home/jdurand/git/Marpa--R2/blog/json/test.json
Using Marpa::R2 4
            (warning: too few iterations for a reliable count)
             Rate  SLIF JSON   JSON::PP ESLIF JSON   JSON::XS
SLIF JSON  1.72/s         --       -60%       -89%      -100%
JSON::PP   4.27/s       148%         --       -73%       -99%
ESLIF JSON 15.8/s       819%       271%         --       -97%
JSON::XS    565/s     32659%     13117%      3465%         --

Using /usr/share/iso-codes/json/iso_639-3.json
Using Marpa::R2 4
            (warning: too few iterations for a reliable count)
            (warning: too few iterations for a reliable count)
            (warning: too few iterations for a reliable count)
             s/iter  SLIF JSON   JSON::PP ESLIF JSON   JSON::XS
SLIF JSON      4.19         --       -63%       -88%      -100%
JSON::PP       1.53       174%         --       -67%       -99%
ESLIF JSON    0.510       722%       200%         --       -96%
JSON::XS   1.91e-02     21848%      7914%      2571%         --

The numbers are for marpa used only to validate the input (lexeme read) with a recognizer, nothing else. For the record, here are some profiles of decoding /usr/share/iso-codes/json/iso_639-3.json. Be aware that the whole ESLIF.so library contains the marpa objects, PCRE2 objects, iconv objects, marpaWrapper objects, and ESLIF itself.

We see that:

[Images: "ESLIF so global", "marpa w", "marpaESLIF c"]

jeffreykegler commented 2 years ago

Very nice looking. Could you give me a link to the ESLIF script?

The Marpa analysis all looks reasonable. bv_scan() (Bit Vector SCAN) is the workhorse routine within Libmarpa. For space reasons, use of bit vectors is essential and much of the work is done using them. The other routines create postdot items, Earley items, and the earlemes that contain them. So no obvious waste.

jddurand commented 2 years ago

First let me release that on CPAN ;) The script is yours. Here is how I did that:

valgrind --tool=callgrind --dump-instr=yes --simulate-cache=yes --collect-jumps=yes --collect-atstart=no --toggle-collect=marpaESLIFJSON_decodeb perl -I blib/lib -I blib/arch -I ~/git/Marpa--R2/blog/json ~/git/Marpa--R2/blog/json/bench.pl /usr/share/iso-codes/json/iso_639-3.json

where I commented all tests but ESLIF.

jeffreykegler commented 2 years ago

By "the script" I meant the ESLIF script doing the parsing. We've gone thru so many changes so rapidly I lose track of exactly what the "JSON ESLIF" row means in each benchmark, that is, whether we are using Ruby slipper, regexes, events, etc etc etc. The actual script lets me be 100% clear on that part of what's going on.

jeffreykegler commented 2 years ago

Or is https://github.com/jddurand/c-marpaESLIF/blob/master/src/json_decode_strict_grammar.c still the code being run?

jddurand commented 2 years ago

Ah ok... the latest benchmarks are a combination of

C.f. https://github.com/jddurand/c-marpaESLIF/blob/master/src/json.c#L412

The JSON strict and grammar differ only on :discard and "," separator setting:

(You may notice that I invented the notion of aliasing a terminal to a name, to be able to inject terminals on demand as if they were lexemes)

jeffreykegler commented 2 years ago

The "built-in" C-language-enhanced benchmark is of great interest but (as my last on the IRC channel hints) I am especially interested in the comparison of pure-ESLIF (no use of the built-in or of custom C language) vs JSON::PP. For someone looking to do a quick DSL, that is the most pertinent benchmark.

When you start introducing custom written C, the results are still interesting, but in terms for practical choices for a parser, the comparison becomes much more complex.

jddurand commented 2 years ago

Ok... so I'll do a pure Perl or Lua thingy. Everything is externalized and bound to these languages.

jddurand commented 2 years ago

FYI I am releasing MarpaX-ESLIF-6.0.5-TRIAL, with which I should be able to do the pure Perl version (in two variants: Perl callbacks and Lua callbacks).

jddurand commented 2 years ago

After optimizing every method a lot to reduce the noise, I have identified, or rather confirmed, the culprit in ESLIF. The issue is probably a mix of algorithm and technique: [image]

jeffreykegler commented 2 years ago

Nice progress!

jddurand commented 2 years ago

I guess this is almost fixed with the coming 6.0.6 version. I also put a lot of effort into optimizing bound-language callbacks during the recognizer phase, to avoid unnecessary internal copies of data. Perl and Lua bindings have changed significantly (I did not touch Java bindings).

In theory I can now mimic the optimized json done in C with Perl and Lua recognizer callbacks :)

Using /home/jdurand/git/Marpa--R2/blog/json/test.json
Using Marpa::R2 4
            (warning: too few iterations for a reliable count)
             Rate  SLIF JSON   JSON::PP ESLIF JSON   JSON::XS
SLIF JSON  1.69/s         --       -58%       -90%      -100%
JSON::PP   4.00/s       136%         --       -76%       -99%
ESLIF JSON 16.3/s       864%       309%         --       -97%
JSON::XS    555/s     32618%     13764%      3293%         --

Using /usr/share/iso-codes/json/iso_639-3.json
Using Marpa::R2 4
            (warning: too few iterations for a reliable count)
            (warning: too few iterations for a reliable count)
            (warning: too few iterations for a reliable count)
             s/iter  SLIF JSON   JSON::PP ESLIF JSON   JSON::XS
SLIF JSON      4.19         --       -63%       -88%      -100%
JSON::PP       1.56       169%         --       -68%       -99%
ESLIF JSON    0.493       749%       216%         --       -96%
JSON::XS   1.95e-02     21427%      7915%      2435%         --

jeffreykegler commented 2 years ago

Really, really close! It might be useful to add a 3rd test. That way we could see whether we have a difference in start-up cost (which can be OK) or a non-linearity in the asymptote, which should be able to be eliminated.
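
The reason a third input size helps can be sketched as follows. With only two sizes, a fixed start-up cost and a non-linear asymptote are indistinguishable; with three, the time per extra byte can be compared across two intervals. The measurements below are hypothetical numbers chosen for illustration, not benchmark results, and the function names are made up.

```python
def per_unit_slopes(measurements):
    """measurements: (input_size, seconds) pairs. Return seconds per extra byte
    between each pair of consecutive sizes."""
    points = sorted(measurements)
    return [(t2 - t1) / (n2 - n1)
            for (n1, t1), (n2, t2) in zip(points, points[1:])]

def looks_linear(measurements, tolerance=0.25):
    """True if the slope stays within `tolerance` of the smallest slope.
    A constant start-up cost shifts every time equally, so it cancels out here."""
    slopes = per_unit_slopes(measurements)
    return max(slopes) <= min(slopes) * (1 + tolerance)

# Hypothetical data: time = 0.5 s start-up + 1e-5 s per byte (linear, slow start).
startup_only = [(1_000, 0.51), (10_000, 0.60), (100_000, 1.50)]
# Hypothetical data: time grows with the square of the size (non-linear asymptote).
quadratic = [(1_000, 0.01), (10_000, 1.0), (100_000, 100.0)]

print(looks_linear(startup_only))
print(looks_linear(quadratic))
```

The tolerance is arbitrary; real timings are noisy, so in practice each point would be an average over several runs.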

jddurand commented 2 years ago

I have released version 6.0.10 and am going to send a pull request to your repo with two other versions of ESLIF's JSON:

Benchmarks are then:

Using /home/jdurand/git/Marpa--R2/blog/json/test.json
Using Marpa::R2 4
                           Rate SLIF JSON ESLIF JSON PP JSON::PP ESLIF JSON PP (minimal) ESLIF JSON JSON::XS
SLIF JSON                 177/s        --          -29%     -59%                    -74%       -90%    -100%
ESLIF JSON PP             249/s       41%            --     -43%                    -63%       -86%     -99%
JSON::PP                  436/s      146%           75%       --                    -35%       -75%     -99%
ESLIF JSON PP (minimal)   672/s      279%          170%      54%                      --       -62%     -99%
ESLIF JSON               1777/s      903%          613%     307%                    164%         --     -96%
JSON::XS                47287/s    26594%        18883%   10737%                   6936%      2561%       --

Using /usr/share/iso-codes/json/iso_639-3.json
Using Marpa::R2 4
            (warning: too few iterations for a reliable count)
            (warning: too few iterations for a reliable count)
            (warning: too few iterations for a reliable count)
            (warning: too few iterations for a reliable count)
            (warning: too few iterations for a reliable count)
                          s/iter ESLIF JSON PP SLIF JSON ESLIF JSON PP (minimal) JSON::PP ESLIF JSON JSON::XS
ESLIF JSON PP               5.21            --      -19%                    -65%     -69%       -91%     -99%
SLIF JSON                   4.21           24%        --                    -57%     -62%       -88%     -99%
ESLIF JSON PP (minimal)     1.80          189%      134%                      --     -11%       -73%     -98%
JSON::PP                    1.60          226%      163%                     12%       --       -69%     -98%
ESLIF JSON                 0.493          956%      753%                    265%     224%         --     -94%
JSON::XS                2.78e-02        18675%    15071%                   6386%    5666%      1678%       --

(yes, ESLIF is still not strictly linear, though as you said the situation is much, much better).

Many thanks, you forced me to improve very much ESLIF performance, and it now reaches quite good timings.

jeffreykegler commented 2 years ago

Nice work! I'm looking forward to the Lua timings. Hopefully Lua callbacks have less overhead.

jddurand commented 2 years ago

I have released version 6.0.11 and am going to send a pull request to your repo with another version of ESLIF's JSON that uses Lua callbacks.

I also fixed your test suite: the ESLIF multiton was recreated every time.
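
The effect of recreating a multiton inside a benchmark loop can be shown with a small Python sketch. `Engine` is a hypothetical stand-in for an object that is expensive to construct; it does not reproduce the MarpaX::ESLIF API.

```python
class Engine:
    """Hypothetical stand-in for an object that is expensive to construct."""
    _instances = {}      # one shared instance per configuration key
    constructions = 0    # counts how often the expensive setup actually ran

    def __init__(self, config):
        self.config = config
        Engine.constructions += 1

    @classmethod
    def get(cls, config="default"):
        """Multiton accessor: build on the first request for a key, then reuse."""
        if config not in cls._instances:
            cls._instances[config] = cls(config)
        return cls._instances[config]

    def parse(self, text):
        return len(text)   # placeholder for real parsing work

def bench_recreating(texts):
    # Mistake: the setup cost is paid, and timed, on every iteration.
    return [Engine("default").parse(t) for t in texts]

def bench_reusing(texts):
    # Setup happens at most once; the loop measures parsing only.
    engine = Engine.get("default")
    return [engine.parse(t) for t in texts]

texts = ["[]"] * 1000
bench_recreating(texts)
recreated = Engine.constructions
Engine.constructions = 0
bench_reusing(texts)
reused = Engine.constructions
print(recreated, reused)   # prints "1000 1"
```

On small inputs the construction cost can dominate the measured time, which distorts exactly the small-file versus large-file comparison this thread is about.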

Benchmarks are surprisingly not good with Lua... So I dug into profiling, and although the workflow between ESLIF and Lua is good IMHO, the problem is that the Lua bindings themselves are not that good. I am using Lua globals too much and temporary tables too many times. I will work on improving that...

Benchmarks:

perl bench.pl test.json
Using test.json
Using Marpa::R2 4
Using MarpaX::ESLIF 6.0.11
                            Rate ESLIF JSON Lua SLIF JSON ESLIF JSON PP JSON::PP ESLIF JSON Lua (minimal) ESLIF JSON PP (minimal) ESLIF JSON JSON::XS
ESLIF JSON Lua             181/s             --        0%          -29%     -63%                     -68%                    -73%       -90%    -100%
SLIF JSON                  181/s             0%        --          -29%     -63%                     -68%                    -73%       -90%    -100%
ESLIF JSON PP              256/s            42%       42%            --     -47%                     -55%                    -62%       -86%    -100%
JSON::PP                   482/s           167%      167%           88%       --                     -15%                    -29%       -74%     -99%
ESLIF JSON Lua (minimal)   565/s           213%      213%          121%      17%                       --                    -17%       -69%     -99%
ESLIF JSON PP (minimal)    678/s           275%      275%          165%      41%                      20%                      --       -63%     -99%
ESLIF JSON                1829/s           912%      912%          615%     279%                     224%                    170%         --     -97%
JSON::XS                 55138/s         30408%    30408%        21442%   11334%                    9662%                   8030%      2914%       --

perl bench.pl /usr/share/iso-codes/json/iso_639-3.json
Using /usr/share/iso-codes/json/iso_639-3.json
Using Marpa::R2 4
Using MarpaX::ESLIF 6.0.11
            (warning: too few iterations for a reliable count)
            (warning: too few iterations for a reliable count)
            (warning: too few iterations for a reliable count)
            (warning: too few iterations for a reliable count)
            (warning: too few iterations for a reliable count)
            (warning: too few iterations for a reliable count)
            (warning: too few iterations for a reliable count)
                           s/iter ESLIF JSON Lua ESLIF JSON PP SLIF JSON ESLIF JSON Lua (minimal) ESLIF JSON PP (minimal) JSON::PP ESLIF JSON JSON::XS
ESLIF JSON Lua               8.46             --          -38%      -51%                     -75%                    -80%     -80%       -94%    -100%
ESLIF JSON PP                5.25            61%            --      -22%                     -59%                    -67%     -68%       -91%     -99%
SLIF JSON                    4.12           105%           27%        --                     -48%                    -58%     -59%       -88%     -99%
ESLIF JSON Lua (minimal)     2.15           293%          144%       92%                       --                    -20%     -22%       -77%     -99%
ESLIF JSON PP (minimal)      1.72           392%          205%      140%                      25%                      --      -3%       -72%     -98%
JSON::PP                     1.67           407%          214%      147%                      29%                      3%       --       -71%     -98%
ESLIF JSON                  0.487          1638%          979%      747%                     342%                    253%     243%         --     -94%
JSON::XS                 2.68e-02         31433%        19468%    15256%                    7914%                   6311%    6125%      1714%       --

jeffreykegler commented 2 years ago

Nice progress!

Re Lua bindings, I made my own attempt at these in Kollos: jeffreykegler/kollos . Maybe there are some useful ideas there.

jddurand commented 2 years ago

I have released MarpaX-ESLIF 6.0.13 where Lua bindings improved by:

Still, the Lua bindings are not as performant as I would like, and I do not know yet how to improve that. I am thinking of LuaJIT, possibly.

Benchmarks are now:

perl -I blib/lib -I blib/arch -I ~/git/Marpa--R2/blog/json ~/git/Marpa--R2/blog/json/bench.pl ~/git/Marpa--R2/blog/json/test.json
Using /home/jdurand/git/Marpa--R2/blog/json/test.json
Using Marpa::R2 4
Using MarpaX::ESLIF 6.0.13
                            Rate SLIF JSON ESLIF JSON Lua ESLIF JSON PP JSON::PP ESLIF JSON Lua (minimal) ESLIF JSON PP (minimal) ESLIF JSON JSON::XS
SLIF JSON                  177/s        --           -14%          -33%     -64%                     -71%                    -74%       -90%    -100%
ESLIF JSON Lua             206/s       16%             --          -23%     -58%                     -67%                    -70%       -89%    -100%
ESLIF JSON PP              266/s       50%            29%            --     -45%                     -57%                    -62%       -85%    -100%
JSON::PP                   487/s      175%           137%           83%       --                     -22%                    -30%       -73%     -99%
ESLIF JSON Lua (minimal)   621/s      251%           202%          134%      28%                       --                    -10%       -65%     -99%
ESLIF JSON PP (minimal)    691/s      290%           236%          160%      42%                      11%                      --       -62%     -99%
ESLIF JSON                1794/s      913%           773%          575%     269%                     189%                    160%         --     -97%
JSON::XS                 54612/s    30730%         26475%        20453%   11119%                    8690%                   7806%      2944%       --

perl -I blib/lib -I blib/arch -I ~/git/Marpa--R2/blog/json ~/git/Marpa--R2/blog/json/bench.pl /usr/share/iso-codes/json/iso_639-3.json
Using /usr/share/iso-codes/json/iso_639-3.json
Using Marpa::R2 4
Using MarpaX::ESLIF 6.0.13
            (warning: too few iterations for a reliable count)
            (warning: too few iterations for a reliable count)
            (warning: too few iterations for a reliable count)
            (warning: too few iterations for a reliable count)
            (warning: too few iterations for a reliable count)
            (warning: too few iterations for a reliable count)
            (warning: too few iterations for a reliable count)
                           s/iter ESLIF JSON Lua ESLIF JSON PP SLIF JSON ESLIF JSON Lua (minimal) JSON::PP ESLIF JSON PP (minimal) ESLIF JSON JSON::XS
ESLIF JSON Lua               7.31             --          -29%      -43%                     -74%     -76%                    -77%       -93%    -100%
ESLIF JSON PP                5.17            41%            --      -19%                     -63%     -66%                    -67%       -91%     -99%
SLIF JSON                    4.17            75%           24%        --                     -54%     -58%                    -59%       -88%     -99%
ESLIF JSON Lua (minimal)     1.92           281%          169%      117%                       --      -9%                    -11%       -74%     -99%
JSON::PP                     1.75           318%          195%      138%                      10%       --                     -3%       -72%     -99%
ESLIF JSON PP (minimal)      1.70           330%          204%      145%                      13%       3%                      --       -71%     -98%
ESLIF JSON                  0.490          1392%          955%      751%                     292%     257%                    247%         --     -95%
JSON::XS                 2.61e-02         27910%        19710%    15879%                    7257%    6606%                   6414%      1778%       --

jddurand commented 2 years ago

Side remark: none of the ESLIF JSON PP or Lua implementations uses the same technique as the built-in ESLIF JSON. They use grammar parsing and valuation, whereas pure ESLIF JSON uses a Ruby Slippers technique in C; the latter can be written in Perl and Lua as well. I might give that a try ;)

jeffreykegler commented 2 years ago

Wonderful work! Am I correct in saying that ESLIF JSON and JSON::XS are the two which require custom C code, and that none of the others do? I treat this as an important distinction, because for someone implementing a custom DSL, writing special C code is too much to ask, and I am thinking of that kind of user.

By "minimal", I take it is meant that native ESLIF is used wherever feasible. If that understanding is correct, I think the minimal are the more interesting benchmarks for the DSL writers, because IMHO it is reasonable to ask them to research the ESLIF enough to know when they can use native ESLIF semantics. This especially because they can do it gradually, converting to native ESLIF as they find performance annoying and discover the ESLIF facilities.

If I read these right, this means a DSL author using native ESLIF and Perl semantics can beat JSON::PP. Is that right? If so, that's a real triumph, and a fitting reward for your tireless efforts.

jeffreykegler commented 2 years ago

Re the Lua, yes, disappointing relative to the Perl. Thanks for following up on this.

If you pursue this, instead of LuaJIT, I would suggest Pallene. LuaJIT is problematic in several respects. It is not supported by Roberto's group, requires specific hardware, and has many, often subtle, incompatibilities with Lua. For DSL-writers, the hassle introduced by one silent, subtle bug introduced by an incompatibility is likely to outweigh even major speed advantages.

LuaJIT is also, de facto, for a Lua subset. That is, there are some Lua features it cannot deal with, and if LuaJIT sees them, it does not attempt to optimize. The LuaJIT subset is not well-documented, and one of the things a LuaJIT programmer does is learn to recognize the Lua language subset they need to use. Pallene, on the other hand, is a documented Lua subset, which is guaranteed to optimize.

Anyway, if the ESLIF JSON PP (minimal) results are what I think they are, it more than makes up for the Lua disappointment.

jddurand commented 2 years ago

Many thanks for your comments. All that you said is correct. I will look into Pallene, did not know about it! The only thing I want when interfacing to a language is that its API is thread-safe.

jeffreykegler commented 2 years ago

Perhaps the best start for Pallene is the original paper.

jeffreykegler commented 2 years ago

Some thoughts:

1.) Lua was a hope for a quick gain in performance. Obviously that didn't happen. Perhaps it should go to a back burner. For the typical DSL author, it would be better, in fact, if PP were the best option. It means all they have to learn is ESLIF and Perl.

2.) The non-linearity is still there, despite big improvements in performance. One approach to tackling it might be to measure memory use, to see if it is also non-linear. That might help pinpoint the problem in speed and, in any case, a memory use non-linearity itself would be a problem.

3.) In difficult cases like this, I have resorted to logging time-stamps. An after-the-fact examination can then find the CPU consumed between any two points, and the log can be analysed to pinpoint the location of all non-linearities. Caution has to be exercised to ensure the resources consumed in logging are not included in the analysis.
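As an illustration of point 3, here is a minimal sketch of time-stamp logging in C. It is only a sketch under stated assumptions: the names `log_event`, `cpu_between`, `dump_log` and `busy_work` are hypothetical and are not part of ESLIF or libmarpa, and `busy_work` merely stands in for a parse phase. Checkpoints are stored in memory with the standard `clock()` call, and nothing is printed until after the run, so that the I/O cost of reporting stays out of the measured intervals.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <time.h>

#define MAX_EVENTS 1024

/* One checkpoint: a label and the process CPU time when it was taken. */
typedef struct {
    const char *label; /* static string naming the checkpoint */
    clock_t     cpu;   /* value of clock() at the checkpoint */
} event_t;

static event_t events[MAX_EVENTS];
static size_t  n_events = 0;

/* Record a checkpoint. Only an in-memory store, no I/O, so the logging
 * itself adds very little to the intervals being measured.
 * Checkpoints beyond MAX_EVENTS are silently dropped. */
static void log_event(const char *label)
{
    if (n_events < MAX_EVENTS) {
        events[n_events].label = label;
        events[n_events].cpu   = clock();
        n_events++;
    }
}

/* CPU seconds consumed between two logged checkpoints. */
static double cpu_between(size_t from, size_t to)
{
    assert(from <= to && to < n_events);
    return (double)(events[to].cpu - events[from].cpu) / CLOCKS_PER_SEC;
}

/* After-the-fact examination: print the CPU time of each interval. */
static void dump_log(FILE *out)
{
    for (size_t i = 1; i < n_events; i++) {
        fprintf(out, "%s -> %s: %.6f s CPU\n",
                events[i - 1].label, events[i].label,
                cpu_between(i - 1, i));
    }
}

/* Hypothetical workload standing in for one phase of a parse. */
static unsigned long busy_work(unsigned long n)
{
    volatile unsigned long acc = 0; /* volatile keeps the loop from being optimized away */
    for (unsigned long i = 0; i < n; i++) {
        acc += i;
    }
    return acc;
}
```

A caller would bracket each phase with `log_event("...")`, run the same program on inputs of increasing size, and call `dump_log(stderr)` once at the end. An interval whose CPU time grows faster than the input size is a candidate for the non-linearity.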

Of most interest to the DSL author, I believe, is the PP-minimal number. However, the problem might be easier to spot in the PP-non-minimal version. And, of course, the rest are useful as "pace setters".

Thanks for your work on this!

jddurand commented 2 years ago

Funnily, the non-linearity is in all versions but "ESLIF JSON", which uses another technique (100% Ruby Slippers, tokenization is totally externalized). I believe it has to do with how my scanning is implemented when searching for all alternatives.

jddurand commented 2 years ago

For information, I again compared the instrumentation of a Perl program parsing the small test.json that is in the Marpa--R2 blog vs. parsing /usr/share/iso-codes/json/iso_639-3.json, concentrating on the "self" column (though take care: it is compiled with -O2 -g, so things get inlined). Modulo the possible variations between two measurements, there seems to be a clear shift at marpa_b_new(). I was just wondering if this is expected (?)

image

jddurand commented 2 years ago

Another interesting measurement is the difference in the "last-level miss sum", which seems to indicate that marpa_b_new() accesses memory that cannot be cached: image

jeffreykegler commented 2 years ago

It is plausible that, with a call to marpa_b_new, there may be a lot of cache misses. The recognizer builds the parse left-to-right, so most accesses all along will be at the "right" end. The bocage is built top-down, left-to-right, so that, during that process, there will be a lot of "left end" access, of items which are, in a long parse, not likely to be in cache.