vovkos / jancy

The first and only scripting language with safe pointer arithmetics, high level of ABI and source compatibility with C, spreadsheet-like reactive programming, built-in lexer generator, and more.
http://jancy.org
MIT License

Too much memory and time to execute exprtk tests #8

Closed mingodad closed 1 year ago

mingodad commented 1 year ago

When testing jancy with an adapted test from https://github.com/ArashPartow/exprtk, jancy takes too long (I stopped it) and uses too much memory (more than 10GB when I stopped it).

See the attached test; notice that several tests are commented out because of this kind of error:

equal(0.0e+0,+0.0e+0);
...
jancy "exprtk_functional_test.jnc"
/home/mingo/dev/c/A_programming-languages/exprtk/exprtk_functional_test.jnc(7170,10): unexpected 'floating-point-constant' in 'expression_or_empty_list'
1 error(s); compilation failed

exprtk_functional_test.jnc.zip

vovkos commented 1 year ago

In Jancy, array types for parameters don't automatically convert into pointers; as such, auto-sized array parameters like argv[] aren't allowed. Unfortunately, the compiler didn't catch that and essentially asked LLVM to generate code for a SIZE_MAX-sized array. Fixed in https://github.com/vovkos/jancy/commit/19eed73362f7dcaffa7a0ae2f693659a3123c50d
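
Roughly, the restriction looks like this (an illustrative sketch, not code from the commit; the declarations assume Jancy's C-like syntax and the names are made up):

// void f(char const* argv[]);    // auto-sized array parameter: not allowed, since it
//                                // does not decay into a pointer as it would in C
void f(char const** argv);        // presumably the intended spelling: an explicit pointer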

Another problem was that floating-point literals with an exponent (like 1e10) were not parsed correctly. This one is fixed in https://github.com/vovkos/jancy/commit/e84502634e3d0e0362dc347cf718aee8867beb87

mingodad commented 1 year ago

Thank you! Comparing the output of /usr/bin/time, it seems that jancy has trouble with this sample:

/usr/bin/time jancy exprtk_functional_test.jnc 
7364    7285
21.95user 0.26system 0:22.21elapsed 99%CPU (0avgtext+0avgdata 492128maxresident)k
...
./exprtk_functional_test
7364    7285

mingo@mingo-X550VX:~/dev/c/A_programming-languages/exprtk$ /usr/bin/time ./exprtk_functional_test #C
7364    7285
0.00user 0.00system 0:00.00elapsed ?%CPU (0avgtext+0avgdata 2604maxresident)k
...
/usr/bin/time lua exprtk_functional_test.txt.lua 
7437    7321
0.03user 0.00system 0:00.04elapsed 87%CPU (0avgtext+0avgdata 4412maxresident)k
1992inputs+0outputs (1major+572minor)pagefaults 0swaps
...
/usr/bin/time node exprtk_functional_test.txt.js 
7445 7299
0.09user 0.02system 0:00.12elapsed 97%CPU (0avgtext+0avgdata 58768maxresident)k
1376inputs+0outputs (0major+7453minor)pagefaults 0swaps
...
/usr/bin/time qjs exprtk_functional_test.txt.js 
7445 7299
0.07user 0.00system 0:00.07elapsed 100%CPU (0avgtext+0avgdata 6384maxresident)k
64inputs+0outputs (0major+1720minor)pagefaults 0swaps
...
/usr/bin/time squilu exprtk_functional_test.txt.nut 
7059    6987
0.04user 0.00system 0:00.04elapsed 91%CPU (0avgtext+0avgdata 11764maxresident)k
1368inputs+0outputs (0major+1505minor)pagefaults 0swaps

By trouble I mean memory and performance: the scripting languages (Lua, QuickJS, SquiLu) take less than 100ms and around 10MB of memory, while jancy needs 21s and 490MB of memory. Is this expected?

vovkos commented 1 year ago

Dedicated some time to testing and profiling the issue; redesigned a few things to improve the situation both with compilation time and memory footprint. Also, as a byproduct of this effort, there's now support for LLVM ORC JIT on LLVM versions 12 and above.

TLDR:

Grab the latest Jancy build and run it like this:

./jancy -O2 -J0 exprtk_functional_test.jnc

More details.

First of all, the test itself is a long sequence of math expressions on constants. You can calculate all of those with a single parsing pass -- interpreters, even primitive ones, should perform extremely well on this test, better than compilers, actually. Therefore, comparing the performance of Jancy vs that of pure interpreters is not really fair. Now, when comparing with compilers -- you have to measure the compilation time, not the running time. If an optimizing compiler is capable of constant folding and inlining, then the final result of its handiwork should be a single call to printf with a value calculated at compile time. So compiling a C# or C/C++ program and then measuring the run time is not fair either -- basically, you are measuring the time of loading the program and executing a single printf.
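
To make the constant-folding point concrete (a sketch of my own, not the actual test code): each test line is an expression on constants, so after folding and inlining the whole program can collapse into printing a precomputed tally.

// equal((3.0 + 4.0) * 2.0, 14.0);   // a test line of this general shape...
printf("%d\t%d\n", 7364, 7285);      // ...and this is effectively all that remains
                                     // once every expression is folded at compile time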

Now, a fair comparison with Jancy would be measuring compilation time, like:

clang exprtk_functional_test.cpp
dmd exprtk_functional_test.d
etc.
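
For example (an illustrative command of my own, mirroring the /usr/bin/time measurements earlier in this thread):

/usr/bin/time clang -O2 exprtk_functional_test.cpp -o exprtk_functional_test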

Actually, to get even closer to the real sequence of steps Jancy is doing, you would want to first generate LLVM IR, and then run LLVM JIT on that:

clang -S -emit-llvm exprtk_functional_test.cpp
lli exprtk_functional_test.ll

Interestingly, if you measure this, you will notice that the results are much worse than when compiling a real executable. At first I was like, what?! Well, as it turns out, the reason hides in the specifics of the LLVM JIT code generator (more precisely, the way it configures its codegen optimizer pipeline). I decided to try and see if the newer LLVM ORC JIT is doing any better than the tried-and-true MCJIT (it looks like ORC is under heavy development and the LLVM team is putting a lot of effort into it). Alas, it's more or less the same performance-wise -- with the unfortunate exception of ORC JIT not working on Windows. Nevertheless, Jancy supports ORC JIT now -- you can try it by passing --orc via the command line. However, I also added an option to completely disable the codegen optimizer in JIT, and that's what made it possible to achieve acceptable performance on the test (-O2 -J0 means: run a standard optimizer pipeline on LLVM IR and turn off the codegen optimizer in JIT).
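
For example, the configurations discussed above can be compared side by side (illustrative command lines built from the flags mentioned in this comment):

/usr/bin/time ./jancy -O2 exprtk_functional_test.jnc       # IR optimizer + default JIT codegen optimizer
/usr/bin/time ./jancy -O2 -J0 exprtk_functional_test.jnc   # IR optimizer, JIT codegen optimizer disabled
/usr/bin/time ./jancy --orc exprtk_functional_test.jnc     # ORC JIT instead of MCJIT (LLVM 12+, not on Windows)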

Regarding the memory footprint: I optimized a few things, and while it's still higher than that of other compilers/interpreters, I believe it's acceptable now (~130M). Alas, there's no quick fix for this due to the multi-pass specifics of the Jancy parser/compiler pipeline (first, it tokenizes the function, then it parses the syntax tree and emits LLVM IR -- the pre-tokenized function source is what's eating memory here). It's possible to redesign this part, too, but I don't think this is an important optimization to focus on right now. It only hits you when there's a HUGE function consisting of tens of thousands of tokens, like in this test. If it's the same file size but many small functions, there should be no memory bloat.
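
Schematically, the per-function pipeline described above is (my own pseudocode, not actual compiler source):

// for each function body:
//     tokens = tokenize(body);   // the whole body is tokenized up front
//     ast = parse(tokens);       // then parsed into a syntax tree...
//     emitLlvmIr(ast);           // ...and lowered to LLVM IR
//     release(tokens);           // token storage is freed only after the function is done
// so a single huge function keeps tens of thousands of tokens alive at once, while
// many small functions of the same total size never do.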