Open lygstate opened 3 years ago
With the following CMake configure options: ["-DJERRY_LINE_INFO=ON", "-DJERRY_GLOBAL_HEAP_SIZE=102400", "-DJERRY_GC_MARK_LIMIT=8"] the benchmarks run, but the Splay benchmark cannot complete because it is too slow. JerryScript:
PROGRESS Richards
RESULT Richards 205
PROGRESS DeltaBlue
RESULT DeltaBlue 179
PROGRESS Encrypt
PROGRESS Decrypt
RESULT Crypto 250
PROGRESS RayTrace
RESULT RayTrace 349
PROGRESS Earley
PROGRESS Boyer
RESULT EarleyBoyer 141
PROGRESS RegExp
RESULT RegExp 19.6
PROGRESS NavierStokes
RESULT NavierStokes 544
SCORE 174
QuickJs:
PROGRESS Richards
RESULT Richards 868
PROGRESS DeltaBlue
RESULT DeltaBlue 872
PROGRESS Encrypt
PROGRESS Decrypt
RESULT Crypto 1014
PROGRESS RayTrace
RESULT RayTrace 1144
PROGRESS Earley
PROGRESS Boyer
RESULT EarleyBoyer 1789
PROGRESS RegExp
RESULT RegExp 244
PROGRESS NavierStokes
RESULT NavierStokes 1681
SCORE 940
NodeJS jitless
PROGRESS Richards
RESULT Richards 1139
PROGRESS DeltaBlue
RESULT DeltaBlue 1300
PROGRESS Encrypt
PROGRESS Decrypt
RESULT Crypto 898
PROGRESS RayTrace
RESULT RayTrace 2751
PROGRESS Earley
PROGRESS Boyer
RESULT EarleyBoyer 4831
PROGRESS RegExp
RESULT RegExp 3387
PROGRESS NavierStokes
RESULT NavierStokes 1599
SCORE 1919
NodeJs
PROGRESS Richards
RESULT Richards 38059
PROGRESS DeltaBlue
RESULT DeltaBlue 62548
PROGRESS Encrypt
PROGRESS Decrypt
RESULT Crypto 45647
PROGRESS RayTrace
RESULT RayTrace 51651
PROGRESS Earley
PROGRESS Boyer
RESULT EarleyBoyer 56458
PROGRESS RegExp
RESULT RegExp 8929
PROGRESS NavierStokes
RESULT NavierStokes 51198
SCORE 39303
First of all, I strongly suggest turning JERRY_LINE_INFO off, since decoding the source info during bytecode execution causes a serious slowdown.
My two concerns: 1. I don't see the results for Duktape. 2. This comparison in itself does not tell us anything.
Comparing a low-end engine with a high-end engine (V8) is quite unfair. The engines follow different design goals. In high-end engines the main concern is performance, which comes with significant memory/stack usage and an enormous binary size. JerryScript was designed for low-end devices with restricted resources, so the main priorities are low memory usage and a small binary footprint. However, keeping these numbers low works against performance.
So I suggest comparing only the low-end engines. What I'd call a fair and detailed comparison is:
| | Engine - Score | Engine - Peak Memory Consumption (KB) | Engine - Peak Stack Usage (KB) |
|---|---|---|---|
| Richards | | | |
| DeltaBlue | | | |
| ... | | | |
Where Engine is one of Jerry, Duktape or QuickJS.
It is also good to mention which revision of the standard each tested engine supports, since it is a real challenge for developers to support the new language elements of ES6+ without a serious engine slowdown.
| | Binary size (KB) | Standard compatibility |
|---|---|---|
| Jerry | | |
| Duktape | | |
| QuickJS | | |
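A minimal sketch of how the score column of such a table could be filled from the benchmark logs in this thread. It only assumes the `PROGRESS` / `RESULT <name> <value>` / `SCORE <value>` line format shown above; the helper names `parse_results` and `score_table` are made up for illustration, and memory or stack figures would have to be measured separately.

```python
import re


def parse_results(log: str) -> dict[str, float]:
    """Collect 'RESULT <name> <value>' and 'SCORE <value>' lines from a log."""
    results: dict[str, float] = {}
    for line in log.splitlines():
        m = re.match(r"^(RESULT\s+(\S+)|SCORE)\s+([0-9.]+)$", line.strip())
        if m:
            # group(2) is the benchmark name; it is None for the SCORE line.
            results[m.group(2) or "SCORE"] = float(m.group(3))
    return results


def score_table(logs: dict[str, str]) -> str:
    """Render one row per benchmark and one score column per engine."""
    engines = list(logs)
    parsed = {engine: parse_results(logs[engine]) for engine in engines}
    # Keep benchmarks in first-seen order across all engines.
    names: list[str] = []
    for engine in engines:
        for name in parsed[engine]:
            if name not in names:
                names.append(name)
    rows = [
        "| | " + " | ".join(f"{engine} - Score" for engine in engines) + " |",
        "|" + "---|" * (len(engines) + 1),
    ]
    for name in names:
        cells = [
            format(parsed[engine][name], "g") if name in parsed[engine] else ""
            for engine in engines
        ]
        rows.append(f"| {name} | " + " | ".join(cells) + " |")
    return "\n".join(rows)


if __name__ == "__main__":
    jerry_log = "PROGRESS Richards\nRESULT Richards 205\nSCORE 174"
    quickjs_log = "PROGRESS Richards\nRESULT Richards 868\nSCORE 940"
    print(score_table({"Jerry": jerry_log, "QuickJS": quickjs_log}))
```

Benchmarks that an engine did not finish (such as Splay in the first run) simply produce an empty cell.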
JERRY_LINE_INFO has little effect, and Splay is toooo slow; this should be a bug.
Let's get just a wee bit more professional. "too** slow" is not anything helpful. Same goes for "this should be a bug." Analysis is welcome.
Splay is a test aimed specifically at the GC. It was designed to test the evolutional GC. Since JerryScript uses a single mark-and-sweep model, it won't perform well in this test. Increasing JERRY_GC_MARK_LIMIT can help with this problem, but it will also increase the stack usage.
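To illustrate the trade-off, here is a toy model of a depth-limited marker. It is only a sketch under an assumption: recursion stops after `limit` nested calls, the object reached at that point is marked but left pending, and a later pass resumes from it. It is not JerryScript's actual GC code, and `mark_with_limit` is a made-up name.

```python
def mark_with_limit(children: dict[int, list[int]], roots: list[int], limit: int) -> int:
    """Mark everything reachable from roots; return the number of passes needed.

    Toy model: an object reached at recursion depth `limit` is marked but not
    traversed; it is queued and picked up again by a later pass.
    """
    marked: set[int] = set(roots)
    pending: set[int] = set(roots)

    def visit(obj: int, depth: int) -> None:
        for child in children.get(obj, []):
            if child in marked:
                continue
            marked.add(child)
            if depth + 1 >= limit:
                pending.add(child)  # too deep: defer to a later pass
            else:
                visit(child, depth + 1)

    passes = 0
    while pending:
        passes += 1
        batch = sorted(pending)
        pending.clear()
        for obj in batch:  # stands in for re-scanning the object list
            visit(obj, 0)
    return passes


if __name__ == "__main__":
    # A 200-object chain, similar in spirit to a deep path in a tree.
    chain = {i: [i + 1] for i in range(199)}
    for limit in (8, 64, 256):
        print(limit, mark_with_limit(chain, [0], limit))
```

In this model a larger limit means fewer passes over the heap but deeper recursion, which is the score versus stack-usage trade-off described above.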
What JERRY_GC_MARK_LIMIT value do you suggest for benchmarking it?
I have no optimal number to give. Increasing the recursion limit will increase the score and the stack usage simultaneously. So keep fine-tuning it; I suggest starting by doubling it repeatedly.
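The doubling approach could be scripted roughly as follows. This sketch only builds the configure command lines and does not run them. The options are the ones from the issue description; the `../..` source directory and the helper names are assumptions.

```python
def doubling_sweep(start: int = 8, stop: int = 1024) -> list[int]:
    """Return start, 2*start, 4*start, ... up to and including stop."""
    values = []
    value = start
    while value <= stop:
        values.append(value)
        value *= 2
    return values


def configure_command(mark_limit: int, heap_size: int = 102400) -> list[str]:
    """Build a cmake configure command for one JERRY_GC_MARK_LIMIT value.

    '../..' is an assumed path from the build directory to the source tree.
    """
    return [
        "cmake",
        "../..",
        "-DJERRY_LINE_INFO=OFF",
        f"-DJERRY_GLOBAL_HEAP_SIZE={heap_size}",
        f"-DJERRY_GC_MARK_LIMIT={mark_limit}",
    ]


if __name__ == "__main__":
    for limit in doubling_sweep():
        print(" ".join(configure_command(limit)))
```

Each printed command would be followed by a rebuild and a benchmark run, recording the score and the peak stack usage for that limit.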
@rerobika For the record, increasing it to 1024 has no effect:
lygstate@DESKTOP-94PU0GB:/mnt/c/work/study/languages/typescript/jerryscript/build/linux$ cmake -GNinja ../.. -DJERRY_LINE_INFO=OFF -DCMAKE_BUILD_TYPE=Release -DJERRY_EXTERNAL_CONTEXT=OFF -DJERRY_SYSTEM_ALLOCATOR=OFF -DJERRY_GLOBAL_HEAP_SIZE=512000 -DJERRY_GC_MARK_LIMIT=1024
-- The C compiler identification is GNU 9.3.0
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Check for working C compiler: /usr/bin/cc - skipped
-- Detecting C compile features
-- Detecting C compile features - done
-- CMAKE_BUILD_TYPE Release
-- CMAKE_C_COMPILER_ID GNU
-- CMAKE_SYSTEM_NAME Linux
-- CMAKE_SYSTEM_PROCESSOR x86_64
-- BUILD_SHARED_LIBS OFF
-- ENABLE_LTO ON
-- ENABLE_STRIP ON
-- JERRY_VERSION 2.4.0
-- JERRY_CMDLINE ON
-- JERRY_CMDLINE_TEST OFF
-- JERRY_CMDLINE_SNAPSHOT OFF
-- JERRY_LIBFUZZER OFF (FORCED BY COMPILER)
-- JERRY_PORT_DEFAULT ON (FORCED BY CMDLINE OR LIBFUZZER OR TESTS)
-- JERRY_EXT ON (FORCED BY CMDLINE OR TESTS)
-- JERRY_LIBM ON
-- UNITTESTS OFF
-- DOCTESTS OFF
-- ENABLE_ALL_IN_ONE OFF
-- JERRY_CPOINTER_32_BIT ON (FORCED BY HEAP SIZE)
-- JERRY_DEBUGGER OFF
-- JERRY_ERROR_MESSAGES OFF
-- JERRY_EXTERNAL_CONTEXT OFF
-- JERRY_PARSER ON
-- JERRY_LINE_INFO OFF
-- JERRY_LOGGING OFF
-- JERRY_MEM_STATS OFF
-- JERRY_MEM_GC_BEFORE_EACH_ALLOC OFF
-- JERRY_PARSER_DUMP_BYTE_CODE OFF
-- JERRY_PROFILE es.next
-- JERRY_REGEXP_STRICT_MODE OFF
-- JERRY_REGEXP_DUMP_BYTE_CODE OFF
-- JERRY_SNAPSHOT_EXEC OFF
-- JERRY_SNAPSHOT_SAVE OFF
-- JERRY_SYSTEM_ALLOCATOR OFF
-- JERRY_VALGRIND OFF
-- JERRY_VM_EXEC_STOP OFF
-- JERRY_GLOBAL_HEAP_SIZE 512000
-- JERRY_GC_LIMIT (0)
-- JERRY_STACK_LIMIT (0)
-- JERRY_GC_MARK_LIMIT 1024
-- FEATURE_INIT_FINI OFF
-- Performing Test HAVE_TM_GMTOFF
-- Performing Test HAVE_TM_GMTOFF - Success
-- Looking for include file time.h
-- Looking for include file time.h - found
-- Looking for include file unistd.h
-- Looking for include file unistd.h - found
-- ENABLE_LINK_MAP OFF
-- JERRY_TEST_STACK_MEASURE OFF
-- Configuring done
-- Generating done
-- Build files have been written to: /mnt/c/work/study/languages/typescript/jerryscript/build/linux
lygstate@DESKTOP-94PU0GB:/mnt/c/work/study/languages/typescript/jerryscript/build/linux$ ninja
[244/244] Linking C executable bin/jerry
lygstate@DESKTOP-94PU0GB:/mnt/c/work/study/languages/typescript/jerryscript/build/linux$ ^C
lygstate@DESKTOP-94PU0GB:/mnt/c/work/study/languages/typescript/jerryscript/build/linux$ ^C
lygstate@DESKTOP-94PU0GB:/mnt/c/work/study/languages/typescript/jerryscript/build/linux$ ./bin/jerry ../../tests/benchmarks/v8/combined.js
PROGRESS Richards
RESULT Richards 262
PROGRESS DeltaBlue
RESULT DeltaBlue 211
PROGRESS Encrypt
PROGRESS Decrypt
RESULT Crypto 306
PROGRESS RayTrace
RESULT RayTrace 391
PROGRESS Earley
PROGRESS Boyer
RESULT EarleyBoyer 148
PROGRESS RegExp
RESULT RegExp 36.7
PROGRESS Splay
RESULT Splay 0.106
PROGRESS NavierStokes
RESULT NavierStokes 480
SCORE 80.8
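The SCORE lines in these logs are consistent with a geometric mean of the RESULT values, which is how the V8 benchmark suite combines them. The sketch below recomputes the score for the run above and shows how much a single near-zero result such as Splay pulls it down. The reported results are rounded, so the recomputed value only matches approximately.

```python
import math


def geometric_mean(values: list[float]) -> float:
    """Geometric mean computed through logarithms to avoid overflow."""
    return math.exp(sum(math.log(v) for v in values) / len(values))


# RESULT values copied from the run above (JERRY_GC_MARK_LIMIT=1024).
results = {
    "Richards": 262,
    "DeltaBlue": 211,
    "Crypto": 306,
    "RayTrace": 391,
    "EarleyBoyer": 148,
    "RegExp": 36.7,
    "Splay": 0.106,
    "NavierStokes": 480,
}

score = geometric_mean(list(results.values()))
score_without_splay = geometric_mean(
    [value for name, value in results.items() if name != "Splay"]
)

if __name__ == "__main__":
    print(f"score with Splay:    {score:.1f}")
    print(f"score without Splay: {score_without_splay:.1f}")
```

This is why the overall score drops compared with the first run even though most individual results improved.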
I have now found the cause of Splay being too slow:
python tools/build.py ^
--clean ^
--lto=OFF ^
--jerry-debugger=ON ^
--jerry-cmdline=ON ^
--jerry-cmdline-snapshot=ON ^
--jerry-math=ON ^
--jerry-ext=ON ^
--amalgam=ON ^
--snapshot-exec=ON ^
--stack-limit=512 ^
--gc-mark-limit=64 ^
--mem-heap=2048 ^
--cpointer-32bit=ON ^
--system-allocator=ON ^
--external-context=ON ^
--regexp-strict-mode=ON ^
--js-parser=ON ^
--line-info=ON ^
--error-messages=ON ^
--logging=ON ^
--cmake-param=-GNinja ^
--cmake-param=-DJERRY_LCACHE=1 ^
--cmake-param=-DJERRY_PROPRETY_HASHMAP=1 ^
--profile=es.next
I guess mainly because of -DJERRY_LCACHE=1 and -DJERRY_PROPRETY_HASHMAP=1
benchmark result:
C:\work\study\languages\typescript\jerryscript>build\bin\jerry.exe tests\benchmarks\v8\combined.js
PROGRESS Richards
RESULT Richards 173
PROGRESS DeltaBlue
RESULT DeltaBlue 180
PROGRESS Encrypt
PROGRESS Decrypt
RESULT Crypto 182
PROGRESS RayTrace
RESULT RayTrace 291
PROGRESS Earley
PROGRESS Boyer
RESULT EarleyBoyer 408
PROGRESS RegExp
RESULT RegExp 155
PROGRESS Splay
RESULT Splay 365
PROGRESS NavierStokes
RESULT NavierStokes 398
SCORE 250
cc @dbatyai
"I guess mainly because of -DJERRY_LCACHE=1 and -DJERRY_PROPRETY_HASHMAP=1"
What does this mean exactly? If you consider this to be the source of the slowness, a with/without comparison would be great.
Moreover, I've tested the engine on Splay with and without the lcache and hashmap, and there was no significant difference. But feel free to share how you came to this conclusion.
As was said before, the Splay test was created specifically to test garbage collection, and thus uses a lot of memory and creates a lot of fragmentation. The fact that it is slow has nothing to do with the lcache, and very little to do with hashmaps (these can have a slight effect on fragmentation).
The reason this test is slow is that the jerry allocator was not designed to handle this much memory, and maintaining the free block list gets more and more costly as the memory gets fragmented. I have a few ideas on how to improve the logic behind the allocator to make it less affected by fragmentation, but can't really give anything specific for now.
However, the fact remains the same: the core idea behind the allocator will still be handling smaller amounts of memory with as little overhead as possible, not high performance. When larger amounts of memory are required or performance is more critical, the system allocator should be used instead.
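A toy illustration of why a fragmented free list is costly. It assumes a first-fit search over a list of free block sizes, which is a simplification and not the actual jerry allocator; `first_fit_steps` is a made-up helper.

```python
def first_fit_steps(free_blocks: list[int], request: int) -> int:
    """Count the free-list entries a first-fit search inspects.

    Returns the position of the first block that can hold `request` bytes,
    or the list length if no block is large enough.
    """
    for steps, size in enumerate(free_blocks, start=1):
        if size >= request:
            return steps
    return len(free_blocks)


# The same 64 KB of free memory in two layouts (sizes are illustrative).
compact = [65536]
fragmented = [16] * 4000 + [1536]

if __name__ == "__main__":
    assert sum(compact) == sum(fragmented)
    print("compact:   ", first_fit_steps(compact, 64))
    print("fragmented:", first_fit_steps(fragmented, 64))
```

With the same amount of free memory, the fragmented layout forces the search to walk thousands of small blocks before it finds one that fits, and that cost is paid on every allocation.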
Hi, verified, you are right. But in the current situation, on a 64-bit processor, we cannot use the system allocator; that's why I was forced to use the jerry memory allocator, which leads to significant fragmentation.
Sorry to bring up an old topic, but LVGL switched memory allocators recently, which gave a pretty big speed boost. It could be worth having a look at what they used. I forget the name of the library right now.
Refer to https://bellard.org/quickjs/bench.html
I am using the following branch to bench jerryscript
https://github.com/lygstate/jerryscript/tree/benchmark
Bench result: Currently, the Splay (splay tree) benchmark case is exceptionally slow; this looks more like a JerryScript issue.
--jerry-cmdline-snapshot=ON --jerry-math=ON --jerry-ext=ON
--amalgam=ON --snapshot-exec=ON --stack-limit=512 --gc-mark-limit=64
--cpointer-32bit=ON --system-allocator=ON --external-context=ON
--regexp-strict-mode=ON --js-parser=ON --line-info=ON --error-messages=ON
--logging=ON --cmake-param=-GNinja --cmake-param=-DJERRY_LCACHE=1
--cmake-param=-DJERRY_PROPRETY_HASHMAP=1 --profile=es.next
duktape.c duk_cmdline.c duk_console.c
-lm -lc -o duk