svaarala / duktape

Duktape - embeddable Javascript engine with a focus on portability and compact footprint
MIT License

Recommended low-memory options #830

Closed zherczeg closed 8 years ago

zherczeg commented 8 years ago

Duktape has a large set of low-memory options, so figuring out an optimal set is difficult. Could you recommend a flag set which satisfies the following conditions?

Thank you very much.

zherczeg commented 8 years ago

Any help would be appreciated. I thought Duktape works on low-end systems, or is that not a target anymore?

saghul commented 8 years ago

Have you gone through https://github.com/svaarala/duktape/blob/master/doc/low-memory.rst ? You'll need to run your own benchmarks, but that documentation is pretty complete.

svaarala commented 8 years ago

@zherczeg I've missed this issue for some reason. Duktape does target low memory systems, and @saghul pointed to the best current documentation covering the various options for that.

Some quick answers below:

> An ES 5.1 compatible engine, which has a 100% pass rate on the es5-tests branch of the test262 test suite, excluding the internationalization tests (i.e. all unnecessary features are disabled to reduce binary size).

Duktape doesn't currently pass all of the test262 es5-tests; there are about 12 (?) failing tests whose issues boil down to a handful of root causes (see doc/release-notes-xxx.rst for details).

> Enabling low-memory options, although the test262 conformance tests and the SunSpider benchmark suite should still work. The low-memory options should have an acceptable trade-off, e.g. a 10% memory save for a 10% perf loss is ok, but a 100% perf loss for that memory gain is too much.

I'm not sure I understand why the relative performance loss is a useful criterion: different targets have different absolute performance requirements, and an acceptable trade-off would depend on how close to that requirement a baseline configuration is.

In any case, you should be able to use most of the low memory suggestions in the document @saghul pointed to above. Pointer compression may affect performance more than the 10% you indicated - it depends on how the compression macros are defined; the macros are provided by the application code so they may make a function call or handle the pointer packing/unpacking inline. Dropping the hash part of the internal object representation (which is normally used for relatively large objects) may affect performance more than 10% for code dealing heavily with large objects (though low memory targets rarely have very large objects for obvious reasons).

> The engine should be optimized for 32-bit systems; 64-bit support is not needed at all (might affect choosing the options).

There aren't any 32-bit vs. 64-bit target optimization options as such, but 32-bit systems should use a "packed" duk_tval so that tagged values are represented by 8 bytes instead of 16. The duk_config.h config header tries to do that automatically based on the architecture etc., but if that doesn't work you may need to force it manually.

zherczeg commented 8 years ago

> Duktape doesn't currently pass all of the test262 es5-tests; there are about 12 (?) failing tests whose issues boil down to a handful of root causes (see doc/release-notes-xxx.rst for details).

That is not a problem. I just want to exclude non-ES5.1 features.

The documentation provided is about fine-tuning the engine for those who already have deep knowledge of the internals. But this topic is about the evaluation of the engine by a newcomer. When people download the latest tarball, they don't want to do fine-tuning; they just want to select a mode (e.g. default, low-memory, high-performance, etc.) and see how the engine performs on industry standard benchmarks.

Since an evaluator knows little about the engine, there is a high chance of misconfiguring it, and that affects the outcome when several engines are compared (imagine having to figure out the best options for 3-5 engines in a few days). The authors of an engine usually know the best options for a given scenario. Although those could be further fine-tuned by 5-10% on certain architectures, they still show the general performance of the engine in that scenario. When the differences are large, fine-tuning will not likely change the picture.

svaarala commented 8 years ago

Ah, I didn't realize you were specifically interested in general benchmarking rather than looking for options for a specific target.

There's no "benchmark configuration" at present, but the configuration example file config/examples/low_memory.yaml tries to answer that use case to some extent. It contains the most common low memory options which can be enabled without much thought. However, it's not really possible to provide a full out-of-the-box (very) low memory configuration (with Duktape at least) because low memory targets typically need carefully optimized allocation providers and pointer compression macros need to be defined to match that allocation provider and configuration. With Duktape the allocation provider is external to the engine to allow it to be adapted as flexibly as possible (e.g. to non-continuous memory pools, an existing system allocator, etc).

Typically the process with a specific target starts with defaults, and seeing how memory usage turns out on the target. If low memory optimization is needed, one can progressively enable low memory config options until some goal is reached. Each option comes with some trade-off: for example, disabling tracebacks has some usability implications but no compliance impact since they're not an ES5.1 feature. Because of these trade-offs, it's usually best to enable options one by one as needed - ideally one would need to give up a minimum of features to achieve the memory footprint for a certain target. (If no trade-offs were involved, there would only be a low memory configuration to begin with :-)
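The progressive approach above could be captured in a custom option file layered on the stock defaults. The fragment below is a sketch only: the `DUK_USE_xxx` names follow Duktape's config option naming, but which options you actually disable, and in which order, depends on your target and should be verified against your version's config metadata:

```yaml
# Hypothetical my_low_memory.yaml: enable low memory options one at a
# time, re-measuring memory usage on the target after each step.
DUK_USE_TRACEBACKS: false         # usability cost only; not an ES5.1 feature
DUK_USE_VERBOSE_ERRORS: false     # terser error messages, smaller footprint
DUK_USE_HOBJECT_HASH_PART: false  # slower property lookup on large objects
```

Stopping as soon as the memory goal is reached keeps the feature give-up to a minimum, which is the point of enabling options one by one.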

svaarala commented 8 years ago

Hmm, since there's a pool allocator example already in the repo it'd be a relatively small effort to make a benchmark configuration based on that. It would then allow benchmarking builds with pointer compression and (separately) builds with pointer compression and ROM built-in objects (which is a relevant configuration for very low RAM targets).

A performance-optimized configuration is in config/examples/performance_sensitive.yaml. It's much easier to use as-is because there's no external dependency like the allocator. There's at least one trade-off which may not be obvious: whether to enable or disable refcounting. Some environments rely on having accurate and prompt GC (which refcounting provides), while in others a periodic or emergency mark-and-sweep suffices. Refcounting has a clear performance impact because it affects a lot of operations.
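As a sketch of that GC trade-off, a performance-oriented option fragment might look like the following; the option names match Duktape 1.x config metadata, but treat them as illustrative and check your version's config documentation before relying on them:

```yaml
# Sketch: trade prompt, accurate GC for throughput by dropping refcounting.
DUK_USE_REFERENCE_COUNTING: false  # fewer refcount updates on hot paths
DUK_USE_DOUBLE_LINKED_HEAP: false  # only needed together with refcounting
DUK_USE_MARK_AND_SWEEP: true       # periodic/emergency collection remains
```

With refcounting disabled, garbage is reclaimed only when mark-and-sweep runs, so peak memory use can be higher; that is why this choice belongs in a performance configuration rather than a low-memory one.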

I'll open a separate issue for making a more clearly defined set of benchmark-friendly configurations and support files available in the distributable. #845.

zherczeg commented 8 years ago

Thank you for the help.