Rationalize use of optimization levels

gbrail commented 1 month ago

Rhino supports an optimization level in the Context which goes from -1 to 9. From inspecting the code, I have seen that we use this as follows:

-1: At level -1 (or lower) we disable classfile generation and run in interpreted mode. (This is the only way to run on Android.)
0: We force the optimization level to 0 when "generating debug" is set. (I didn't remember that, and I'd like to understand better why we do this.)
1: At level 1 and higher we support larger sparse native arrays (200,000 elements versus 10,000)
1: At level 1 and higher we activate the "direct call" optimization, which tries to create direct function calls in the generated bytecode, rather than looking up a function in the global context and calling it via the Callable interface. This dramatically speeds up the common case when a script defines a number of top-level functions, and then calls them from elsewhere in the same script.
1: At level 1 and higher, when there is a sequence of math operations, we replace the boxed Number on the stack with a native Java double, and replace calls to the math operations in ScriptRuntime with native bytecode instructions (like DADD, DMUL, etc.). This speeds up math-heavy code and can even speed up a simple "for" loop.
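
The "direct call" optimization described above can be illustrated with a small, self-contained Java analogy. This is not the bytecode Rhino actually generates: `JsFunction`, the `globals` map, `square`, `callViaLookup` and `callDirect` are hypothetical stand-ins. The lookup path mimics the unoptimized case (find the function by name in a global table, box the arguments, and invoke it through an interface), while the direct path is an ordinary static method call that the JIT can easily inline.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical analogy for the "direct call" optimization; not Rhino code.
final class DirectCallDemo {
    // Stand-in for a callable JS function object (Rhino's real interface differs).
    interface JsFunction {
        Object call(Object[] args);
    }

    // Stand-in for the global scope: functions are found by name at call time.
    static final Map<String, JsFunction> globals = new HashMap<>();

    // The script's top-level function, compiled to an ordinary Java method.
    static double square(double x) {
        return x * x;
    }

    static {
        // Register the same function in the "global scope", with boxed arguments and result.
        globals.put("square", args -> square(((Number) args[0]).doubleValue()));
    }

    // Unoptimized path: name lookup, argument array allocation, boxing, interface dispatch.
    static double callViaLookup(double x) {
        JsFunction f = globals.get("square");
        Object result = f.call(new Object[] {x});
        return ((Number) result).doubleValue();
    }

    // "Direct call" path: the target is known at compile time, so a static call is emitted.
    static double callDirect(double x) {
        return square(x);
    }

    public static void main(String[] args) {
        System.out.println(callViaLookup(3.0)); // 9.0
        System.out.println(callDirect(3.0));    // 9.0
    }
}
```

Both paths compute the same value; the difference is the per-call overhead, which is what the optimization removes when a script calls its own top-level functions.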

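The unboxed-arithmetic optimization can be sketched the same way. `boxedAdd` below is a hypothetical, heavily simplified stand-in for a generic runtime helper like the ScriptRuntime math operations mentioned above (the real helpers also handle strings, objects and the other JavaScript conversion rules). The second loop is roughly what the optimized code amounts to: the values stay primitive doubles and javac compiles the additions to plain `dadd` instructions.

```java
// Hypothetical illustration of boxed vs. unboxed arithmetic; not Rhino code.
final class UnboxedMathDemo {
    // Simplified generic "+" on boxed values: every call unboxes, adds, and allocates a new Double.
    static Object boxedAdd(Object a, Object b) {
        return ((Number) a).doubleValue() + ((Number) b).doubleValue();
    }

    // Sum 0..n-1 with both the counter and the accumulator kept as boxed Numbers.
    static double sumBoxed(int n) {
        Object acc = 0.0;
        for (Object i = 0.0; ((Number) i).doubleValue() < n; i = boxedAdd(i, 1.0)) {
            acc = boxedAdd(acc, i);
        }
        return ((Number) acc).doubleValue();
    }

    // The same loop with primitive doubles: no allocation, no helper calls.
    static double sumUnboxed(int n) {
        double acc = 0.0;
        for (double i = 0.0; i < n; i += 1.0) {
            acc += i;
        }
        return acc;
    }

    public static void main(String[] args) {
        System.out.println(sumBoxed(1000));   // 499500.0
        System.out.println(sumUnboxed(1000)); // 499500.0
    }
}
```

This also shows why even a simple "for" loop benefits: the loop counter itself is arithmetic on a number.
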
Proposal: Should we combine all of the compiled optimization levels (0 through 9) into one, so that we have only two modes: interpreted and compiled?

I think that this is low-risk because the optimizations enabled at level 1 and above have been in the codebase for a decade or more and seem pretty stable by now. The result would be less complexity to test, and the test suite would take about a third less time, since it would run in two modes instead of three. Also, JavaScript engines and Java runtimes these days tend not to offer many choices of optimization level.
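
One way to picture the proposal is a two-valued setting that replaces the integer level while still accepting legacy values. `ExecutionMode` and `fromLegacyLevel` are illustrative names only, not an existing Rhino API; the mapping simply follows the proposal (negative levels stay interpreted, every level from 0 to 9 becomes the fully optimized compiled mode).

```java
// Hypothetical sketch of the two-mode proposal; these names are not Rhino API.
enum ExecutionMode {
    INTERPRETED,
    COMPILED;

    // Map a legacy optimization level (-1..9) onto the two proposed modes.
    static ExecutionMode fromLegacyLevel(int level) {
        // -1 (or lower) has always meant "no class file generation".
        if (level < 0) {
            return INTERPRETED;
        }
        // 0..9 collapse into a single compiled mode with all optimizations enabled.
        return COMPILED;
    }

    public static void main(String[] args) {
        for (int level : new int[] {-1, 0, 1, 9}) {
            System.out.println(level + " -> " + fromLegacyLevel(level));
        }
    }
}
```

Keeping a legacy-level mapping like this would let existing embedders that call with 0 or 9 keep working while the test matrix shrinks to two modes.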

I can think of two reasons against this:

At some point we reduced the optimization level when "generating debug." We'll have to go back and figure out why we did that and what "generating debug" does any more (since we generate stack traces in all cases anyway).
We may add additional optimizations in the future, particularly around invokedynamic instructions. Should we make those active at a higher level as well?

Here is some data on the effects of the current optimizations, using the SunSpider benchmark suite. You'll see the differences -- sometimes dramatic -- between the levels.

Optimization level 9:

Benchmark                                                    Mode  Cnt      Score       Error  Units
SunSpiderBenchmark.AccessBinaryTreesState.accessBinaryTrees  avgt    5   4554.025 ±  1372.394  us/op
SunSpiderBenchmark.AccessFannAccessNsieveState.accessNsieve  avgt    5   3757.903 ±   675.956  us/op
SunSpiderBenchmark.AccessFannkuchState.accessFannkuch        avgt    5  12213.519 ±  4381.343  us/op
SunSpiderBenchmark.AccessNBodyState.accessNBody              avgt    5   9934.804 ±  3071.391  us/op
SunSpiderBenchmark.Bitops3BitState.bitops3BitBitsInByte      avgt    5   1771.524 ±   400.976  us/op
SunSpiderBenchmark.BitopsAndState.bitopsBitwiseAnd           avgt    5  22359.370 ±  5024.664  us/op
SunSpiderBenchmark.BitopsBitsState.bitopsBitsInByte          avgt    5   2606.073 ±   658.725  us/op
SunSpiderBenchmark.BitopsNsieveState.bitopsNsieveBits        avgt    5   6065.162 ±   477.335  us/op
SunSpiderBenchmark.CryptoAesState.cryptoAes                  avgt    5   7761.489 ±   950.520  us/op
SunSpiderBenchmark.CryptoMd5State.cryptoMd5                  avgt    5   7247.374 ±   651.803  us/op
SunSpiderBenchmark.CryptoShaState.cryptoSha1                 avgt    5   4631.069 ±   851.477  us/op
SunSpiderBenchmark.DateFormatToFteState.dateFormatToFte      avgt    5  15790.867 ±  2729.558  us/op
SunSpiderBenchmark.DateFormatXparbState.dateFormatXparb      avgt    5  22495.449 ±  3673.012  us/op
SunSpiderBenchmark.MathCordicState.mathCordic                avgt    5   5549.954 ±   601.631  us/op
SunSpiderBenchmark.MathPartialState.mathPartialSums          avgt    5  10115.134 ±  1086.580  us/op
SunSpiderBenchmark.MathSpectralNormState.mathSpectralNorm    avgt    5   2302.158 ±   345.532  us/op
SunSpiderBenchmark.RecursiveState.controlflowRecursive       avgt    5   2338.775 ±   880.143  us/op
SunSpiderBenchmark.RegexpState.regexpDna                     avgt    5  69770.263 ± 15569.251  us/op
SunSpiderBenchmark.StringBase64State.stringBase64            avgt    5  20768.572 ±  2262.376  us/op
SunSpiderBenchmark.StringFastaState.stringFasta              avgt    5  30470.120 ±  1662.759  us/op
SunSpiderBenchmark.StringTagcloudState.stringTagcloud        avgt    5  31591.067 ±  3968.882  us/op
SunSpiderBenchmark.StringUnpackState.stringUnpackCode        avgt    5  20930.256 ±  4457.681  us/op
SunSpiderBenchmark.StringValidateState.stringValidateInput   avgt    5  18132.135 ±  1988.114  us/op
SunSpiderBenchmark.ThreeDCubeState.threeDCube                avgt    5   8703.786 ±  1703.704  us/op
SunSpiderBenchmark.ThreeDMorphState.threeDMorph              avgt    5   6932.278 ±  1626.954  us/op
SunSpiderBenchmark.ThreeDRayState.threeDRayTrace             avgt    5   9840.743 ±  1407.723  us/op

Optimization level 0:

Benchmark                                                    Mode  Cnt      Score      Error  Units
SunSpiderBenchmark.AccessBinaryTreesState.accessBinaryTrees  avgt    5   6273.223 ±  583.949  us/op
SunSpiderBenchmark.AccessFannAccessNsieveState.accessNsieve  avgt    5  63190.040 ± 5978.810  us/op
SunSpiderBenchmark.AccessFannkuchState.accessFannkuch        avgt    5  24788.778 ± 4325.476  us/op
SunSpiderBenchmark.AccessNBodyState.accessNBody              avgt    5  13873.662 ± 3135.632  us/op
SunSpiderBenchmark.Bitops3BitState.bitops3BitBitsInByte      avgt    5   5407.475 ±  875.163  us/op
SunSpiderBenchmark.BitopsAndState.bitopsBitwiseAnd           avgt    5  22189.012 ± 5359.900  us/op
SunSpiderBenchmark.BitopsBitsState.bitopsBitsInByte          avgt    5   5274.699 ±  407.409  us/op
SunSpiderBenchmark.BitopsNsieveState.bitopsNsieveBits        avgt    5  14116.023 ± 1405.804  us/op
SunSpiderBenchmark.CryptoAesState.cryptoAes                  avgt    5  10519.604 ± 1529.472  us/op
SunSpiderBenchmark.CryptoMd5State.cryptoMd5                  avgt    5   7317.630 ± 1297.650  us/op
SunSpiderBenchmark.CryptoShaState.cryptoSha1                 avgt    5   6450.331 ±  565.151  us/op
SunSpiderBenchmark.DateFormatToFteState.dateFormatToFte      avgt    5  16004.522 ± 2114.462  us/op
SunSpiderBenchmark.DateFormatXparbState.dateFormatXparb      avgt    5  22927.598 ± 2867.733  us/op
SunSpiderBenchmark.MathCordicState.mathCordic                avgt    5  13360.517 ± 2652.011  us/op
SunSpiderBenchmark.MathPartialState.mathPartialSums          avgt    5  13867.575 ± 1555.306  us/op
SunSpiderBenchmark.MathSpectralNormState.mathSpectralNorm    avgt    5   5806.774 ±  526.283  us/op
SunSpiderBenchmark.RecursiveState.controlflowRecursive       avgt    5   3238.993 ±  159.999  us/op
SunSpiderBenchmark.RegexpState.regexpDna                     avgt    5  70542.794 ± 7844.981  us/op
SunSpiderBenchmark.StringBase64State.stringBase64            avgt    5  20688.759 ± 1054.874  us/op
SunSpiderBenchmark.StringFastaState.stringFasta              avgt    5  31271.605 ± 4943.156  us/op
SunSpiderBenchmark.StringTagcloudState.stringTagcloud        avgt    5  33132.600 ± 4460.551  us/op
SunSpiderBenchmark.StringUnpackState.stringUnpackCode        avgt    5  21812.775 ± 3133.014  us/op
SunSpiderBenchmark.StringValidateState.stringValidateInput   avgt    5  18242.684 ± 4181.550  us/op
SunSpiderBenchmark.ThreeDCubeState.threeDCube                avgt    5   9995.056 ± 1680.234  us/op
SunSpiderBenchmark.ThreeDMorphState.threeDMorph              avgt    5  10058.731 ± 2076.572  us/op
SunSpiderBenchmark.ThreeDRayState.threeDRayTrace             avgt    5  11056.867 ± 1578.638  us/op
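
Because the scores are average times per operation (lower is better), the cost of running at level 0 instead of level 9 for a given benchmark is simply the ratio of the two scores. The short Java snippet below computes that ratio for a handful of rows copied from the two tables above; the choice of rows is arbitrary, and the ratios ignore the fairly large error bars.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Compares level 0 with level 9 using Score values (us/op) copied from the tables above.
final class LevelComparison {
    // How many times longer configuration A takes than configuration B (> 1 means A is slower).
    static double timeRatio(double scoreAUs, double scoreBUs) {
        return scoreAUs / scoreBUs;
    }

    public static void main(String[] args) {
        // benchmark -> {level 0 score, level 9 score}
        Map<String, double[]> scores = new LinkedHashMap<>();
        scores.put("accessNsieve", new double[] {63190.040, 3757.903});
        scores.put("bitops3BitBitsInByte", new double[] {5407.475, 1771.524});
        scores.put("mathCordic", new double[] {13360.517, 5549.954});
        scores.put("bitopsBitwiseAnd", new double[] {22189.012, 22359.370});
        scores.put("regexpDna", new double[] {70542.794, 69770.263});

        for (Map.Entry<String, double[]> e : scores.entrySet()) {
            double[] s = e.getValue();
            System.out.printf("%-22s level 0 / level 9 = %.2f%n", e.getKey(), timeRatio(s[0], s[1]));
        }
    }
}
```

For these rows the ratio is roughly 16.8 for accessNsieve, 3.1 for bitops3BitBitsInByte and 2.4 for mathCordic, but about 1.0 for bitopsBitwiseAnd and regexpDna: the level 1+ optimizations matter a great deal for some workloads and barely at all for others.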

Optimization level -1 (interpreted mode):

Benchmark                                                    Mode  Cnt       Score       Error  Units
SunSpiderBenchmark.AccessBinaryTreesState.accessBinaryTrees  avgt    5   28002.415 ± 10426.879  us/op
SunSpiderBenchmark.AccessFannAccessNsieveState.accessNsieve  avgt    5  106819.276 ± 17435.953  us/op
SunSpiderBenchmark.AccessFannkuchState.accessFannkuch        avgt    5  214978.519 ± 45832.472  us/op
SunSpiderBenchmark.AccessNBodyState.accessNBody              avgt    5   86795.888 ± 19246.423  us/op
SunSpiderBenchmark.Bitops3BitState.bitops3BitBitsInByte      avgt    5   69063.885 ±  6851.610  us/op
SunSpiderBenchmark.BitopsAndState.bitopsBitwiseAnd           avgt    5   62937.683 ± 12605.178  us/op
SunSpiderBenchmark.BitopsBitsState.bitopsBitsInByte          avgt    5  110455.062 ± 10706.981  us/op
SunSpiderBenchmark.BitopsNsieveState.bitopsNsieveBits        avgt    5  130495.479 ± 21714.674  us/op
SunSpiderBenchmark.CryptoAesState.cryptoAes                  avgt    5   43832.773 ±  5553.989  us/op
SunSpiderBenchmark.CryptoMd5State.cryptoMd5                  avgt    5   42892.297 ±  5889.241  us/op
SunSpiderBenchmark.CryptoShaState.cryptoSha1                 avgt    5   45868.328 ±  4378.079  us/op
SunSpiderBenchmark.DateFormatToFteState.dateFormatToFte      avgt    5   43539.352 ±  5572.470  us/op
SunSpiderBenchmark.DateFormatXparbState.dateFormatXparb      avgt    5   35030.230 ±  5408.901  us/op
SunSpiderBenchmark.MathCordicState.mathCordic                avgt    5  137047.059 ±  7376.909  us/op
SunSpiderBenchmark.MathPartialState.mathPartialSums          avgt    5   58491.862 ±  4718.966  us/op
SunSpiderBenchmark.MathSpectralNormState.mathSpectralNorm    avgt    5   52700.036 ±  5638.411  us/op
SunSpiderBenchmark.RecursiveState.controlflowRecursive       avgt    5   41072.069 ±  4952.831  us/op
SunSpiderBenchmark.RegexpState.regexpDna                     avgt    5   64558.615 ±  6625.523  us/op
SunSpiderBenchmark.StringBase64State.stringBase64            avgt    5   49990.384 ±  6135.494  us/op
SunSpiderBenchmark.StringFastaState.stringFasta              avgt    5   75127.030 ±  8977.165  us/op
SunSpiderBenchmark.StringTagcloudState.stringTagcloud        avgt    5   53353.614 ± 19365.019  us/op
SunSpiderBenchmark.StringUnpackState.stringUnpackCode        avgt    5   38324.908 ±  4504.960  us/op
SunSpiderBenchmark.StringValidateState.stringValidateInput   avgt    5   43879.368 ±  7555.575  us/op
SunSpiderBenchmark.ThreeDCubeState.threeDCube                avgt    5   77034.431 ± 10665.402  us/op
SunSpiderBenchmark.ThreeDMorphState.threeDMorph              avgt    5   86626.365 ±  3224.384  us/op
SunSpiderBenchmark.ThreeDRayState.threeDRayTrace             avgt    5   78515.529 ± 13920.507  us/op
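
To summarize the gap between the interpreted-mode table and the level 9 table in one number, the snippet below takes the Score column of both (in us/op, copied from above), computes the per-benchmark time ratio, and averages the ratios with a geometric mean, which is the usual way to average ratios. It is an illustration of the arithmetic only and does not account for the error bars.

```java
// Interpreted mode vs. level 9, using the Score columns (us/op) of the two tables above.
final class InterpretedVsCompiled {
    static final String[] NAMES = {
        "accessBinaryTrees", "accessNsieve", "accessFannkuch", "accessNBody",
        "bitops3BitBitsInByte", "bitopsBitwiseAnd", "bitopsBitsInByte", "bitopsNsieveBits",
        "cryptoAes", "cryptoMd5", "cryptoSha1", "dateFormatToFte", "dateFormatXparb",
        "mathCordic", "mathPartialSums", "mathSpectralNorm", "controlflowRecursive",
        "regexpDna", "stringBase64", "stringFasta", "stringTagcloud", "stringUnpackCode",
        "stringValidateInput", "threeDCube", "threeDMorph", "threeDRayTrace"
    };

    static final double[] LEVEL_9_US = {
        4554.025, 3757.903, 12213.519, 9934.804,
        1771.524, 22359.370, 2606.073, 6065.162,
        7761.489, 7247.374, 4631.069, 15790.867, 22495.449,
        5549.954, 10115.134, 2302.158, 2338.775,
        69770.263, 20768.572, 30470.120, 31591.067, 20930.256,
        18132.135, 8703.786, 6932.278, 9840.743
    };

    static final double[] INTERPRETED_US = {
        28002.415, 106819.276, 214978.519, 86795.888,
        69063.885, 62937.683, 110455.062, 130495.479,
        43832.773, 42892.297, 45868.328, 43539.352, 35030.230,
        137047.059, 58491.862, 52700.036, 41072.069,
        64558.615, 49990.384, 75127.030, 53353.614, 38324.908,
        43879.368, 77034.431, 86626.365, 78515.529
    };

    // Time ratio for one benchmark: > 1 means interpreted mode is slower.
    static double ratio(int i) {
        return INTERPRETED_US[i] / LEVEL_9_US[i];
    }

    // Geometric mean of the per-benchmark ratios.
    static double geometricMeanRatio() {
        double logSum = 0.0;
        for (int i = 0; i < NAMES.length; i++) {
            logSum += Math.log(ratio(i));
        }
        return Math.exp(logSum / NAMES.length);
    }

    public static void main(String[] args) {
        for (int i = 0; i < NAMES.length; i++) {
            System.out.printf("%-22s %6.2f%n", NAMES[i], ratio(i));
        }
        System.out.printf("%-22s %6.2f%n", "geometric mean", geometricMeanRatio());
    }
}
```

In this data the geometric mean works out to roughly 7, with regexpDna as the one exception: its interpreted score is slightly lower than its level 9 score, well within the error bars.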

p-bakker commented 1 month ago

Sounds to me like there's not much point in having all these optimization levels, if we can figure out the reason for optLevel 0 and then conclude that we don't need it anymore.

Going forward, I think feature-flagging an experimental optimization and eventually making it standard is better/simpler in the long run than having different optLevels that users/embedders have to choose from.
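
A minimal sketch of the feature-flag approach, assuming a hypothetical `OptimizationFlags` holder and `Optimization` enum (neither exists in Rhino): stable optimizations are on by default, an experimental one stays off until an embedder opts in, and "making it standard" later just means adding it to the default set. `INVOKEDYNAMIC` is only an illustrative name for the kind of future optimization mentioned earlier in the thread.

```java
import java.util.Collections;
import java.util.EnumSet;
import java.util.Set;

// Hypothetical feature-flag design; none of these names are Rhino API.
final class OptimizationFlags {
    enum Optimization {
        DIRECT_CALLS,    // stable: enabled by default
        UNBOXED_DOUBLES, // stable: enabled by default
        INVOKEDYNAMIC    // experimental: opt-in only (illustrative)
    }

    // Defaults for compiled mode; an experimental flag "graduates" by moving into this set.
    static final Set<Optimization> DEFAULTS = Collections.unmodifiableSet(
            EnumSet.of(Optimization.DIRECT_CALLS, Optimization.UNBOXED_DOUBLES));

    private final EnumSet<Optimization> enabled = EnumSet.copyOf(DEFAULTS);

    OptimizationFlags enable(Optimization o) {
        enabled.add(o);
        return this;
    }

    OptimizationFlags disable(Optimization o) {
        enabled.remove(o);
        return this;
    }

    boolean isEnabled(Optimization o) {
        return enabled.contains(o);
    }

    public static void main(String[] args) {
        OptimizationFlags flags = new OptimizationFlags()
                .enable(Optimization.INVOKEDYNAMIC)   // an embedder opts into the experiment
                .disable(Optimization.DIRECT_CALLS);  // e.g. isolating a code generation bug
        System.out.println(flags.isEnabled(Optimization.INVOKEDYNAMIC)); // true
        System.out.println(flags.isEnabled(Optimization.DIRECT_CALLS));  // false
    }
}
```

Unlike a numeric level, each flag can be toggled independently, which also covers the debugging case of switching off a single optimization step.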

rPraml commented 1 month ago

Is there really often a case where one level fails while another passes?

What do you think about having the normal merge check run only at level 9, and maybe also only on one Java version, while a daily GitHub Action checks all optimization levels and Java versions?

You could add build badges to the README.md so that everyone can see which builds are successful.
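
The split suggested above can be expressed as a small job-selection function. This is a sketch under assumptions: the JVM versions (11, 17, 21) are placeholders for the three platforms the build uses, the three tested levels are assumed to be -1, 0 and 9, `CiMatrix` and `Trigger` are hypothetical names, and in practice the selection would live in the GitHub Actions workflow configuration rather than in Java.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical CI job selection; the JVM versions and levels are placeholders.
final class CiMatrix {
    enum Trigger { PULL_REQUEST, NIGHTLY }

    static final int[] JAVA_VERSIONS = {11, 17, 21}; // assumed three platforms
    static final int[] OPT_LEVELS = {-1, 0, 9};      // assumed three tested levels

    // Returns job labels such as "java17/opt9".
    static List<String> jobsFor(Trigger trigger) {
        List<String> jobs = new ArrayList<>();
        if (trigger == Trigger.PULL_REQUEST) {
            // Per-commit merge check: one JVM at level 9 only.
            jobs.add("java17/opt9");
            return jobs;
        }
        // Daily run: the full cross product of JVMs and optimization levels.
        for (int java : JAVA_VERSIONS) {
            for (int level : OPT_LEVELS) {
                jobs.add("java" + java + "/opt" + level);
            }
        }
        return jobs;
    }

    public static void main(String[] args) {
        System.out.println(jobsFor(Trigger.PULL_REQUEST));   // [java17/opt9]
        System.out.println(jobsFor(Trigger.NIGHTLY).size()); // 9
    }
}
```

Dropping level 0 from `OPT_LEVELS` would shrink the daily run from 9 jobs to 6, while the per-commit check stays at one.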

p-bakker commented 1 month ago

@rPraml so you're not in favor of merging all optLevels >= 0 into a single 'compiled' mode?

rPraml commented 1 month ago

My previous post was actually about how to speed up the build pipeline. Currently, 3 optimization levels are tested against 3 platforms for each commit, which takes about 45 minutes.

My question was more: is that really necessary? For most commits¹, wouldn't it be enough to test just one or two of the 9 combinations (e.g. Java 17 + compiled, and maybe another JVM in interpreted mode)? The full set would drop to 6 combinations if we agreed on just two levels. This could reduce build time to under 10 minutes and save GitHub Actions time (I don't know whether you are on a plan where you have to pay for GitHub Actions).

A daily GitHub Action would then test all JVMs in both compiled and interpreted mode.

¹) For example, when I change something in JavaAdapter, the chance that I break only ONE optimization level is very low, but commits like "Begin to use invokedynamic in the bytecode" might.
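
The footnote's point, that only some kinds of change are likely to break a single optimization level, could be turned into a path-based rule. The sketch below assumes, hypothetically, that changes under the `org/mozilla/javascript/optimizer/` package (where Rhino's bytecode generator lives) warrant the full matrix and everything else gets the reduced per-commit set; `RiskBasedSelection`, the marker list and the example file paths are illustrative, not part of any existing workflow.

```java
import java.util.List;

// Hypothetical rule for choosing between the reduced and the full test matrix.
final class RiskBasedSelection {
    // Assumed marker for bytecode-generation sources; adjust to the real repository layout.
    static final List<String> HIGH_RISK_MARKERS = List.of("org/mozilla/javascript/optimizer/");

    // True if any changed file touches code that could plausibly break one optimization level.
    static boolean needsFullMatrix(List<String> changedPaths) {
        for (String path : changedPaths) {
            for (String marker : HIGH_RISK_MARKERS) {
                if (path.contains(marker)) {
                    return true;
                }
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Example paths are illustrative only.
        System.out.println(needsFullMatrix(List.of("src/org/mozilla/javascript/JavaAdapter.java")));        // false
        System.out.println(needsFullMatrix(List.of("src/org/mozilla/javascript/optimizer/Codegen.java"))); // true
    }
}
```

A rule like this errs on the side of the reduced matrix, so the daily full run would still be needed as a safety net.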

@rPraml so you're not in favor of merging all optLevels >= 0 into a single 'compiled' mode?

Sorry, I expressed myself in a confusing way. In my opinion, the API only needs to provide a way to switch between interpreted mode (-1) and compiled mode with all optimizations (9), as you suggest. I see no reason why an end user would raise the level only to, e.g., 0 or 4 when they could just as well use 9. I would also say that 9 should be the default if the platform allows it (autodetect?).
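
For the "autodetect?" idea, one rough heuristic is to look at the `java.vm.name` system property, which Android runtimes have historically reported as `Dalvik`. This is a hypothetical sketch, not how Rhino decides anything today; `DefaultModeChooser` and its `Mode` enum are illustrative names, and a more robust check might instead try to load a generated class and fall back to interpreted mode if that fails.

```java
// Hypothetical default-mode autodetection; not an existing Rhino API.
final class DefaultModeChooser {
    enum Mode { INTERPRETED, COMPILED }

    // Android cannot load JVM class files generated at runtime, so it needs the interpreter.
    static Mode defaultFor(String vmName) {
        if (vmName != null && vmName.contains("Dalvik")) {
            return Mode.INTERPRETED;
        }
        return Mode.COMPILED; // everywhere else, default to full optimization
    }

    static Mode defaultForCurrentVm() {
        return defaultFor(System.getProperty("java.vm.name"));
    }

    public static void main(String[] args) {
        System.out.println(defaultForCurrentVm());
    }
}
```

Taking the VM name as a parameter keeps the decision testable without running on Android.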

The only reason I can think of is that, when fixing or enhancing low-level code like bytecode or IIR, a developer might want to turn off certain optimization steps to make debugging easier. (But a developer would probably change some flags or just comment out the relevant code temporarily.)

mozilla / rhino

Rationalize use of optimization levels #1658