The performance of version 1.7.14 has gone backwards

SevenNight2012 commented 1 year ago

I am using rhino to calculate a set of float data

when using version 1.7.14, the calculation time is as follows:

and when using version 1.7.12,the calculation time is as follows:

rhino_performance

As far as I can see, the 1.7.12 version seems to perform better, is there something wrong with the 1.7.14 version? Or is there any special attention to the use of version 1.7.14?

My equipment and environment: Rhino used in android device：Google pixel 3XL Mac book Pro 2022，M2 CPU，16GB memory，OpenJDK 8

SevenNight2012 commented 1 year ago

Oh well, I found out that 1.7.13 works the same as 1.7.12, but it doesn't work as well since 1.7.14-RC1,all the code in the project has not changed, I just replaced the Rhino-runtime dependency, like this

rootProject.subprojects.forEach {
    it.configurations.all {
        resolutionStrategy {
            force "org.mozilla:rhino-runtime:1.7.14-RC1"
            // force "org.mozilla:rhino-runtime:1.7.13"
        }
    }
}

Does anyone have a similar problem? Does the author have any good suggestions? @p-bakker

tuchida commented 1 year ago

If you're running it on Android, do you have the optimizationLevel set to -1? Perhaps BigInt implementation may have something to do with it. ref. #837

rbri commented 1 year ago

Any chance to share the whole code of the calc() function to give us a chance to profile?

SevenNight2012 commented 1 year ago

If you're running it on Android, do you have the optimizationLevel set to -1? Perhaps BigInt implementation may have something to do with it. ref. #837

Yes,I have set it like this：

        Context jsContext = new RhinoAndroidHelper(ContextHolder.getContext()).enterContext();
        // Context jsContext = Context.enter();
        jsContext.setOptimizationLevel(-1);
        jsContext.setLanguageVersion(Context.VERSION_ES6);
        jsContext.setLocale(Locale.getDefault());
        jsContext.setWrapFactory(new WrapFactory());
        ImporterTopLevel scope = new ImporterTopLevel();
        scope.initStandardObjects(Context.getCurrentContext(), false);

SevenNight2012 commented 1 year ago

Any chance to share the whole code of the calc() function to give us a chance to profile?

No problem, but this calc method is very complicated internally, this method is used to calculate astronomical related data cal11Arc.js

SevenNight2012 commented 1 year ago

Any chance to share the whole code of the calc() function to give us a chance to profile?

This js file is very large, and the internal calculation is very complicated, maybe we only need a complex calculation to simulate it

tuchida commented 1 year ago

I tried to run your getXingdu on my PC. It is true that 1.7.14 seems to be a little slower, but it didn't seem to make much difference.

$ java -jar buildGradle/libs/rhino-1.7.14.jar -version 200 -opt -1 xingdu.js 
time-->  95
$ java -jar buildGradle/libs/rhino-1.7.13.jar -version 200 -opt -1 xingdu.js 
time-->  92
$ java -jar buildGradle/libs/rhino-1.7.12.jar -version 200 -opt -1 xingdu.js 
time-->  94

These values is just an example. In 1.7.14 it went from about 100 to 120, and in 1.7.12 and 1.7.13 it went from about 90 to 100. Perhaps it is different in the Android environment.

SevenNight2012 commented 1 year ago

I tried to run your getXingdu on my PC. It is true that 1.7.14 seems to be a little slower, but it didn't seem to make much difference.
$ java -jar buildGradle/libs/rhino-1.7.14.jar -version 200 -opt -1 xingdu.js 
time-->  95
$ java -jar buildGradle/libs/rhino-1.7.13.jar -version 200 -opt -1 xingdu.js 
time-->  92
$ java -jar buildGradle/libs/rhino-1.7.12.jar -version 200 -opt -1 xingdu.js 
time-->  94
These values is just an example. In 1.7.14 it went from about 100 to 120, and in 1.7.12 and 1.7.13 it went from about 90 to 100. Perhaps it is different in the Android environment.

Well, thank you very much, but, in my opinion, PC devices are more powerful than mobile phones, so the difference is not obvious.

Here are some of my guesses:

When using version 1.7.13,it also takes a long time to start, but then runs very fast
When using version 1.7.14,its performance is basically not improved
Is there any caching in version 1.7.13 that causes it to perform well in later runs?
Is there any difference between the use of version 1.7.13 and version 1.7.14, resulting in inconsistent performance of the engine?

gbrail commented 1 year ago

I recall having to add some synchronization around a global DateFormat object, since DateFormat is not actually thread-safe. It'd be interesting to look at the NativeDate class and see if any of that synchronization is contributing to bad performance, and if we could address it by not trying to have a global object.

SevenNight2012 commented 1 year ago

function b(e, n, a) {
  var t,
    o,
    r,
    f,
    i,
    s = n / 36525,
    c = q(s), //在月球轨道上经纬度？
    u = c[0],
    g = c[1],
    h = A(s) + g,
    l = "";
  if (e < 10) {
    o = ie(0, s, -1, -1, -1);
    t = ie(e, s, -1, -1, -1);
    t[0] = N(t[0]);
    t = J(t, o);
    i = t[2];
    s -= i * P;
    f = ie(0, s, -1, -1, -1);
    r = ie(e, s, -1, -1, -1);
    t = J(r, o);
    t = J(r, f);
    t[0] = N(t[0] + u);
    if (0 == a) {
      t = Y(t, h);
      l = L(t[0]);
    }
    if (a == 1) {
      l = L(t[0]);
    }
  }
  if (e == 10) {
    t = ce(s, 1, 1, -1);
    i = t[2];
    s -= (i * P) / C;
    t = ce(s, -1, -1, -1);
    t[0] = N(t[0] + u);
    if (a == 0) {
      t = Y(t, h);
      l = L(t[0]);
    }
    if (a == 1) {
      l = L(t[0]);
    }
  }

  return l;
}

Well, these days, I add time-consuming statistics before and after each line of code in getXingdu，and i seem to have found something. Most of the time is spent on this method, but in the v1.7.13 version, it will be very fast after multiple calls. Does anyone know why? @tuchida @gbrail @rbri Or can we check the difference between version 1.7.14 and version 1.7.13?

tuchida commented 1 year ago

@SevenNight2012 The getXingdu source code you posted earlier is too large for my environment to view this issue in my browser. Can you post the source code on gist and edit your comment to include that link?

SevenNight2012 commented 1 year ago

@SevenNight2012 The getXingdu source code you posted earlier is too large for my environment to view this issue in my browser. Can you post the source code on gist and edit your comment to include that link?

Ok, this file is really big, here is the gist address cal11Arc.js

tuchida commented 1 year ago

Thanks. Can the code for this comment be deleted? https://github.com/mozilla/rhino/issues/1365#issuecomment-1671026939

SevenNight2012 commented 1 year ago

@SevenNight2012 The getXingdu source code you posted earlier is too large for my environment to view this issue in my browser. Can you post the source code on gist and edit your comment to include that link?

I need to explain the general function of this file, which is to calculate the position of the sun, moon and other stars on the earth's ecliptic according to time~ There's a lot of computation involved inside it, so this might be extreme, but it should be exciting to optimize it！

SevenNight2012 commented 1 year ago

Thanks. Can the code for this comment be deleted? #1365 (comment)

Okay，please wait~

SevenNight2012 commented 1 year ago

Thanks. Can the code for this comment be deleted? #1365 (comment)

Look at it now！I've dealt with that comment

tuchida commented 1 year ago

The only cause I can think of so far is the implementation of BigInt (ref. #837). However, when I implemented this, I measured micro-benchmarks and even a small change in the code changed the results significantly. Quantitative measurement is difficult because of the possible influence of the optimization mechanism of the JVM. If you continue to investigate, for example, see if performance changes before and after #837 is merged, that might tell you something.

SevenNight2012 commented 1 year ago

The only cause I can think of so far is the implementation of BigInt (ref. #837). However, when I implemented this, I measured micro-benchmarks and even a small change in the code changed the results significantly. Quantitative measurement is difficult because of the possible influence of the optimization mechanism of the JVM. If you continue to investigate, for example, see if performance changes before and after #837 is merged, that might tell you something.

Thank you very much, I think the BigInt you mentioned is indeed possible, because in fact, I also found that the doArithmetic method of the Interpreter class is obviously time-consuming in version 1.7.14, but this does not seem to be the root cause I'll try to keep checking~

SevenNight2012 commented 1 year ago

Hi, everyone, I have located this problem, it started from the commit id 67187cba Can someone take a look at the issues in this commit? @tuchida @gbrail @rbri @p-bakker

tuchida commented 1 year ago

I disassembled Interpreter.class in 1.7.13 and 1.7.14 jar files.

$ javap -v -p ./Interpreter.class > ./Interpreter.dump

https://github.com/mozilla/rhino/blob/4c95547d4db8ac8d3e2528ddd5c41bc8d9bd1765/src/org/mozilla/javascript/Interpreter.java#L1207

The switch statements changed in the commit you found were both tableswitch.

# 1.7.13
       254: tableswitch   { // -66 to 157

# 1.7.14
       266: tableswitch   { // -74 to 160

I am not very familiar with JVM. Based on that I have measured that tableswitch seems to change the performance of certain cases depending on the number and order of the cases. It may be possible to speed up a particular case in some way. But I do hope the JVM optimization will improve.

In Dalvik, tableswitch seems to be equivalent to packed-switch. If you read the bytecode that Dalvik is running, you might learn something.

p-bakker commented 1 year ago

@SevenNight2012 the commit you referred to adds support for template literals.

Before the commit any strings delimited by backticks (`) were processed as non-template, regular strings.

So if you're code uses backtick-delimited strings, as of the referred commit those strings will now be processed as template strings, thus requiring more processing, thus causing a difference in performance.

Nothing else in the commit stand out to me as a potential cause of performance degradation when NOT using backtick-delimited strings (but I could be wrong)

SevenNight2012 commented 1 year ago

@SevenNight2012 the commit you referred to adds support for template literals.

Before the commit any strings delimited by backticks (`) were processed as non-template, regular strings.

So if you're code uses backtick-delimited strings, as of the referred commit those strings will now be processed as template strings, thus requiring more processing, thus causing a difference in performance.

Nothing else in the commit stand out to me as a potential cause of performance degradation when NOT using backtick-delimited strings (but I could be wrong)

Well, I built a lot of jars to find the problem, maybe there is really something in this commit~ I'll try to look at that commit, but I don't know much about rhino internals, would you like to take a look together?

gbrail commented 1 year ago

Can you help us understand how you're measuring the performance of this?

I built a little tool that uses JMH to run a script (or call a function) and combined with "git bisect" that tool makes it pretty easy to detect a performance regression. (I'll try to publish it soon.) However, I ran your script with this tool on 1.7.13 and 1.7.14 and the current master branch and performance was essentially the same. This is on a Linux box in both interpreted and compiled mode.

So, is there a particular function (and parameters) that we can call using your script so that we can test for ourselves?

On Sat, Aug 19, 2023 at 8:41 AM 徐星晨 @.***> wrote:

@SevenNight2012 https://github.com/SevenNight2012 the commit you referred to adds support for template literals.

Before the commit any strings delimited by backticks (`) were processed as non-template, regular strings.

So if you're code uses backtick-delimited strings, as of the referred commit those strings will now be processed as template strings, thus requiring more processing, thus causing a difference in performance.

Nothing else in the commit stand out to me as a potential cause of performance degradation when NOT using backtick-delimited strings (but I could be wrong)

[image: image] https://user-images.githubusercontent.com/10094410/261806231-e4892c64-9d70-411d-8bbc-4483a85b5b3b.png

Well, I built a lot of jars to find the problem, maybe there is really something in this commit~ I'll try to look at that commit, but I don't know much about rhino internals, would you like to take a look together?

— Reply to this email directly, view it on GitHub https://github.com/mozilla/rhino/issues/1365#issuecomment-1685030917, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAD7I27BZHPTV7B27VPQKH3XWDNEHANCNFSM6AAAAAA3H7YHFA . You are receiving this because you were mentioned.Message ID: @.***>

SevenNight2012 commented 1 year ago

Can you help us understand how you're measuring the performance of this? I built a little tool that uses JMH to run a script (or call a function) and combined with "git bisect" that tool makes it pretty easy to detect a performance regression. (I'll try to publish it soon.) However, I ran your script with this tool on 1.7.13 and 1.7.14 and the current master branch and performance was essentially the same. This is on a Linux box in both interpreted and compiled mode. So, is there a particular function (and parameters) that we can call using your script so that we can test for ourselves? … On Sat, Aug 19, 2023 at 8:41 AM 徐星晨 @.> wrote: @SevenNight2012 https://github.com/SevenNight2012 the commit you referred to adds support for template literals. Before the commit any strings delimited by backticks (`) were processed as non-template, regular strings. So if you're code uses backtick-delimited strings, as of the referred commit those strings will now be processed as template strings, thus requiring more processing, thus causing a difference in performance. Nothing else in the commit stand out to me as a potential cause of performance degradation when NOT using backtick-delimited strings (but I could be wrong) [image: image] https://user-images.githubusercontent.com/10094410/261806231-e4892c64-9d70-411d-8bbc-4483a85b5b3b.png Well, I built a lot of jars to find the problem, maybe there is really something in this commit~ I'll try to look at that commit, but I don't know much about rhino internals, would you like to take a look together? — Reply to this email directly, view it on GitHub <#1365 (comment)>, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAD7I27BZHPTV7B27VPQKH3XWDNEHANCNFSM6AAAAAA3H7YHFA . You are receiving this because you were mentioned.Message ID: @.>

Can you help us understand how you're measuring the performance of this? I built a little tool that uses JMH to run a script (or call a function) and combined with "git bisect" that tool makes it pretty easy to detect a performance regression. (I'll try to publish it soon.) However, I ran your script with this tool on 1.7.13 and 1.7.14 and the current master branch and performance was essentially the same. This is on a Linux box in both interpreted and compiled mode. So, is there a particular function (and parameters) that we can call using your script so that we can test for ourselves? … On Sat, Aug 19, 2023 at 8:41 AM 徐星晨 @.> wrote: @SevenNight2012 https://github.com/SevenNight2012 the commit you referred to adds support for template literals. Before the commit any strings delimited by backticks (`) were processed as non-template, regular strings. So if you're code uses backtick-delimited strings, as of the referred commit those strings will now be processed as template strings, thus requiring more processing, thus causing a difference in performance. Nothing else in the commit stand out to me as a potential cause of performance degradation when NOT using backtick-delimited strings (but I could be wrong) [image: image] https://user-images.githubusercontent.com/10094410/261806231-e4892c64-9d70-411d-8bbc-4483a85b5b3b.png Well, I built a lot of jars to find the problem, maybe there is really something in this commit~ I'll try to look at that commit, but I don't know much about rhino internals, would you like to take a look together? — Reply to this email directly, view it on GitHub <#1365 (comment)>, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAD7I27BZHPTV7B27VPQKH3XWDNEHANCNFSM6AAAAAA3H7YHFA . You are receiving this because you were mentioned.Message ID: @.>

Well, actually, I'm running rhino on an Android device, and I'm measuring its performance based on how time-consuming the methods are. This may not be comprehensive, but it should be intuitive. I'll show you an Android demo later.

SevenNight2012 commented 1 year ago

Can you help us understand how you're measuring the performance of this? I built a little tool that uses JMH to run a script (or call a function) and combined with "git bisect" that tool makes it pretty easy to detect a performance regression. (I'll try to publish it soon.) However, I ran your script with this tool on 1.7.13 and 1.7.14 and the current master branch and performance was essentially the same. This is on a Linux box in both interpreted and compiled mode. So, is there a particular function (and parameters) that we can call using your script so that we can test for ourselves? … On Sat, Aug 19, 2023 at 8:41 AM 徐星晨 @.> wrote: @SevenNight2012 https://github.com/SevenNight2012 the commit you referred to adds support for template literals. Before the commit any strings delimited by backticks (`) were processed as non-template, regular strings. So if you're code uses backtick-delimited strings, as of the referred commit those strings will now be processed as template strings, thus requiring more processing, thus causing a difference in performance. Nothing else in the commit stand out to me as a potential cause of performance degradation when NOT using backtick-delimited strings (but I could be wrong) [image: image] https://user-images.githubusercontent.com/10094410/261806231-e4892c64-9d70-411d-8bbc-4483a85b5b3b.png Well, I built a lot of jars to find the problem, maybe there is really something in this commit~ I'll try to look at that commit, but I don't know much about rhino internals, would you like to take a look together? — Reply to this email directly, view it on GitHub <#1365 (comment)>, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAD7I27BZHPTV7B27VPQKH3XWDNEHANCNFSM6AAAAAA3H7YHFA . You are receiving this because you were mentioned.Message ID: @.>

Hi, I uploaded my local rhino demo to my github, you can take a look, this is an Android demo, maybe you need to download AndroidStudio RhinoDemo

821938089 commented 11 months ago

This is because the JIT compilation of the interpretLoop method fails, resulting in performance degradation.

21:52:58.834 c.android.rhino          I  Compiling method java.lang.Object org.mozilla.javascript.Interpreter.interpretLoop(org.mozilla.javascript.Context, org.mozilla.javascript.Interpreter$CallFrame, java.lang.Object) kind=Baseline
21:52:58.834 c.android.rhino          I  Compilation of java.lang.Object org.mozilla.javascript.Interpreter.interpretLoop(org.mozilla.javascript.Context, org.mozilla.javascript.Interpreter$CallFrame, java.lang.Object) took 468us
21:52:58.834 c.android.rhino          I  Failed to compile method java.lang.Object org.mozilla.javascript.Interpreter.interpretLoop(org.mozilla.javascript.Context, org.mozilla.javascript.Interpreter$CallFrame, java.lang.Object) kind=Baseline

SevenNight2012 commented 8 months ago

This is because the JIT compilation of the interpretLoop method fails, resulting in performance degradation.

21:52:58.834 c.android.rhino          I  Compiling method java.lang.Object org.mozilla.javascript.Interpreter.interpretLoop(org.mozilla.javascript.Context, org.mozilla.javascript.Interpreter$CallFrame, java.lang.Object) kind=Baseline
21:52:58.834 c.android.rhino          I  Compilation of java.lang.Object org.mozilla.javascript.Interpreter.interpretLoop(org.mozilla.javascript.Context, org.mozilla.javascript.Interpreter$CallFrame, java.lang.Object) took 468us
21:52:58.834 c.android.rhino          I  Failed to compile method java.lang.Object org.mozilla.javascript.Interpreter.interpretLoop(org.mozilla.javascript.Context, org.mozilla.javascript.Interpreter$CallFrame, java.lang.Object) kind=Baseline

Wow，thank you very much~

SevenNight2012 commented 8 months ago

@gbrail @p-bakker Hey, I think I found the problem. Firstly,i mentioned before that there is a significant performance difference before and after 67187cba submission (based on Android platform),and combined with the tips he(@821938089 ) gave,i commented out the code about Icode_QUASI_CALLSITE, which is located in the interpretLoop method of Interpreter,in the end, rhino performed well! Here is my code:

Secondly,i try to imitate what RegExp does,below is my code

Based on what I did in point 2, the performance is good and compatible with the original code.

In addition,what I did above is only to enable rhino to execute JIT on the Android platform. I am not sure whether it can also improve performance on other platforms.

Lastly,as we can see in the second image, compiler allocated 7392KB to compile the interpretLoop method. The interpretLoop method is too large. Maybe we should consider optimizing this method. Thank you for your attention to this problem. I hope rhino will get better and better~~

p-bakker commented 4 months ago

Sry for not responding to your findings sooner

Have a bit of a hard time what your findings are. Is this summary correct?

adding a QuasiCallSiteProxy to the code similarly to how Rhino does that for RegExp improves performance?
the interpretLoop method is large

If that is correct, would you at least be able to create a PR for the QuasiCallSiteProxy change?

Not sure what can be done about the interpretLoop method though

p-bakker commented 4 months ago

Btw: I would think that the QuasiCallsite code would only get hit if you're code uses backticks, which, prior to 1.7.14 were just (incorrectly) considered regular string delimiters, but are now (properly) interpreted as Template String Literals

As for the too long InterpretLoop method: seems like the resolution would be to extract some logic from the method into helper functions. Haven't looked at potential candidates, but a PR is always welcome

p-bakker commented 3 months ago

@rbri could you comment on the getQuasiCallSiteProxy suggestion/approach? I saw you did stuff related to the getRegexProxy fairly recently, so am hoping you might know the purpose of these 'proxies' and whether it makes sense to add one for quasi callsites as well?

rbri commented 2 months ago

@rbri could you comment on the getQuasiCallSiteProxy suggestion/approach?

Note: this quasi stuff was renamed into template in commit f797ac85

Regarding the performance - i have no idea why commenting out that code will improve the performance; at least i fear some template constructs are no longer working after that change.

And now some words about the regex proxy thingy....

The overall idea is to make the whole RegExp implementation replacable - but this is not so simple like for other classes. The problem is the /regex/ syntax - this is handled by the parser. Therefore we have this strange factory-like pattern that supports

a more or less js independent regexp implementation that works as backend
enough support for the parser to be able to create js objects that are supported by the backend
some support for the replacement of the whole js part of the regexp support.

If you like to have a sample - HtmlUnit replaces only the backend (by converting the regex into a java regex) but still uses the js regex part from rhino.

And if someday someone merges #1419 i can try to implement a whole new regexp implementation based on a more advanced regexp engine (unicode flag and all the other missing stuff) that replaced the backend and the js part.

But i have absolutely no glue why the same approach should work in this case.

rbri commented 2 months ago

And one more idea: maybe we can improve the template handling in a way that we introduce a separate ast node if a template is a simple (multiline) template (no replacements inside). In this case we can simplify the processing to the same as we do for strings, only the decompiler has to use the ` instead of the ". My guess is more and more js stuff uses this as easy way to define strings with or without line breaks and we can simplify our processing for that.

@gbrail @p-bakker @tuchida @tonygermano - what do you think?

SevenNight2012 commented 2 months ago

Btw: I would think that the QuasiCallsite code would only get hit if you're code uses backticks, which, prior to 1.7.14 were just (incorrectly) considered regular string delimiters, but are now (properly) interpreted as Template String Literals

As for the too long InterpretLoop method: seems like the resolution would be to extract some logic from the method into helper functions. Haven't looked at potential candidates, but a PR is always welcome

Hi, here are my guesses First of all, I am using rhino on the Android platform, so performance-related issues will be amplified. Secondly, you can see from the screenshot that the system keeps prompting JIT to allocate memory. I extracted the code of Icode_QUASI_CALLSITE. I actually wanted to isolate it from the interpretLoop function, so that the virtual machine can optimize the code as much as possible when running. Facts have proved that, This seems to work Finally, it is precisely because of the second point that I recommend optimizing interpretLoop I'm not familiar with Rhino, so I can only locate the problem, and I don't seem to be sure what's causing it. Thank you for your continued attention to this issue @p-bakker @rbri

rbri commented 2 months ago

Will have another look at that after my vacation...

mozilla / rhino

The performance of version 1.7.14 has gone backwards #1365

I am using rhino to calculate a set of float data