oracle / graaljs

A high-performance, ECMAScript compliant, and embeddable JavaScript runtime for Java
https://www.graalvm.org/javascript/
Universal Permissive License v1.0
1.8k stars 190 forks source link

Poor performance when having to "reset" JavaScript bindings scope #561

Open erstr2ggl opened 2 years ago

erstr2ggl commented 2 years ago

Hello,

we're still in the process of migrating our application from Rhino to GraalVM and are now at a place where we can compare our new GraalVM implementation with the old Rhino implementation. Sadly, performance is not as good as we had hoped.

The time to run one "action" has actually increased by a factor of around 2x compared to Rhino, which really isn't acceptable for our use case.

When profiling, the bottleneck quickly becomes apparent: With our current implementation, we're creating a Context once and are then, when we need to reset the bindings scope, setting it with this method:

public static void setJavascriptScope(Map<String, Object> scope) {
    Value bindings = context.getBindings("js");
    for (String member : bindings.getMemberKeys()) {
        if (scope.containsKey(member)) {
            bindings.putMember(member, scope.get(member));
            scope.remove(member);
        } else {
            bindings.removeMember(member);
        }
    }

    for (Map.Entry<String, Object> newMember : scope.entrySet()) {
        bindings.putMember(newMember.getKey(), newMember.getValue());
    }
}

The problem lies with the call to removeMember(), which takes up ~90% of the time within this method. In fact, simply setting the scope takes up ~70% of the time compared to ~30% of time that is spent actually evaluating our JavaScript code!

Now, the usual recommendation is to create a new Context instead of resetting the bindings scope like this, but we've found that this way is actually slightly faster and definitely easier to implement in our application.

Nonetheless, we've tried to create a new Context everytime we need to reset the bindings scope. Sadly, as we had already measured, performance isn't really faster that way. Instead of the bottleneck being the removeMember() method, now it's the creation of the new Context each time, meaning that performance is still ~2x worse.

For background, in our application, we're dealing with a lot (hundreds) of scope objects that need to be changed and set thousands (in some cases a few orders of magnitude more) of times during one "action". The actual JavaScript that is executed on the other hand is relatively basic and simple.

We fear that such a use case of GraalJS has (perhaps?) not yet been on your radar and that therefore, performance in this use case could be improved by optimizing the process of (re)setting the bindings scope.

wirthi commented 2 years ago

Hi,

as you already mention, the suggested way to do this is to create a new Context object every time. By using a shared Engine and reusing cached Source objects that case should be fast, faster than manually mingling with the Context object by removing entries.

https://www.graalvm.org/22.0/reference-manual/embed-languages/#code-caching-across-multiple-contexts

If you already follow all the advice there, can you maybe share a reproducer for your problem, exhibiting the bad performance?

Best, Christian

erstr2ggl commented 2 years ago

Hello,

creating a simple class that reproduces the problem to it's full extent is hard due to the complexity of our application, but here's a short one that I hope doesn't optimize anything crucial away:

public class Example {

    private long counter = 0;

    public void runGraal() {
        Engine engine = Engine.newBuilder("js").option("js.ecmascript-version", "6")
                .option("js.foreign-object-prototype", "true").build();
        for (int i = 0; i < 5000; ++i) {
            Context context = Context.newBuilder("js").engine(engine).build();
            Value bindings = context.getBindings("js");
            for (int j = 0; j < 250; ++j) {
                bindings.putMember("val" + counter, new SecureRandom().nextInt());
                ++counter;
            }
            System.out.println(context.eval("js", "val" + (counter - new SecureRandom().nextInt(250) - 1) + " + 0").toString());
    }

    public void runRhino() {
        for (int i = 0; i < 5000; ++i) {
            org.mozilla.javascript.Context context = org.mozilla.javascript.Context.enter();
            Scriptable scope = context.initStandardObjects();
            for (int j = 0; j < 250; ++j) {
                scope.put("val" + counter, scope, new SecureRandom().nextInt());
                ++counter;
            }
            System.out.println(context.evaluateString(scope, "val" + (counter - new SecureRandom().nextInt(250) - 1) + " + 0",
                    "js", 0, null).toString());
        }
    }

    public static void main(String[] args) {
        Example ex = new Example();

        // Graal (with warmup)
        ex.runGraal();
        long time = System.currentTimeMillis();
        ex.runGraal();
        double graal = (System.currentTimeMillis() - time) / 1000.;

        // Rhino
        time = System.currentTimeMillis();
        ex.runRhino();
        double rhino = (System.currentTimeMillis() - time) / 1000.;

        System.out.println("Total time Graal: " + graal + "s");
        System.out.println("Total time Rhino: " + rhino + "s");
    }

}

After a few runs, I get on average: Graal: ~27.5s Rhino: ~21.5s

Note that even in medium cases, we'd have to recreate the context about 200.000 times, not just 5.000, which obviously exacerbates the issue by a lot. Also note that this comparison isn't really fair to Rhino, since it actually spends by far the most time actually evaluating and compiling the JavaScript code - instead of creating the context. In our application, this isn't really an issue since we've implemented caching for that. That was actually one of the plus points we've found with Graal, that we could remove our Rhino caching logic since that's already built in.

wirthi commented 2 years ago

Hi @erstr2ggl

thanks for your reproducer. I can roughly confirm your numbers. I'd add one more round of warmup up Graal, as that seems to have some influence, but in general, the numbers seem correct.

The performance of putting members is especially slow when doing that in the global scope. I see significantly better numbers - actually in the ballpark of Rhino - when doing the same not on many variables in the global scope, but as members of a variable:

public void runGraalNonGlobal() {
        Engine engine = Engine.create();
        int check=0;
        for (int i = 0; i < 5000; ++i) {
            Context context = Context.newBuilder("js").engine(engine).option("js.ecmascript-version", "6")
                .option("js.foreign-object-prototype", "true").build();
            Value globalBindings = context.getBindings("js");
            context.eval("js","var inner={}");
            Value innerBindings = globalBindings.getMember("inner");
            for (int j = 0; j < 250; ++j) {
                innerBindings.putMember("val" + counter, new SecureRandom().nextInt());
                ++counter;
            }
            String result = context.eval("js", "inner.val" + (counter - new SecureRandom().nextInt(250) - 1) + " + 0").toString();
            check += result.length();
        }
        System.out.println("Check: "+check);
    }

Another, even faster option is to only share one entry from Java to JavaScript, and modify everything else on the Java side. Very similar to that approach above, but having inner as a Java class that has the respective members. I am sharing an int[250] here (note this requires allowHostAccess(HostAccess.ALL)), but if you need more flexibility, you could look into org.graalvm.polyglot.proxy.ProxyArray or ...ProxyObject.

    public void runGraalHostAccess() {
        Engine engine = Engine.create();
        int result=0;
        for (int i = 0; i < N; ++i) {
            Context context = Context.newBuilder("js").engine(engine).option("js.ecmascript-version", "6").option("js.foreign-object-prototype", "true").allowHostAccess(HostAccess.ALL).build();
            Value bindings = context.getBindings("js");
            int[] arr = new int[VARIABLE_COUNT];
            bindings.putMember("array", arr);
            for (int j = 0; j < VARIABLE_COUNT; ++j) {
                arr[j] = new SecureRandom().nextInt(250);
            }
            String code = "array[" + (new SecureRandom().nextInt(250)) + "]";
            int x = context.eval("js", code).asInt();
            result += x;
        }
        System.out.println(result);
    }

On my machine, this variant is by far the fastest, roughly 40% faster than your initial Rhino example (I know Rhino too little, you might be able to do something similar there as well).

Best, Christian

erstr2ggl commented 2 years ago

Hello,

thanks for your quick reply and for looking for a solution.

Sadly, creating a single entry in which we put our values isn't really a feasable solution for us, since that would mean that we'd have to adapt all of our JavaScript snippets to access the values via inner.xyz. We're talking millions of small snippets spread across many different systems.

Is there perhaps a way to improve the performance of putting members on the global scope, so that it's similarly fast?

erstr2ggl commented 2 years ago

Hello,

we've actually found a solution to access the values of the inner object by surrounding our JS snippets with with (inner) { ... }.

While performance is definitely much better, it is still not quite as fast as our old Rhino implementation.

We still believe that it's a problem GraalJS has with many elements in the scope, as per this example:

public void runGraal() {
    Map<String, Object> scope = new HashMap<>();
    for (int i = 0; i < 1000; ++i) {
        scope.put("val" + i, i);
    }
    Scope s = new Scope(scope);

    try (Context ctx = Context.newBuilder("js").build()) {
        ctx.getBindings("js").putMember("inner", s);
        long time = System.currentTimeMillis();
        for (int i = 0; i < 5000; ++i) {
            System.out.println("out: " + ctx.eval("js", "with (inner) { Math.PI * val1 }").toString());
        }
        System.out.println("Time taken a: " + (System.currentTimeMillis() - time) / 1000. + "s");
    }
}

public void runRhino() {
    org.mozilla.javascript.Context ctx = org.mozilla.javascript.Context.enter();
    Scriptable scope = ctx.initStandardObjects();
    for (int i = 0; i < 1000; ++i) {
        scope.put("val" + i, scope, i);
    }

    long time = System.currentTimeMillis();
    Script exec = ctx.compileString("Math.PI * val1", "js", 0, null);
    for (int j = 0; j < 5000; ++j) {
        System.out.println("out: " + exec.exec(ctx, scope).toString());
    }
    System.out.println("Time taken: " + (System.currentTimeMillis() - time) / 1000. + "s");
    ctx.close();
}

Scope in this case is a simple class that implements ProxyObject and simply returns the values from the Map.

Results with two warmup rounds for GraalJS: Graal: ~0,35s Rhino: ~0.16s

wirthi commented 2 years ago

Hi,

I can confirm we are around 2x slower on that example. I don't think there is a whole lot we can do about that.

For background, in our application, we're dealing with a lot (hundreds) of scope objects that need to be changed and set thousands (in some cases a few orders of magnitude more) of times during one "action". The actual JavaScript that is executed on the other hand is relatively basic and simple.

Right, this might just not be a good usecase for our approach. We try to deliver good enough performance, and a 2x drop on that specific microbenchmark is obviously relevant for you, but also far from catastrophic - for something we consider a corner-case.

As stated above, we support more languages and thus have a certain impact on the boundary. The GraalVM compiler is able to mitigate most of that in the real-world application we have seen so far. An application/benchmark that almost exclusively exercises that boundary might see some impact compared to other approaches that don't need any logic on the boundary.

We fear that such a use case of GraalJS has (perhaps?) not yet been on your radar and that therefore, performance in this use case could be improved by optimizing the process of (re)setting the bindings scope.

Our way of doing this is to create a new Context. We are not convinced that providing a feature to reset the binding scope on a Context is reasonable (given we consider this a corner-case), or even technically feasible in general without severe implications on many of the assumptions and invariants we currently have on a Context and Value lifecycles.

Best, Christian

in-fke commented 3 months ago

I did some Sampling with JVisualVM and what strikes me is that PolyglotMap.put (via GraalJSBindings.put) is invoking get (needs to, obviously, since it fullfills the Map contract), but get seems much more expensive than put? (see screenshot below). Ironically, we don't need to know the return value of get to initialize a binding.

So my questions:

  1. why does get seem so much more expensive than put?
  2. Maybe there could be some "fast path" with "put" avoiding "get"?
  3. Is there a way to avoid GraalJSBindings.put? What I am basically trying to do is "declare" all sorts of Spring Beans in the JS Context, basically like bindings.put(<spring-bean-name>, <spring-bean-instance>).

image

get is doing all sorts of stuff

image

caoccao commented 3 months ago

I've done some tests based on the test code with little modification, and would like to share my findings for your reference.

Prerequisites and Assumptions

  1. SecureRandom() is quite time consuming. I'm replacing it with fixed sequences.
  2. As the GraalJS test result is already confirmed, I'm not going to test GraalJS, but to add a V8 based library Javet.
  3. Rhino Scriptable holds either a map or an array for storing the members. That makes the put() method as fast as a hash map put(). When the Scriptable is being evaluated in Rhino, it will take a considerable amount of time to substantially inject the Scriptable in the execution engine. So, the tests must include the script evaluation so that it will be fair to all libraries.
  4. Javet supports batch set() that can improve performance significantly. So, the tests include both batch and non-batch set().
  5. There is a significant difference between evaluating only 1 variable and evaluating all variables in terms of performance. So, the tests include both scenarios.
  6. Javet provides resetContext() that allows clearing the whole context. So, there is no need to repeatedly create V8 runtimes.

Test Code

import com.caoccao.javet.exceptions.JavetException;
import com.caoccao.javet.interop.V8Host;
import com.caoccao.javet.interop.V8Runtime;
import com.caoccao.javet.values.reference.V8ValueGlobalObject;
import org.mozilla.javascript.Scriptable;

public class TestRhino {
    private static final String CODE = "var sum = 0;" +
            "for (var i = 0; i < 250; ++i) {\n" +
            "  sum += globalThis[`val${i}`];\n" +
            "}\n" +
            "sum;";

    public static void main(String[] args) {
        TestRhino test = new TestRhino();

        {
            test.initJavet();
        }

        {
            // Rhino
            long startTime = System.currentTimeMillis();
            test.runRhino();
            double deltaTime = (System.currentTimeMillis() - startTime) / 1000.;
            System.out.println("Total time Rhino: " + deltaTime + "s");
            System.gc();
            System.gc();
        }

        {
            // Rhino Scan
            long startTime = System.currentTimeMillis();
            test.runRhinoScan();
            double deltaTime = (System.currentTimeMillis() - startTime) / 1000.;
            System.out.println("Total time Rhino Scan: " + deltaTime + "s");
            System.gc();
            System.gc();
        }

        {
            // Javet
            long startTime = System.currentTimeMillis();
            test.runJavet();
            double deltaTime = (System.currentTimeMillis() - startTime) / 1000.;
            System.out.println("Total time Javet: " + deltaTime + "s");
            System.gc();
            System.gc();
        }

        {
            // Javet Scan
            long startTime = System.currentTimeMillis();
            test.runJavetScan();
            double deltaTime = (System.currentTimeMillis() - startTime) / 1000.;
            System.out.println("Total time Javet Scan: " + deltaTime + "s");
            System.gc();
            System.gc();
        }

        {
            // Javet Batch
            long startTime = System.currentTimeMillis();
            test.runJavetBatch();
            double deltaTime = (System.currentTimeMillis() - startTime) / 1000.;
            System.out.println("Total time Javet Batch: " + deltaTime + "s");
            System.gc();
            System.gc();
        }

        {
            // Javet Batch Scan
            long startTime = System.currentTimeMillis();
            test.runJavetBatchScan();
            double deltaTime = (System.currentTimeMillis() - startTime) / 1000.;
            System.out.println("Total time Javet Batch Scan: " + deltaTime + "s");
            System.gc();
            System.gc();
        }
    }

    public void initJavet() {
        try (V8Runtime v8Runtime = V8Host.getV8Instance().createV8Runtime()) {
            v8Runtime.getExecutor("1+1").executeVoid();
        } catch (JavetException e) {
            System.err.println(e.getMessage());
        }
    }

    public void runJavet() {
        int sum = 0;
        try (V8Runtime v8Runtime = V8Host.getV8Instance().createV8Runtime()) {
            for (int i = 0; i < 5000; ++i) {
                V8ValueGlobalObject v8ValueGlobalObject = v8Runtime.getGlobalObject();
                for (int j = 0; j < 250; ++j) {
                    v8ValueGlobalObject.set("val" + j, j);
                }
                sum += v8ValueGlobalObject.getInteger("val1");
                v8Runtime.resetContext();
            }
        } catch (JavetException e) {
            System.err.println(e.getMessage());
        }
        System.out.println(sum);
    }

    public void runJavetScan() {
        int sum = 0;
        try (V8Runtime v8Runtime = V8Host.getV8Instance().createV8Runtime()) {
            for (int i = 0; i < 5000; ++i) {
                V8ValueGlobalObject v8ValueGlobalObject = v8Runtime.getGlobalObject();
                for (int j = 0; j < 250; ++j) {
                    v8ValueGlobalObject.set("val" + j, j);
                }
                sum += v8Runtime.getExecutor(CODE).executeInteger();
                v8Runtime.resetContext();
            }
        } catch (JavetException e) {
            System.err.println(e.getMessage());
        }
        System.out.println(sum);
    }

    public void runJavetBatch() {
        int sum = 0;
        try (V8Runtime v8Runtime = V8Host.getV8Instance().createV8Runtime()) {
            for (int i = 0; i < 5000; ++i) {
                V8ValueGlobalObject v8ValueGlobalObject = v8Runtime.getGlobalObject();
                Object[] objects = new Object[500];
                for (int j = 0; j < 250; ++j) {
                    objects[j << 1] = "val" + j;
                    objects[(j << 1) + 1] = j;
                }
                v8ValueGlobalObject.set(objects);
                sum += v8ValueGlobalObject.getInteger("val1");
                v8Runtime.resetContext();
            }
        } catch (JavetException e) {
            System.err.println(e.getMessage());
        }
        System.out.println(sum);
    }

    public void runJavetBatchScan() {
        int sum = 0;
        try (V8Runtime v8Runtime = V8Host.getV8Instance().createV8Runtime()) {
            for (int i = 0; i < 5000; ++i) {
                V8ValueGlobalObject v8ValueGlobalObject = v8Runtime.getGlobalObject();
                Object[] objects = new Object[500];
                for (int j = 0; j < 250; ++j) {
                    objects[j << 1] = "val" + j;
                    objects[(j << 1) + 1] = j;
                }
                v8ValueGlobalObject.set(objects);
                sum += v8Runtime.getExecutor(CODE).executeInteger();
                v8Runtime.resetContext();
            }
        } catch (JavetException e) {
            System.err.println(e.getMessage());
        }
        System.out.println(sum);
    }

    public void runRhino() {
        int sum = 0;
        for (int i = 0; i < 5000; ++i) {
            try (org.mozilla.javascript.Context context = org.mozilla.javascript.Context.enter()) {
                Scriptable scope = context.initStandardObjects();
                for (int j = 0; j < 250; ++j) {
                    scope.put("val" + j, scope, j);
                }
                sum += (int) context.evaluateString(scope, "val1", "js", 0, null);
            }
        }
        System.out.println(sum);
    }

    public void runRhinoScan() {
        double sum = 0;
        for (int i = 0; i < 5000; ++i) {
            try (org.mozilla.javascript.Context context = org.mozilla.javascript.Context.enter()) {
                Scriptable scope = context.initStandardObjects();
                for (int j = 0; j < 250; ++j) {
                    scope.put("val" + j, scope, j);
                }
                sum += (double) context.evaluateString(scope, CODE, "js", 0, null);
            }
        }
        System.out.println(sum);
    }
}

Test Results

I ran the tests twice and the results are as follows.

5000
Total time Rhino: 2.521s
1.55625E8
Total time Rhino Scan: 3.559s
5000
Total time Javet: 3.38s
155625000
Total time Javet Scan: 3.803s
5000
Total time Javet Batch: 2.526s
155625000
Total time Javet Batch Scan: 3.044s

5000
Total time Rhino: 2.37s
1.55625E8
Total time Rhino Scan: 3.341s
5000
Total time Javet: 3.329s
155625000
Total time Javet Scan: 3.667s
5000
Total time Javet Batch: 2.465s
155625000
Total time Javet Batch Scan: 2.931s

Conclusions

Only 1 Variable Evaluated

All Variables Evaluated

Bonus

in-fke commented 3 months ago

hopefully, my previous comment was not too shadowed by the last comment. @wirthi I'd appreciate if you could look into https://github.com/oracle/graaljs/issues/561#issuecomment-2228992813 which I find interesting