Embedded code performance

alturkovic commented 2 months ago

Hi! I am trying to use GraalVM from Kotlin to execute a Python script. I need to pass some input parameters, evaluate the script and read the result as JSON.

I did a simple comparison with Jython:

package org.example.test

import com.fasterxml.jackson.databind.JsonNode
import com.fasterxml.jackson.module.kotlin.jacksonObjectMapper
import org.graalvm.polyglot.Context
import org.graalvm.polyglot.Engine
import org.graalvm.polyglot.Source
import org.python.util.PythonInterpreter
import kotlin.time.measureTime

fun main() {
    graalvm()
    jython()
}

val mapper = jacksonObjectMapper()

private fun graalvm() {
    val duration = measureTime {
        val engine = Engine.newBuilder()
            .option("engine.WarnInterpreterOnly", "false")
            .build()

        val script = Source.create("python", "{'name':'John', 'id': id}")

        repeat(100) {
            val ctx = Context.newBuilder().engine(engine).build()
            ctx.polyglotBindings.putMember("id", it)

            val result = ctx.eval(script)
            mapper.valueToTree<JsonNode>(result.`as`(Map::class.java))
        }
    }

    println("GraalVM: $duration")
}

private fun jython() {
    val duration = measureTime {
        val script = PythonInterpreter().compile("{'name':'John', 'id': id}")

        repeat(100) {
            PythonInterpreter().use { interpreter ->
                interpreter["id"] = it
                val pyResult = interpreter.eval(script)
                mapper.valueToTree<JsonNode>(pyResult)
            }
        }
    }

    println("Jython: $duration")
}

But the Jython implementation is ~4x faster for this example. In both implementations I compile the script once, reuse it, and only inject the value that changes.

I noticed I can make this example fast by moving val ctx = Context.newBuilder().engine(engine).build() outside the repeat block, but then I am just mutating the same context object in a loop: any parameter that isn't reset after an execution would leak into the next one, which seems fiddly.

Am I doing something wrong? Is there a better way to evaluate scripts from Java/Kotlin code?

After revisiting this issue, I added some simple timing output to track the performance, and it seems that most of Jython's time is spent parsing the script:

import com.fasterxml.jackson.databind.JsonNode
import com.fasterxml.jackson.module.kotlin.jacksonObjectMapper
import org.graalvm.polyglot.Context
import org.graalvm.polyglot.Engine
import org.graalvm.polyglot.Source
import org.python.util.PythonInterpreter
import kotlin.time.measureTime
import kotlin.time.measureTimedValue

fun main() {
    graalvm()
    jython()
}

val mapper = jacksonObjectMapper()

private fun graalvm() {
    val duration = measureTime {
        val engine = Engine.newBuilder()
            .option("engine.WarnInterpreterOnly", "false")
            .build()

        val builder = Context.newBuilder("python").engine(engine)

        val source = Source.create("python", "{'name':'John', 'id': id}")

        repeat(1000) {
            val executionTime = measureTime {
                builder.build().use { ctx ->
                    ctx.polyglotBindings.putMember("id", it)
                    val result = ctx.eval(source)
                    mapper.valueToTree<JsonNode>(result.`as`(Map::class.java))
                }
            }
            println(executionTime)
        }
    }
    println("GraalVM: $duration")
}

private fun jython() {
    val duration = measureTime {
        val (script, duration) = measureTimedValue { PythonInterpreter().compile("{'name':'John', 'id': id}") }

        println("Init: $duration")

        repeat(1000) {
            val executionTime = measureTime {
                PythonInterpreter().use { interpreter ->
                    interpreter["id"] = it
                    val pyResult = interpreter.eval(script)
                    mapper.valueToTree<JsonNode>(pyResult)
                }
            }
            println(executionTime)
        }
    }

    println("Jython: $duration")
}

With this simple example, Jython takes almost the same total time for 100 or 1000 iterations, but the GraalVM implementation scales very poorly. After warmup, the average Jython execution takes around 10 microseconds, whereas GraalVM takes around 10 ms, so after warmup the difference is actually about 1000x?

I must be doing something wrong with the GraalVM implementation, but I cannot figure out what.
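The "average after warmup" figures above can be computed rather than read off the printed lines. A minimal sketch in plain Kotlin, assuming the per-iteration durations are collected into a list instead of printed; the helper name averageAfterWarmup and the warmup count of 100 are hypothetical choices, not part of either library:

```kotlin
import kotlin.time.Duration
import kotlin.time.measureTime

// Hypothetical helper: averages the samples left after dropping the first
// `warmup` iterations, which include class loading and JIT warmup.
fun averageAfterWarmup(samples: List<Duration>, warmup: Int): Duration {
    require(warmup >= 0) { "warmup must be non-negative" }
    val steady = samples.drop(warmup)
    require(steady.isNotEmpty()) { "need more samples than warmup iterations" }
    // Sum the remaining samples, then divide by their count.
    val total = steady.fold(Duration.ZERO) { acc, d -> acc + d }
    return total / steady.size
}

fun main() {
    // Collect per-iteration timings instead of printing each one.
    val samples = List(1000) {
        measureTime {
            // the code under test goes here
        }
    }
    println("Average after warmup: ${averageAfterWarmup(samples, warmup = 100)}")
}
```

Dropping a fixed number of leading iterations is the simplest warmup policy; a benchmarking harness such as JMH handles this more rigorously.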

msimacek commented 2 months ago

Jython's interpreters are not the same concept as our contexts. Contexts provide full isolation: they don't share any language state, so a module imported in one context is independent of the same module imported in another context. Recreating all that state and reimporting all the modules (Python needs to import a lot of modules for the core to work, even if you don't import anything yourself) has a cost. Jython's interpreters have separate global namespaces, but they are not isolated: they share modules and other state, so writing into a module in one interpreter is visible in other interpreters.
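The isolation described above can be observed directly. A minimal sketch, assuming GraalPy (the org.graalvm.polyglot API plus the Python language) is on the classpath: two contexts share one Engine, yet an attribute set on the sys module in one context is not visible in the other. The attribute name shared_marker and the helper hasMarker are arbitrary, hypothetical names:

```kotlin
import org.graalvm.polyglot.Context
import org.graalvm.polyglot.Engine

// Hypothetical helper: reports whether `sys.shared_marker` exists in `ctx`.
// The eval result is the value of the last expression in the source.
fun hasMarker(ctx: Context): Boolean =
    ctx.eval("python", "import sys\nhasattr(sys, 'shared_marker')").asBoolean()

fun main() {
    Engine.newBuilder().option("engine.WarnInterpreterOnly", "false").build().use { engine ->
        Context.newBuilder("python").engine(engine).build().use { first ->
            Context.newBuilder("python").engine(engine).build().use { second ->
                // Mutate the sys module in the first context only.
                first.eval("python", "import sys\nsys.shared_marker = 42")
                println("first: ${hasMarker(first)}, second: ${hasMarker(second)}")
            }
        }
    }
}
```

Sharing the Engine lets the contexts share engine-level resources, but each context still gets its own copy of every module, which is where the per-context cost comes from.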

So a single context is closer to what Jython is doing. Unfortunately, the context API has no way to say you just want a new namespace. The idiomatic way of running parameterized code with GraalPy is to define a function and execute it with arguments. For your example, that would be:

        val ctx = Context.newBuilder().option("engine.WarnInterpreterOnly", "false").build()
        val fn = ctx.eval("python", """
            def fn(id):
                return {'name':'alen', 'id': id}
            fn
        """.trimIndent())

        repeat(100) {
            val result = fn.execute(it)
            mapper.valueToTree<JsonNode>(result.`as`(Map::class.java))
        }

It's still a bit slower than Jython because we do more work during the first initialization, but it's much better than before. If you need more flexibility than functions give you (i.e. the code itself needs to change), you can wrap Python's exec/eval builtins, like this:

        val ctx = Context.newBuilder().option("engine.WarnInterpreterOnly", "false").build()
        val evalFn = ctx.eval("python", """
            def eval_fn(code, namespace):
                return eval(code, namespace, namespace)
            eval_fn
        """.trimIndent())
        val createDict = ctx.eval("python", "dict")

        repeat(100) {
            val namespace = createDict.execute()
            namespace.putHashEntry("id", it)
            val result = evalFn.execute("{'name':'alen', 'id': id}", namespace)
            mapper.valueToTree<JsonNode>(result.`as`(Map::class.java))
        }
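The same idea extends to statements, which eval cannot run. A sketch, assuming a hypothetical exec_fn wrapper around Python's built-in exec and a hypothetical ExecRunner class: every run gets a fresh dict as its namespace, so variables assigned in one run cannot leak into the next, which addresses the concern about reusing one context raised earlier in the thread:

```kotlin
import org.graalvm.polyglot.Context
import org.graalvm.polyglot.Value

// Hypothetical wrapper: runs `code` with exec() in a fresh dict seeded with
// `params` and returns that dict, so callers can read the variables it assigned.
class ExecRunner(ctx: Context) {
    private val execFn: Value = ctx.eval("python", """
        def exec_fn(code, namespace):
            exec(code, namespace, namespace)
            return namespace
        exec_fn
    """.trimIndent())
    private val createDict: Value = ctx.eval("python", "dict")

    fun run(code: String, params: Map<String, Any?>): Value {
        val namespace = createDict.execute() // a new, empty Python dict per run
        params.forEach { (key, value) -> namespace.putHashEntry(key, value) }
        return execFn.execute(code, namespace)
    }
}

fun main() {
    Context.newBuilder("python").option("engine.WarnInterpreterOnly", "false").build().use { ctx ->
        val runner = ExecRunner(ctx)
        val first = runner.run("result = {'name': 'John', 'id': id}\nleftover = True", mapOf("id" to 1))
        val second = runner.run("result = {'name': 'John', 'id': id}", mapOf("id" to 2))
        println(first.getHashValue("result"))
        // The second run started from an empty dict, so `leftover` is absent.
        println(second.hasHashEntry("leftover"))
    }
}
```

The context and the compiled wrapper are created once; only the cheap dict is recreated per run.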

alturkovic commented 2 months ago

Great, that makes sense, thank you for the explanation and the code samples, that helps a lot!

timfel commented 1 month ago

Just for reference, a lot of the initial cost is class loading.

long t0 = System.currentTimeMillis();
var context = Context.create("python");
context.eval("python", "print('Hello from GraalPy!')");
System.err.println(System.currentTimeMillis() - t0);
context.close();
t0 = System.currentTimeMillis();
var jython = new PythonInterpreter();
jython.exec("print 'Hello from Jython!'");
System.err.println(System.currentTimeMillis() - t0);
jython.close();

This gives (times in milliseconds):

Hello from GraalPy!
2293
Hello from Jython!
1285

Subsequent runs are much faster, but GraalPy is still slower because it creates an isolated context, whereas Jython basically just creates new globals. If I duplicate the code above and run it, the second runs give:

Hello from GraalPy!
252
Hello from Jython!
1
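To separate the one-time class-loading cost from the per-context cost within a single run, the measurement can be repeated in a loop. A minimal Kotlin sketch, assuming GraalPy is on the classpath; sharing one Engine across the contexts and the helper name timeFreshContext are choices made here, not what the comment above measured, and no particular timings are implied:

```kotlin
import org.graalvm.polyglot.Context
import org.graalvm.polyglot.Engine
import kotlin.time.Duration
import kotlin.time.measureTime

// Hypothetical helper: times building a fresh GraalPy context and running a
// trivial script in it, including closing the context.
fun timeFreshContext(engine: Engine): Duration = measureTime {
    Context.newBuilder("python").engine(engine).build().use { ctx ->
        ctx.eval("python", "1 + 1")
    }
}

fun main() {
    Engine.newBuilder().option("engine.WarnInterpreterOnly", "false").build().use { engine ->
        // The first iteration also pays for class loading; later ones show
        // the steady-state cost of creating an isolated context.
        repeat(5) { i ->
            println("context #${i + 1}: ${timeFreshContext(engine)}")
        }
    }
}
```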

Issue #424 in oracle/graalpython