Execution performance ideas

sbarzowski commented 6 years ago

First we need to have some benchmarking in place, and some reasonable "corpus" of jsonnet to test on. We can start by using current test suite, but dedicated benchmarks make more sense.

Once we have that, we can try to find bottlenecks and potentially optimize. Instead of having them scattered throughout the code, I'll dump some of these ideas here:

caching of object locals (they're desugared to locals in each field now)
caching of object fields (controversial and important, we need a discussion first)
memory usage of tail recursion
making sure that we don't keep objects for too long
generate unique numbers for variables statically and use that instead of names
faster implementation of std.join, uniq, setInter and other array building builtins
native implementation of std.sort, std.parseInt and other functions that the user may expect to be optimized
interning of booleans for faster comparisons (by using the pointer directly and never actually dereferencing) and lower memusage.
avoid desugaring operators, call builtins directly instead (fewer things to check, no need to do std lookup and verifying that it's actually a function etc.).
fast & easy way for users to build arrays- perhaps a more sophisticated representation of an array would help, so that + behaves reasonably and/or more native higher-order functions. This stuff often applies to strings, too.
faster indexing of objects with deep inheritance chains
reduce overhead of function call argument checking (currently we check for a lot of special cases, because of optional arguments and that requires building maps and multiple lookups for each argument - maybe it would help to skip these checks when there are only positional arguments)
check performance of environment capturing (and reuse the environment when possible)
interning of identifiers
try other representations of cachedThunk

Before we implement any of these we should have a benchmark that proves that it actually helps.

sbarzowski commented 6 years ago

Perhaps a lot of time is stored resolving library functions. Every time a function value is called, the same values are computed, which do not depend on arguments. If there is std.something in a tight loop, every time std needs to get resolved (which by itself may mean a lot of map lookups), then something needs to be looked up in the object and then a function value needs to be created. Creating a local outside of the function for std.something can be used as a workaround.

It could be helped by keeping things like std.something as potentialValues inside function values, instead of always evaluating them from ast.

We would need to be careful not to store too much data, if we kept everything that doesn't depend on arguments, the memory usage could get out of hand pretty easily.

sbarzowski commented 6 years ago

If we had resolved function values, we could go further and cache some call processing - the part which doesn't depend on actual argument values - checking if the number of arguments is correct, mapping positional/named arguments, etc. It could be a massive improvement for builtins.

redbaron commented 6 years ago

if there is std.something in a tight loop, every time std needs to get resolved

is it same for $ objects?

sbarzowski commented 6 years ago

Yep, $.something behaves similarly to std.something.

Note that by "resolved" I mean "looked up in the environment", just like any other variable.

sparkprime commented 6 years ago

It is a standard optimization to hoist invariant expressions out of functions, loops, etc.

sparkprime commented 6 years ago

I tried desugaring LiteralString to a special ast that embedded a pre-constructed stringValue object, to avoid having to create duplicate copies on the heap each time the string literal was executed. However this did not improve performance at all, in fact the profile indicated that makeStringValue was never being called, even though the code was full of string literals. I am not sure why.

sparkprime commented 6 years ago

This one generate unique numbers for variables statically and use that instead of names is going to give us the biggest bang for buck next I think.

I have a hunch that Go's maps are not as fast as the C++ std::map, which is responsible for some of the relative performance gap. We have to use maps for objects (in general) but there's no reason for environments.

dlespiau commented 5 years ago

An interesting observation is the effect of disabling the GC, compiling kube-prometheus. Best of 3 runs:

$ time -J vendor -m manifests kube-prometheus.jsonnet
real    0m6.503s
user    0m8.852s
sys 0m0.352s

$ time GOGC=off jsonnet -J vendor -m manifests kube-prometheus.jsonnet
real    0m5.440s
user    0m4.488s
sys 0m0.952s

It's possible to play with SetGCPercent to tune the GC behaviour, jsonnet is a short lived program and very often doesn't need to garbage collect at all.

abourget commented 5 years ago

I've been toying with this sonnet:

local k = import "k.libsonnet";

local deployment = k.apps.v1beta2.deployment;
local container = deployment.mixin.spec.template.spec.containersType;
local containerPort = container.portsType;

local myAppContainer = container.new("hello", "world");

local myAppDeployment = deployment.new("hello", 2, [myAppContainer], podLabels={"app": "myapp"});

k.core.v1.list.new(myAppDeployment)

It references these two files:

These are the results I get with 3 different implementations:

$ time jsonnet-cpp my.jsonnet
{...}
real    0m0.545s
user    0m0.477s
sys 0m0.047s

$ time jsonnet-go my.jsonnet
{...}
real    0m1.703s
user    0m2.082s
sys 0m0.193s

$ time ./sjsonnet.jar my.jsonnet 
{...}
real    0m1.833s
user    0m0.376s
sys 0m0.045s

and a second run of sjsonnet.jar, as it's calling into a background server (to avoid the JVM boot time):

$ time ./sjsonnet.jar my.jsonnet 
{...}
real    0m0.232s
user    0m0.327s
sys 0m0.029s

Seems the Scala version managed to get some interesting boost here.

My stack is in Go, so I'd love so much to have a speedy version here :)

As a side note, I'm interested in go-jsonnet as a real-time JSON stream reshaper.. disabling the garbage collecting in this situation isn't really practical.

sbarzowski commented 5 years ago

It looks like in this case it's parsing performance that really matters in this case. I don't think garbage collection is a problem at all here.

EDIT: I ran a quick experiment and just processing k8s.libsonnet takes pretty much the same amount of time as the provided example (the difference is well within random fluctuations).

ghostsquad commented 4 years ago

I'd like to contribute to this effort. I had a couple questions though to help me get oriented.

@sbarzowski requests that benchmarking is setup so that we can track performance changes over time. How would one write a benchmark test for a stdlib function that's written in jsonnet directly (instead of in Go)?
How would someone "overwrite" a stdlib function that exists in the upstream stdlib jsonnet file (so that it runs native code instead of jsonnet interpreted)?

sbarzowski commented 4 years ago

That's awesome!

1) Actually I think it's best to keep the benchmarks completely implementation-agnostic, we shouldn't care if it's Go, Jsonnet or C++ - we want to compare the implementations. You can just create a Jsonnet file on which the performance improvement is visible here https://github.com/google/jsonnet/tree/master/benchmarks. Ideally we would have more benchmarks, including some realistic ones and a script which runs them all and prepares a nice looking report, allowing for easy comparison of performance across versions. But we don't have that yet, and frankly it's not critical, because there is still a lot of low hanging fruit performance-wise and the improvements tend to be obvious. So, don't worry too much about it, adding an example for which the improvement is clearly visible is enough for stuff like adding a builtin implementation of a stdlib function. 2) All the builtins in the builtin table will override the ones defined in std.jsonnet. It's enough to define a new function implementing a builtin and add it to the table there. For reference the code which builds the std object and performs the overriding is here: https://github.com/google/go-jsonnet/blob/v0.14.0/interpreter.go#L1098.

If you have any questions or need help with anything, please let me know!

ghostsquad commented 1 year ago

sjsonnet has some very good benchmarks because it caches values once they are evaluated as outlined here: https://github.com/databricks/sjsonnet#performance

Additionally, they moved all the native functions into Scala, which I think would also be very worthwhile. I started doing that to some of the more complex functions already.

ben-manes commented 5 months ago

jrsonnet brought an evaluation down from 145s in go-jsonnet to a mere 0.35s (https://github.com/CertainLach/jrsonnet/issues/156). The bottleneck from a pprof (svg) was probably flattenArrays and garbage collection. The author rewrote std.prune to drop from the initial 12s runtime, so that's likely a bottleneck here too. It is such a stark difference in runtimes for the same output.

ghostsquad commented 5 months ago

@ben-manes ya, it's clear that functions written in jsonnet are going to going to be the cause of many performance bottlenecks.

google / go-jsonnet

Execution performance ideas #111