Proposal: Reduce short living allocations

A-Herzog commented 4 years ago

The implicit memory management of Java allows to implement algorithms very straight forward but relying too much on this concept can result in very heavy garbage collector loads.

In my JS workloads I can see (using Flight Recorder) there are two lines of code in Rhino which result in 90 GB of short living allocations per minute. For both cases I would like to suggest improvements which will reduce these allocations to zero (and therefore reduce the time needed to collect them in gc to zero). The disadvantage is, that in both cases some additional code is necessary, so the clearness of the methods is reduced.

org.mozilla.javascript.ScriptRuntime

When executing an "i++" in JS the previously maybe as Integer stored "i" becomes a Double. (The increment method tests if i is a Number and in this case Number.doubleValue() is used.) Boxing small integer values does not need any allocation because Integer.valueOf(int) is using a cache. Double.valueOf(double) always needs to allocate a Double object, so a simple for (var i=0;i<10;i++) results in 10 Double allocations. Even more: When creating a JS variable by "var i=3;" it will be stored as an Integer. But the "int i=0" in the for loop generates a Double - for some unknown reasons. So "i++" will operate on Double all the time.

By using the following code in ScriptRuntime the unnecessary use of Double variables can be avoided. (Please note the "Added code starts/ends here" markers. Everything else is not changed.)

private static Object doScriptableIncrDecr(Scriptable target,
        String id,
        Scriptable protoChainStart,
        Object value,
        int incrDecrMask)
{
    boolean post = ((incrDecrMask & Node.POST_FLAG) != 0);

    // Added code starts here
    if (value instanceof Integer) {
        int intNumber = ((Integer)value).intValue();
        if ((incrDecrMask & Node.DECR_FLAG) == 0) {
            ++intNumber;
        } else {
            --intNumber;
        }
        final Integer intResult = wrapInt(intNumber);
        target.put(id, protoChainStart, intResult);
        return post?value:intResult;
    }

    if (value instanceof Double) {
        final double d=((Double)value).doubleValue();
        if (Math.floor(d)==d && Math.abs(d)<NativeNumber.MAX_SAFE_INTEGER) {
            int intNumber = (int)d;
            if ((incrDecrMask & Node.DECR_FLAG) == 0) {
                ++intNumber;
            } else {
                --intNumber;
            }
            final Integer intResult = wrapInt(intNumber);
            target.put(id, protoChainStart, intResult);
            return post?value:intResult;
        }
    }
    // Added code ends here

    double number;
    if (value instanceof Number) {
        number = ((Number)value).doubleValue();
    } else {
        number = toNumber(value);
        if (post) {
            // convert result to number
            value = wrapNumber(number);
        }
    }
    if ((incrDecrMask & Node.DECR_FLAG) == 0) {
        ++number;
    } else {
        --number;
    }
    Number result = wrapNumber(number);
    target.put(id, protoChainStart, result);
    if (post) {
        return value;
    }
    return result;
}

org.mozilla.javascript.optimizer.OptRuntime

Here we have a static method for boxing a function argument to an one-element array:

public static Object call1(..., Object arg0,...) {  
    return fun.call(..., new Object[] { arg0 } );  
}

Avoiding these Object[1] allocations is a bit more complex because "call1" is static and we need to ensure the code keeps thread safe. So for this case I would propose this:

private final static ThreadLocal<Object[]> call1obj=new ThreadLocal<>();

/**
 * Implement ....(arg) call shrinking optimizer code.
 */
public static Object call1(Callable fun, Scriptable thisObj, Object arg0,
        Context cx, Scriptable scope)
{
    Object[] args=call1obj.get();
    if (args==null) call1obj.set(args=new Object[1]);

    args[0]=arg0;
    try {
        return fun.call(cx, scope, thisObj, args );
    } finally {
        args[0]=null; /* Help the gc */
    }
}

Measurement results

The following two Java Mission Control screenshots show the difference in memory allocations using the current Rhino code and the current Rhino code with the two additions above:

Current Rhino code FlightRecorder-Memory-Rhino-current-version Rhino using the two additions FlightRecorder-Memory-Rhino-on-memoy-diet

If "keeping the code simple and maintainable" is most important, it's ok to discard my proposals. But if a reduction of gc load is of interest, this may help. (BTW: The improved Rhino version would be less gc intensive than Nashorn.)

rbri commented 4 years ago

Hi Alexander, i think this is a really interesting suggestion. Just made a pull request based on your first suggestion to have some code base to discuss. I guess Greg will also have some suggestions here.

rbri commented 4 years ago

Regarding your second suggestion i'm not really sure your fix is correct. Without a deeper look at the code i have some fear if one function calls another one and then again addresses the args array. And i personally think thread locals are more often the root of problems and not the solution ;-). Will have a deeper look at this later.

Many, many thanks for your contribution.

p-bakker commented 3 years ago

@rbri assuming the PR you made for the first part of this proposal got merged, can this issue be closed?

I think you're right that the 2nd part of the proposal has issues when static call1(...) method gets called multiple times within the same call stack, as a new call to call1 would modify the argument array for a previous call

tuchida commented 3 years ago

      if ((incrDecrMask & Node.DECR_FLAG) == 0) {
          ++intNumber;
      } else {
          --intNumber;
      }
What happens when intNumber is Integer.MAX_VALUE or Integer.MIN_VALUE?

rbri commented 3 years ago

The changes i have done in this pr had no positive effect - i spend some time on this and i found no way to improve the overall performance so far. So the PR was done to preserve my initial work and as starting point for other (or if i have some time to come back to this)

p-bakker commented 3 years ago

Link to the PR that was closed without merging: https://github.com/mozilla/rhino/pull/720

rbri commented 3 years ago

Strange maybe i have never pushed the PR because there was no effect.

p-bakker commented 3 years ago

I had just found it in the Closed PR section: #720

gbrail commented 3 months ago

https://github.com/mozilla/rhino/pull/1563 does indeed do the first thing, which is to make the increment operation (and many other math operations) use Integers, and use Integer.valueOf, much more aggressively. Merging that will partially resolve this issue.

On the second one, I too am concerned about the complexity and possible issues with replacing those small arrays with thread local arrays. I also know that Java is highly optimized for small, short-lived object allocations and I'd love to see an actual performance improvement (like a benchmark running faster) before assuming that this would improve performance.

rbri commented 3 months ago

The changes i have done in this pr had no positive effect - i spend some time on this and i found no way to improve the overall performance so far. So the PR was done to preserve my initial work and as starting point for other (or if i have some time to come back to this)

For me there was no effect at all - i guess this is the wrong way if you like to make the thing faster and using less resources.

p-bakker commented 3 months ago

For me there was no effect at all

Performance impact of these changes could imho really depends on the machine you're running on and the Java version & config being used, as I think any slowdown would be because of the garbage collector kicking in and consuming CPU cycles.

Given a big enough machine and the right garbage collector, you might not notice a slowdown with the current code in a testrun at all

mozilla / rhino