Even simple loops don't get unrolled in high/mid/low tiers

jagotu commented 2 years ago

While checking the reason for lower performance of some benchmarks, I noticed that if loops remain after TruffleTier (which they seem to do as TruffleTier just peels the bytecode loop), no later tiers unroll them even though form the graphs should fullfill rules for the LoopFullUnrollPhase that runs in all three of these tiers. This prevents other optimizations and constant folding that could be achieved by unrolling the loop.

For example, this procedure:

static int simpletest()
{
    int result = 0;
    for (int i = 0; i < 5; i++)
    {
        result += i;
    }
    return result;
}

Copmiles to this graph after TruffleTier:

loop2

And the LoopNode survives even after low tier:

loop6

I've tried to debug the LoopFullUnrollPhase and it seems the detection of counted loops happens only once, when the loop is still too complex to be recognized as counted. It could very well be a bug/imperfection in truffle/graal itself, but I don't currently have the time to investigate.

Fixing this and making sure that the loops are unrolled should provide a significant boost to the slowest of benchmarks, like nbody, which loops over an array of 6 plantes and none of the virtual calls can be constantized due to the lack of unrolling.

steve-s commented 2 years ago

Quick guess: the compiler is missing the loop condition probability/loop frequency and so doesn't know if it can be beneficial to peel/etc. I don't know on top of my head if Espresso does this? The low level facility to tell the compiler about probability of a condition is CompilerDirectives.injectBranchProbability(the_condition), unless the probability is known upfront you have to count it with some counters, this is what ConditionProfile.createCountingProfile() does internally, but for compact bytecode interpreter such as Espress/BACIL it may be better to avoid the ConditionProfile abstraction and use the low level injectBranchProbability directly.

steve-s commented 2 years ago

I don't know if seafoam shows branch probabilities, but that information should be in the IGV format output from the compiler. There's nothing like that in the image, so maybe it is indeed missing (or seafoam does not show it...)

jagotu commented 2 years ago

Espresso does some kind of "LivenessAnalysis" which seems to be related to loop tracking. I expected trivial loops to always be beneficial to peel, but it being driven by some branching statistics would explain the issue as BACIL doesn't do any tracking.

steve-s commented 2 years ago

For future reference LivenessAnalysis AFAIK is done in order to reduce the number of "live" state, i.e., state that the runtime needs to keep track of in order to be able to deoptimize the compiled code. Espresso eagerly sets slots in the local variables array to null as early as possible to reduce the "live" state.

jagotu / BACIL

Even simple loops don't get unrolled in high/mid/low tiers #4