skinny85 / graalvm-truffle-tutorial

The code for the series of tutorials on my blog about the GraalVM Truffle language implementation framework
http://endoflineblog.com/graal-truffle-tutorial-part-0-what-is-truffle

Feedback on part 9 – performance benchmarking #2

Closed eregon closed 1 year ago

eregon commented 2 years ago

Hello,

I've read https://www.endoflineblog.com/graal-truffle-tutorial-part-9-performance-benchmarking and wanted to leave some feedback. It's a nice write-up; thank you for these blog posts. I'm sure they help people get started with Truffle.

I noticed a few things which are rather unexpected, based on my experience with Truffle (I work on TruffleRuby).

The first thing is that the benchmark redefines the function every time. That seems unexpected, especially for micro-benchmarks. Here it seems much better to declare the function in the setup, and only call fib inside the benchmark itself. Also, calling eval like this is likely to have a non-trivial overhead. A better approach would be to declare Value fib; as a field, assign fib = context.eval("ezs", "function fib(n) {...}") in setup, and then benchmark fib.execute(20). Another possibility would be to store "fib(20)" as a Source and pass that to eval.
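Putting that suggestion into code, a minimal JMH-style sketch might look like this (the "ezs" language id, the fib body, and the assumption that evaluating the declaration yields the function value are illustrative; this won't run without the tutorial's language on the classpath):

```java
import org.graalvm.polyglot.Context;
import org.graalvm.polyglot.Value;
import org.openjdk.jmh.annotations.*;

@State(Scope.Benchmark)
public class FibBenchmark {
    private Context context;
    private Value fib;

    @Setup
    public void setup() {
        context = Context.create();
        // define the function once, outside the measured code;
        // assumes evaluating the declaration returns the function value
        fib = context.eval("ezs",
                "function fib(n) { return n < 2 ? n : fib(n - 1) + fib(n - 2); }");
    }

    @Benchmark
    public int recursiveCall() {
        // only the guest-language call itself is measured
        return fib.execute(20).asInt();
    }

    @TearDown
    public void tearDown() {
        context.close();
    }
}
```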

The mutable FunctionObject approach feels quite off for JS-like languages. I guess the inspiration is SLFunction? IMHO it's kind of a hack, and it only works because in SL there are only functions at the top level, and so conceptually a single map of functions, and the function objects are used both for caching lookups and holding onto the CallTarget. In a JS-like language, the function/property lookup would typically be cached separately (using the Truffle Object Model), and the function itself would have a fixed CallTarget. Relatedly, the caching in GlobalVarReferenceExprNode seems correct only if it's only functions that can be assigned at the top level, and not other variables. Otherwise, this would cache fib to the given function even after fib = 3 at the top level (which cannot really be handled by the mutable FunctionObject).
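The stale-cache hazard can be modeled in a few lines of plain Java (this is NOT the tutorial's actual Truffle code; the class and field names are illustrative): a "global scope" map, plus a node that caches the first value it reads and never re-checks it.

```java
import java.util.HashMap;
import java.util.Map;

// Plain-Java model of the stale-cache hazard - not real Truffle nodes.
class GlobalScope {
    final Map<String, Object> globals = new HashMap<>();
}

// Mimics a GlobalVarReferenceExprNode that caches the first value it reads
// and never checks whether the global was reassigned afterwards.
class CachingVarRefNode {
    private final String name;
    private Object cachedValue;

    CachingVarRefNode(String name) {
        this.name = name;
    }

    Object execute(GlobalScope scope) {
        if (cachedValue == null) {
            cachedValue = scope.globals.get(name); // cached on first lookup
        }
        return cachedValue; // stale once the global is reassigned
    }
}

public class StaleCacheDemo {
    public static void main(String[] args) {
        GlobalScope scope = new GlobalScope();
        scope.globals.put("fib", "FunctionObject(fib)");

        CachingVarRefNode ref = new CachingVarRefNode("fib");
        System.out.println(ref.execute(scope)); // FunctionObject(fib)

        scope.globals.put("fib", 3); // models `fib = 3` at the top level
        System.out.println(ref.execute(scope)); // still FunctionObject(fib) - stale!
    }
}
```

In a real Truffle interpreter, the idiomatic fix is to guard such a cache with an Assumption that the assignment node invalidates, rather than caching the value unconditionally.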

The change in BlockStmtNode should not change peak performance (although it is a good change in general): Graal should compile it the same no matter which version is used, because after it has exploded/unrolled the loop, it can see that every assignment to ret except the last one is unused. Do you have a way to reproduce that? If the performance actually differs, it would likely be a bug. Possibly the difference you saw was caused by not enough warmup. cc @rschatz

skinny85 commented 2 years ago

Hello Benoit! First of all, thanks a lot for taking the time to leave feedback on the article - I really appreciate it!

The first thing is that the benchmark redefines the function every time. That seems unexpected, especially for micro-benchmarks. Here it seems much better to declare the function in the setup, and only call fib inside the benchmark itself. Also, calling eval like this is likely to have a non-trivial overhead. A better approach would be to declare Value fib; as a field, assign fib = context.eval("ezs", "function fib(n) {...}") in setup, and then benchmark fib.execute(20). Another possibility would be to store "fib(20)" as a Source and pass that to eval.

I agree it's a little weird to re-define the function each time. The reason I did it that way is that it's the simplest solution. These articles tend to get long, and I look for opportunities to simplify wherever I can. Since all benchmarks of the Truffle languages were written in the same way, I assumed it wouldn't hurt their numbers relative to each other (it probably hurts their numbers relative to Java, but I don't think that's a huge deal). Also, redefining the function had an unexpected upside: it allowed me to discover an inefficiency in FuncDeclStmtNode (we were recreating the CallTarget each time it was called, instead of caching the first one)!

Given that, what do you think of leaving the code as-is, but mentioning in the body of the blog article the other options that you gave, like using a Value for fib, or a Source for fib(20)?

The mutable FunctionObject approach feels quite off for JS-like languages. I guess the inspiration is SLFunction? IMHO it's kind of a hack, and it only works because in SL there are only functions at the top level, and so conceptually a single map of functions, and the function objects are used both for caching lookups and holding onto the CallTarget. In a JS-like language, the function/property lookup would typically be cached separately (using the Truffle Object Model), and the function itself would have a fixed CallTarget.

Yes, you are correct that the mutable FunctionObject is inspired by SimpleLanguage. Remember that we also only support functions on the top level in EasyScript at this point, because we don't support closures (this is explained in part 7 that introduces user-defined functions). You are also correct that references to functions should be implemented with property lookups (on the global object). The problem is that we haven't introduced properties to EasyScript yet, and that's why we can't use them in this part of the series.

Hopefully it now makes more sense why things are implemented the way they are at this point in the series!

Related, the caching in GlobalVarReferenceExprNode seems only correct if it's only functions that can be assigned at the top-level and not other variables. Otherwise, this would cache fib to the given function even after fib = 3 at the top-level (which cannot really be handled by the mutable FunctionObject).

Yes, you are absolutely correct, this is a miss on my part. I will fix that.

The change in BlockStmtNode should not change peak performance (although it is a good change in general): Graal should compile it the same no matter which version is used, because after it has exploded/unrolled the loop, it can see that every assignment to ret except the last one is unused. Do you have a way to reproduce that? If the performance actually differs, it would likely be a bug. Possibly the difference you saw was caused by not enough warmup. cc @rschatz

It's very noticeable when running the benchmark against the "slow" version (with the code from part 8) - that's where the 3x difference comes from. But it's hard to say whether problems in the other parts of the code aren't indirectly to blame here.

When running the benchmark on the final code from part 9, I pretty consistently see around a 5% slowdown when reverting the changes to BlockStmtNode. But that's a small enough difference that it could just be noise on my laptop.

Feel free to investigate with the code from part 9, and see if you can also reproduce the slight performance hit with the implementation of BlockStmtNode from part 8.


Thanks again for the feedback on the article!

eregon commented 2 years ago

Given that, what do you think of leaving the code as-is, but mentioning in the body of the blog article the other options that you gave, like using a Value for fib, or a Source for fib(20)?

Yes, I think that would be helpful. Using Source for the function definition would also be an improvement (if you don't mind redefinition) and avoid some extra Source object creation and map lookup.
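A sketch of that Source-based variant, using the org.graalvm.polyglot API (the "ezs" language id and the fib snippet are placeholders from this discussion, so this won't run as-is):

```java
import org.graalvm.polyglot.Context;
import org.graalvm.polyglot.Source;
import org.graalvm.polyglot.Value;

public class FibSourceBenchmark {
    private final Context context = Context.create();

    // the Source objects are created once, so repeated eval() calls skip
    // the per-iteration Source creation and the associated map lookup
    private final Source fibDecl = Source.create("ezs",
            "function fib(n) { return n < 2 ? n : fib(n - 1) + fib(n - 2); }");
    private final Source fibCall = Source.create("ezs", "fib(20)");

    public int run() {
        context.eval(fibDecl); // still redefines the function each time
        Value result = context.eval(fibCall);
        return result.asInt();
    }
}
```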

skinny85 commented 1 year ago

I've incorporated your feedback, Benoit (https://github.com/skinny85/eol-blog/commit/cfb08fdee5b82057bc30314e86be24e6f911cc74) - thanks again for taking the time to submit it!

I'll be resolving the issue - feel free to comment if there's anything else that needs to be done for it 🙂.

Thanks, Adam