hpi-swa / trufflesqueak

A Squeak/Smalltalk VM and Polyglot Programming Environment for the GraalVM.
MIT License

Maintain the entire stack in frame slots #155

Closed: fniephaus closed this issue 1 year ago

fniephaus commented 2 years ago

At the moment, TruffleSqueak uses both the frame arguments and the frame slots for the stack, which avoids copying from the args to the slots. Such copying previously caused measurable overhead, and so did the additional read/write slot operations. With Truffle's new indexed slots, the latter overhead should be reduced significantly. While copying will introduce some overhead again, we probably gain peak performance improvements (the frame arguments array is not escape-analyzed) and we can start using the BytecodeOSRNode. Apparently, the BytecodeOSRNode in Sulong does not impact peak performance as much as the unstructured-to-structured CFG conversion to LoopNodes.
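To make the two layouts concrete, here is a minimal plain-Java sketch. It does not use any Truffle classes: `Frame`, `readSplit`, `enterUnified`, and `readUnified` are hypothetical names invented for illustration, and the real TruffleSqueak code is organized differently. The split layout reads low stack indices from the arguments array and the rest from slots; the unified layout copies the arguments into slots once on entry so every stack access goes through slots.

```java
/** Plain-Java model of two stack layouts; no Truffle classes are used. */
public class StackLayoutSketch {
    /** Hypothetical stand-in for a frame: an arguments array plus indexed slots. */
    static final class Frame {
        final Object[] arguments;
        final Object[] slots;

        Frame(Object[] arguments, int numSlots) {
            this.arguments = arguments;
            this.slots = new Object[numSlots];
        }
    }

    /** Split layout: indices below the argument count hit the arguments array, the rest hit slots. */
    static Object readSplit(Frame frame, int stackIndex) {
        int numArgs = frame.arguments.length;
        return stackIndex < numArgs ? frame.arguments[stackIndex] : frame.slots[stackIndex - numArgs];
    }

    /** Unified layout: copy all arguments into slots once on method entry. */
    static Frame enterUnified(Object[] arguments, int numTemps) {
        Frame frame = new Frame(arguments, arguments.length + numTemps);
        System.arraycopy(arguments, 0, frame.slots, 0, arguments.length);
        return frame;
    }

    /** Unified layout: every stack access is a plain slot read. */
    static Object readUnified(Frame frame, int stackIndex) {
        return frame.slots[stackIndex];
    }

    public static void main(String[] args) {
        Object[] callArgs = {"receiver", 42};

        Frame split = new Frame(callArgs, 1);
        split.slots[0] = "temp";

        Frame unified = enterUnified(callArgs, 1);
        unified.slots[2] = "temp";

        // Both layouts expose the same logical stack contents.
        for (int i = 0; i < 3; i++) {
            System.out.println(readSplit(split, i) + " / " + readUnified(unified, i));
        }
    }
}
```

The trade-off discussed in this issue shows up directly: the split layout pays a branch on every access but no entry copy, while the unified layout pays the copy once and keeps accesses uniform.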

smarr commented 2 years ago

In Ruby, they added some hacky "in compiled code" check, which copies the argument array to facilitate escape analysis.

There was also talk about a compiler intrinsic to tell it that the argument array identity is not needed, which would help escape analysis.

Either of those options seems better than copying args into frame slots :)
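As a rough illustration of the "in compiled code" idea, the sketch below copies the argument array only when a flag says the code is compiled, so the compiler sees a fresh array that does not escape. This is an assumption about the general shape of the trick, not the actual Ruby implementation: `compiled`, `inCompiledCode`, and `prepareArguments` are hypothetical names, and the flag is a plain-Java stand-in for what `CompilerDirectives.inCompiledCode()` provides in Truffle.

```java
/** Plain-Java model of copying arguments only in compiled code; no Truffle classes are used. */
public class ArgumentCopySketch {
    // Hypothetical stand-in: in Truffle, CompilerDirectives.inCompiledCode() plays this role.
    static boolean compiled = false;

    static boolean inCompiledCode() {
        return compiled;
    }

    /**
     * In the interpreter, return the array unchanged to avoid the copy.
     * In compiled code, return a fresh copy so that the array identity of the
     * caller-provided arguments no longer matters to escape analysis.
     */
    static Object[] prepareArguments(Object[] arguments) {
        if (inCompiledCode()) {
            return arguments.clone();
        }
        return arguments;
    }

    public static void main(String[] args) {
        Object[] callArgs = {"receiver", 42};

        compiled = false;
        System.out.println(prepareArguments(callArgs) == callArgs);

        compiled = true;
        System.out.println(prepareArguments(callArgs) == callArgs);
    }
}
```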

fniephaus commented 2 years ago

> In Ruby, they added some hacky "in compiled code" check, which copies the argument array to facilitate escape analysis.

You are probably referring to this. Yes, that's hacky.

> There was also talk about a compiler intrinsic to tell it that the argument array identity is not needed, which would help escape analysis.

I am not aware of any talks about such a compiler intrinsic but I'd certainly like to try something like that out.

> Either of those options seems better than copying args into frame slots :)

While copying args into frame slots is not ideal, it "only" seems to make a perf difference in the interpreter. Compiled code can apparently be even better compared with what we have now, so some peak perf improvements can be expected.

Also, copying args into frame slots seems to be a requirement for using the BytecodeOSRNode, which would be nice to have at some point.

fniephaus commented 2 years ago

It seems we no longer need to maintain the entire stack in frame slots to allow for bytecode OSR (see https://github.com/oracle/graal/commit/0499bfa9f45001e7087a70f09d64db1ff8fb740a). I've pushed a first sketch in https://github.com/hpi-swa/trufflesqueak/commit/4d4b2c5fc66b07ef346b889f798d1fc09238cfd3.

fniephaus commented 1 year ago

As of https://github.com/hpi-swa/trufflesqueak/commit/5ae54a4a57a4e780bab1b31849c5e5846a12769d, OSR is enabled in TruffleSqueak.