velovix / gopherboy

A Game Boy emulator written in Go

Does GO inline function calls for... #11

Open tilkinsc opened 5 years ago

tilkinsc commented 5 years ago

Does Go inline function calls for things like the 'setHalfCarryFlag' functions?

I wonder if an optimization could come from having a leaf function for each.

Relevant:
https://lemire.me/blog/2017/09/05/go-does-not-inline-functions-when-it-should/
https://www.reddit.com/r/golang/comments/6ypwui/go_does_not_inline_functions_when_it_should/
https://groups.google.com/forum/#!topic/golang-nuts/V_xI29FGDZM

As the last link suggests, the cause may be branching inside the function, for example when one function uses a single switch to handle every case. According to the lemire.me post, you can easily double your speed by manually inlining functions that the compiler does not inline.

velovix commented 5 years ago

Sorry that I missed this! From what I can tell, these flag-setting functions are being inlined. If we build the gameboy package with the following command:

go build -gcflags -m

We can see messages like:

./state.go:169:6: can inline (*State).setHalfCarryFlag

I like your thinking though and I've also been on the lookout for function call overhead. Thanks for looking into this!
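For readers who want to try the check described above, here is a minimal sketch of the kind of leaf method under discussion. The `State` type, the `regF` field, and both method bodies are assumptions made for illustration and are not taken from gopherboy's source; only the method name `setHalfCarryFlag` comes from the thread.

```go
package main

import "fmt"

// State is a hypothetical stand-in for an emulator's CPU state.
// Only the flags register (F) is modeled here.
type State struct {
	regF uint8
}

// halfCarryFlag is bit 5 of the Game Boy F register.
const halfCarryFlag uint8 = 1 << 5

// setHalfCarryFlag is a small leaf method: it calls nothing else and
// contains a single branch, which keeps it cheap for the compiler to inline.
func (s *State) setHalfCarryFlag(on bool) {
	if on {
		s.regF |= halfCarryFlag
	} else {
		s.regF &^= halfCarryFlag
	}
}

// halfCarryFlagSet reports whether the half-carry bit is currently set.
func (s *State) halfCarryFlagSet() bool {
	return s.regF&halfCarryFlag != 0
}

func main() {
	s := &State{}
	s.setHalfCarryFlag(true)
	fmt.Printf("F=%08b set=%v\n", s.regF, s.halfCarryFlagSet())
	s.setHalfCarryFlag(false)
	fmt.Printf("F=%08b set=%v\n", s.regF, s.halfCarryFlagSet())
}
```

Building this file with `go build -gcflags=-m` should report which functions the compiler considers inlinable ("can inline ...") and which call sites were actually inlined ("inlining call to ..."). The exact output depends on the Go version.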

tilkinsc commented 5 years ago

It was just something that came to mind. My worry is that you shouldn't use function calls for this at all and should set the flags by hand instead, since the carry flag and the other flags have a definite setting per instruction. The trade-off is perhaps maintainability. To be honest, I am not sure about the quality of the inlining that goes on.

Poor inlining would be literally pasting the function body at the call site. Good inlining is LTO-ish optimization applied after the inline.

In my Game Boy emulator written in C, I always count on LTO and -O2 to do their jobs. Yours is more developed, though.

One giant advantage of how I built my emulator is that it is modular, to the point that optimization is kept separate from how it works. This is like what PPSSPP does with its JIT and non-JIT runtimes. The other is my legendary source viewer. I have the best GB ROM viewer in history, all made by me.

velovix commented 5 years ago

Possibly. I can't speak to the quality of the inlining either. This would be fairly easy to benchmark, though. I'll take a look!
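One way such a benchmark could look is sketched below. It compares a flag helper the compiler is free to inline against an identical copy marked `//go:noinline`. All names here (`State`, `regF`, `addInlinable`, `addNotInlined`, `sink`) are hypothetical and chosen for illustration; this is not gopherboy's code, and the numbers it prints will vary by machine and Go version.

```go
package main

import (
	"fmt"
	"testing"
)

// State is a hypothetical CPU state holding only the flags register.
type State struct{ regF uint8 }

const halfCarryFlag uint8 = 1 << 5

// setHalfCarryFlag may be inlined by the compiler.
func (s *State) setHalfCarryFlag(on bool) {
	if on {
		s.regF |= halfCarryFlag
	} else {
		s.regF &^= halfCarryFlag
	}
}

// setHalfCarryFlagNoInline has the same body, but the directive below
// forbids inlining so the call overhead can be measured.
//
//go:noinline
func (s *State) setHalfCarryFlagNoInline(on bool) {
	if on {
		s.regF |= halfCarryFlag
	} else {
		s.regF &^= halfCarryFlag
	}
}

// addInlinable performs an 8-bit add and updates half-carry via the
// inlinable helper.
func addInlinable(s *State, a, b uint8) uint8 {
	s.setHalfCarryFlag((a&0x0F)+(b&0x0F) > 0x0F)
	return a + b
}

// addNotInlined does the same work through the non-inlined helper.
func addNotInlined(s *State, a, b uint8) uint8 {
	s.setHalfCarryFlagNoInline((a&0x0F)+(b&0x0F) > 0x0F)
	return a + b
}

// sink keeps the compiler from discarding the benchmark loops' results.
var sink uint8

func main() {
	inlinable := testing.Benchmark(func(b *testing.B) {
		s := &State{}
		var acc uint8
		for i := 0; i < b.N; i++ {
			acc = addInlinable(s, acc, uint8(i))
		}
		sink = acc ^ s.regF
	})
	notInlined := testing.Benchmark(func(b *testing.B) {
		s := &State{}
		var acc uint8
		for i := 0; i < b.N; i++ {
			acc = addNotInlined(s, acc, uint8(i))
		}
		sink = acc ^ s.regF
	})
	fmt.Println("inlinable helper:  ", inlinable)
	fmt.Println("non-inlined helper:", notInlined)
}
```

A micro-benchmark like this only measures call overhead in a tight loop, so it is a rough upper bound on what manual inlining could gain in a full emulator.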

I'm curious about this modular design! Is your code publicly available yet? I'd love to take a look!

tilkinsc commented 5 years ago

No, the code isn't publicly available, nor is it finished.

The modular design just allows me to switch between code-parsing targets. There is room for a JIT, but I use my stupid optimizer. When I said it was a great advantage, I meant it is like software rendering versus hardware rendering. Software rendering will get it right every time, but hardware rendering may need some hacks to get around differences.

velovix commented 5 years ago

I'm not sure I entirely understand. Are you parsing the Game Boy instructions into an intermediate representation that can be interpreted by a JIT or non-JIT runtime?

tilkinsc commented 5 years ago

Yes. Given the scheme, there should be no problem passing instructions through JIT processing. However, I can swap them out at leisure. I haven't implemented a JIT, just some optimizations I threw on top of it, such as caching functions that always produce the same values. They will continue to do so until some associated memory has changed, at which point I invalidate the cache. I designed it so that it does not actually check for memory changes each frame; instead, the cached entry gets popped when the memory is set. Thank function pointers.
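The invalidate-on-write idea described above can be sketched roughly as follows. This is only one reading of the description: the emulator in question is written in C and is not public, so every name here (`Memory`, `OnWrite`, `Cached`, `NewCached`, the `computes` counter) is hypothetical, and Go closures stand in for the C function pointers mentioned.

```go
package main

import "fmt"

// Memory is a hypothetical 64 KiB address space with per-address write hooks.
type Memory struct {
	data  [0x10000]uint8
	hooks map[uint16][]func()
}

func NewMemory() *Memory {
	return &Memory{hooks: make(map[uint16][]func())}
}

func (m *Memory) Read(addr uint16) uint8 { return m.data[addr] }

// Write stores v and runs any hooks registered for addr. Invalidation
// happens here, on set, so nothing has to scan memory every frame.
func (m *Memory) Write(addr uint16, v uint8) {
	m.data[addr] = v
	for _, hook := range m.hooks[addr] {
		hook()
	}
}

// OnWrite registers a hook that runs whenever addr is written.
func (m *Memory) OnWrite(addr uint16, hook func()) {
	m.hooks[addr] = append(m.hooks[addr], hook)
}

// Cached wraps a computation whose result depends only on known addresses.
type Cached struct {
	compute  func() uint8
	value    uint8
	valid    bool
	computes int // how many times compute actually ran
}

// NewCached ties the cached value to the addresses it depends on.
func NewCached(m *Memory, compute func() uint8, deps ...uint16) *Cached {
	c := &Cached{compute: compute}
	for _, addr := range deps {
		m.OnWrite(addr, func() { c.valid = false })
	}
	return c
}

// Get returns the cached value, recomputing only after an invalidation.
func (c *Cached) Get() uint8 {
	if !c.valid {
		c.value = c.compute()
		c.valid = true
		c.computes++
	}
	return c.value
}

func main() {
	mem := NewMemory()
	sum := NewCached(mem, func() uint8 {
		return mem.Read(0xC000) + mem.Read(0xC001)
	}, 0xC000, 0xC001)

	mem.Write(0xC000, 2)
	mem.Write(0xC001, 3)
	fmt.Println(sum.Get(), sum.Get(), "computes:", sum.computes)

	mem.Write(0xC001, 10) // triggers the hook and invalidates the cache
	fmt.Println(sum.Get(), "computes:", sum.computes)
}
```

The design choice is that writes pay a small cost (running hooks) so that reads of cached results stay cheap; whether that trade pays off depends on how often the dependent memory is written.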