teh-cmc / go-internals

A book about the internals of the Go programming language.

chapter1: Clarify "nop before call" paragraph #4

Closed dgryski closed 6 years ago

dgryski commented 6 years ago

The section

The NOP instruction just before the CALL exists so that the prologue doesn't jump directly onto a CALL instruction. On some platforms, doing so can lead to very dark places; it's common practice to set up a noop instruction right before the actual call and land on this NOP instead.

has generated some questions on reddit and #performance on the Gophers Slack. Could you please expand on this more?

teh-cmc commented 6 years ago

So here's the boring, abridged backstory: A few years back, I was reading various books about the Linux kernel. One of these books was particularly heavy on assembly, and specifically mentioned this pattern of landing a JMP on a NOP if the next instruction happens to be a CALL.
It looked so odd to me that I've kept a vivid memory of it for all this time; and when I saw this same pattern again in the output of the Go compiler, it clicked immediately.

Unfortunately I cannot recall the name of this book for the life of me, nor the reason that was given for this pattern to exist. Googling for tricky usages of NOP instructions mostly redirects to security-related stuff, and I don't remember this being related to security. But then again, I've got no source at all to back this up, and I might as well be talking complete nonsense here, so...

Anyway, I was kinda hoping that someone with better assembly skills than me would be able to shed some light on this.

Now, on the bright side, your question made me go back into the code to look for more clues.
Looking back at the code, I've noticed that the NOP instruction that's inserted by the compiler is marked as being 0 bytes long instead of the 1-byte instruction you'd expect it to be:

0x003a NOP                  ;; 0x3a
0x003a CALL runtime.morestack_noctxt(SB)    ;; 0x3a too

Now this is just some abstract assembly that can and will be modified by the linker in many ways, but still, this looks odd. So I went a bit deeper.
Grepping through the codebase in search of weird-looking NOP instructions, I stumbled upon this:

// The NOP is needed to give the jumps somewhere to land.
// It is a liblink NOP, not an ARM64 NOP: it encodes to 0 instruction bytes.
q = q1

Whose description sounds very much like what we're looking at. We do need somewhere to land, and we seem to be 0 bytes...
Maybe that's a start?

In the end, this might not be related at all to what made me write this in the first place. Heh.

Sorry I cannot help you more here. Thanks for the great question though :)

zliuva commented 6 years ago

This does not seem to be a generic "NOP-before-CALL" situation but rather a fix-up for the stacksplit epilogue that keeps the tracked stack pointer adjustment correct (for debugging purposes only, it seems):

e.g. on x86 and arm64

// Now we are at the end of the function, but logically
// we are still in function prologue. We need to fix the
// SP data and PCDATA.
spfix := obj.Appendp(last, newprog)
spfix.As = obj.ANOP
spfix.Spadj = -framesize

As you have observed, this NOP does not map to a machine code NOP; it is simply ignored when generating machine code. Its Spadj value is still used, though, so that the current SP adjustment value is correct.
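To make the zero-byte behaviour concrete, here is a minimal sketch of how laying out instructions could give the NOP and the CALL the same offset. It is a toy model, not the toolchain's code: the pseudoInst type, the layout function and the instruction sizes are all hypothetical, chosen only so that the offsets match the 0x3a seen in the listing earlier in this thread.

```go
package main

import "fmt"

// pseudoInst is a hypothetical stand-in for an assembler pseudo-instruction.
type pseudoInst struct {
	name string
	size int // bytes emitted for this instruction; 0 for the pseudo-NOP
}

// layout returns the byte offset at which each instruction starts.
func layout(prog []pseudoInst) []int {
	offsets := make([]int, len(prog))
	pc := 0
	for i, in := range prog {
		offsets[i] = pc
		pc += in.size // a zero-size instruction does not advance the PC
	}
	return offsets
}

func main() {
	prog := []pseudoInst{
		{"(everything before the NOP)", 0x3a}, // assumed total size
		{"NOP", 0},
		{"CALL runtime.morestack_noctxt(SB)", 5}, // size is illustrative
	}
	for i, off := range layout(prog) {
		fmt.Printf("0x%04x %s\n", off, prog[i].name)
	}
}
```

Because the NOP contributes no bytes, it and the CALL that follows are both reported at 0x003a, which is the same shape as the compiler output quoted above.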

Also note that on certain architectures, the stacksplit check sits in the prologue with a jump that skips over it, so no such fix-up is necessary (i.e. instead of "jump to stacksplit (in epilogue) if not enough space", the emitted code reads "jump over stacksplit (in prologue) if there is enough space"). For example, compiling with GOOS=linux GOARCH=mips generates the following:

MOVW    R31, R3
CALL    runtime.morestack_noctxt(SB) ; no NOP before CALL

As for Spadj, it seems to be used to maintain a mapping between each PC and the corresponding SP adjustment at that PC. [1][2] You can access this information through go tool compile -d pctab=pctospadj. (Note that for the example provided in the chapter, framesize happens to be 0, so the effect of this fix-up is not noticeable.)

P.S. The "doing so can lead to very dark places" case you are thinking of might refer to the practice of appending NOPs after any branching instruction on architectures that have branch delay slots, as opposed to this case.

teh-cmc commented 6 years ago

Thanks for the explanation as well as the links @zliuva, it all makes perfect sense once you've read that code! I was so focused on my CALL backstory that I didn't even think to look at the implementation on the compiler's part, heh.
Also I wasn't aware of that -d flag, it's a real gold mine; definitely gonna help a ton with future chapters. Anyway, mystery solved, then. This indeed has nothing to do with the CALL.

I do remember reading about delay slots a few years back, so that might be it yes; I'll have to dig further.

Thanks again for your pointers!

teh-cmc commented 6 years ago

I've added a link to this discussion in the relevant part of chapter 1, and will be closing this now. Don't hesitate to add more comments even if it's closed!

Hopefully I'll take the time to update chapter 1 with all these learnings some day.