dgryski closed this issue 6 years ago.
So here's the boring, abridged backstory:
A few years back, I was reading various books about the Linux kernel. One of these books was particularly heavy on assembly, and specifically mentioned this pattern of landing a `JMP` on a `NOP` if the next instruction happens to be a `CALL`.
It looked so odd to me that I've kept a vivid memory of it all this time, and when I saw the same pattern again in the output of the Go compiler, it clicked immediately.
Unfortunately, I cannot for the life of me recall the name of this book, nor the reason it gave for this pattern. Googling for tricky usages of NOP instructions mostly turns up security-related material, and I don't remember this being related to security. Then again, I've got no source at all to back this up, and I might as well be talking complete nonsense here, so...
Anyway, I was kinda hoping that someone with better assembly skills than me would be able to shed some light on this.
Now, on the bright side, your question made me go back into the code to look for more clues.
Looking back at the code, I noticed that the `NOP` instruction inserted by the compiler is marked as being 0 bytes long instead of the 1-byte instruction you'd expect it to be:
```
0x003a NOP                                ;; 0x3a
0x003a CALL runtime.morestack_noctxt(SB)  ;; 0x3a too
```
Now this is just some abstract assembly that can and will be modified by the linker in many ways, but still, this looks odd. So I went a bit deeper.
Grepping through the codebase in search of weird-looking `NOP` instructions, I stumbled upon this:
```go
// The NOP is needed to give the jumps somewhere to land.
// It is a liblink NOP, not an ARM64 NOP: it encodes to 0 instruction bytes.
q = q1
```
Its description sounds very much like what we're looking at: we do need somewhere to land, and our `NOP` does seem to be 0 bytes...
Maybe that's a start?
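A minimal sketch of why a 0-byte pseudo-instruction makes a convenient landing spot. This is a toy layout pass, not the Go toolchain's code: the `Prog` type, the `layout` function and the instruction sizes are all hypothetical. Each instruction's PC is the sum of the sizes of everything before it, so a `NOP` that encodes to 0 bytes ends up at the same address as the `CALL` that follows it, just like the `0x003a` pair in the listing above.

```go
package main

import "fmt"

// Prog is a hypothetical, heavily simplified stand-in for an assembler
// instruction: a mnemonic, its encoded size, and the PC assigned to it.
type Prog struct {
	As   string
	Size int // encoded size in bytes; a pseudo-NOP encodes to 0 bytes
	Pc   int // filled in by layout
}

// layout assigns a PC to every instruction by summing encoded sizes.
// A 0-byte pseudo-NOP therefore shares its PC with the next instruction,
// so a jump targeting the NOP effectively lands on that next instruction.
func layout(progs []*Prog, start int) {
	pc := start
	for _, p := range progs {
		p.Pc = pc
		pc += p.Size
	}
}

func main() {
	progs := []*Prog{
		{As: "RET", Size: 1},  // illustrative size
		{As: "NOP", Size: 0},  // jump target: gives the branch somewhere to land
		{As: "CALL", Size: 5}, // illustrative size
	}
	layout(progs, 0x39)
	for _, p := range progs {
		fmt.Printf("0x%04x %s\n", p.Pc, p.As)
	}
	// Output:
	// 0x0039 RET
	// 0x003a NOP
	// 0x003a CALL
}
```

The jump itself never needs to know whether the `NOP` has a real encoding; it only needs a `Prog` to point at, and the layout pass decides what address that resolves to.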
In the end, this might not be related at all to what made me write this in the first place. Heh.
Sorry I cannot help you more here. Thanks for the great question though :)
This does not seem to be a generic "NOP-before-CALL" situation, but rather a fix-up in the stack-split epilogue that keeps the stack-pointer adjustment correct (apparently for debugging purposes only):
```go
// Now we are at the end of the function, but logically
// we are still in function prologue. We need to fix the
// SP data and PCDATA.
spfix := obj.Appendp(last, newprog)
spfix.As = obj.ANOP
spfix.Spadj = -framesize
```
As you have observed, this `NOP` does not map to a machine-code NOP; it is simply ignored when generating machine code. Its `Spadj` value, however, is used to keep the current SP adjustment value correct.
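To make the bookkeeping concrete, here is a minimal sketch (not the compiler's actual code) that models a function as a list of hypothetical `Prog` values carrying only an `Spadj` delta, and computes the SP adjustment in effect at each instruction by summing those deltas in layout order. The instruction strings are placeholders, and `buildFunc` is a made-up helper. I'm assuming that the linear walk counts the frame as still allocated after the `RET`, which is what the quoted comment implies: the stack-split code sits at the end of the function, yet logically it still belongs to the prologue, where no frame has been allocated.

```go
package main

import "fmt"

// Prog is a hypothetical, simplified instruction: a placeholder mnemonic
// plus Spadj, the change it makes to the stack-pointer adjustment.
type Prog struct {
	As    string
	Spadj int
}

// spAt returns the SP adjustment in effect at each instruction
// (before it executes), obtained by summing Spadj in layout order.
func spAt(progs []Prog) []int {
	out := make([]int, len(progs))
	sp := 0
	for i, p := range progs {
		out[i] = sp
		sp += p.Spadj
	}
	return out
}

// buildFunc lays out a toy function whose stack-split check jumps to the
// epilogue. If fixup is true, a NOP with Spadj = -framesize is placed
// before the morestack call, mirroring the quoted spfix code.
func buildFunc(framesize int, fixup bool) []Prog {
	progs := []Prog{
		{"CMPQ SP, limit", 0},
		{"JLS epilogue", 0},
		{"SUBQ $frame, SP", framesize},
		{"CALL body", 0},
		{"ADDQ $frame, SP", -framesize},
		// Assumption: code laid out after the RET is treated as still
		// inside the frame, so the linear walk re-counts it here.
		{"RET", framesize},
	}
	if fixup {
		progs = append(progs, Prog{"NOP", -framesize})
	}
	return append(progs,
		Prog{"CALL runtime.morestack_noctxt", 0},
		Prog{"JMP start", 0},
	)
}

func main() {
	for _, fixup := range []bool{false, true} {
		progs := buildFunc(24, fixup)
		sp := spAt(progs)
		call := len(progs) - 2 // index of the morestack CALL
		fmt.Printf("fixup=%v: SP adjustment at morestack CALL = %d\n", fixup, sp[call])
	}
	// Output:
	// fixup=false: SP adjustment at morestack CALL = 24
	// fixup=true: SP adjustment at morestack CALL = 0
}
```

Without the fix-up, the morestack call would be recorded as running inside a 24-byte frame that was never allocated; with it, the recorded value matches the prologue. Passing a framesize of 0 makes both variants agree, which is why the fix-up has no visible effect when the frame is empty.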
Also note that on certain architectures, the stack-split check is placed in the prologue, with a jump that skips over it, so no such fix-up is necessary (i.e. instead of "jump to stacksplit (in epilogue) if not enough space", the emitted code reads "jump over stacksplit (in prologue) if enough space"). For example, when compiling with GOOS=linux GOARCH=mips, the following is generated:
```
MOVW R31, R3
CALL runtime.morestack_noctxt(SB) ; no NOP before CALL
```
As for `Spadj`, it seems to be used to maintain a mapping between each PC and the corresponding SP adjustment at that PC. [1][2] You can inspect this information with `go tool compile -d pctab=pctospadj`. (Note that for the example provided in the chapter, `framesize` happens to be 0, so the effect of this fix-up is not noticeable.)
P.S. The "doing so can lead to very dark places" case you are thinking of might refer to the practice of appending NOPs after every branch instruction on architectures that have branch delay slots, as opposed to this case.
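For comparison, the delay-slot practice can be sketched as a tiny pass over a hypothetical, MIPS-like instruction list: after every control-transfer instruction, a NOP is appended so that whatever executes in the delay slot is harmless. Unlike the liblink NOP above, such a NOP is a real machine instruction that takes up space in the binary. The mnemonic set in `isBranch` is illustrative only, and real assemblers may try to fill the slot with a useful instruction before falling back to a NOP.

```go
package main

import "fmt"

// isBranch reports whether a mnemonic transfers control in this toy,
// MIPS-like instruction set (the set is illustrative, not exhaustive).
func isBranch(as string) bool {
	switch as {
	case "JMP", "JAL", "JR", "BEQ", "BNE":
		return true
	}
	return false
}

// fillDelaySlots appends a NOP after every branch, so the instruction
// executed in the branch delay slot has no effect.
func fillDelaySlots(progs []string) []string {
	out := make([]string, 0, 2*len(progs))
	for _, as := range progs {
		out = append(out, as)
		if isBranch(as) {
			out = append(out, "NOP")
		}
	}
	return out
}

func main() {
	fmt.Println(fillDelaySlots([]string{"ADDU", "BEQ", "MOVW", "JAL", "JR"}))
	// Output: [ADDU BEQ NOP MOVW JAL NOP JR NOP]
}
```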
Thanks for the explanation as well as the links @zliuva, it all makes perfect sense once you've read that code! I was so focused on my `CALL` backstory that I didn't even think to look at the implementation on the compiler's side, heh.
Also, I wasn't aware of that `-d` flag; it's a real gold mine, and will definitely help a ton with future chapters.
Anyway, mystery solved, then: this indeed has nothing to do with the `CALL`.
I do remember reading about delay slots a few years back, so that might be it, yes; I'll have to dig further.
Thanks again for your pointers!
I've added a link to this discussion in the relevant part of chapter 1, and will be closing this now. Don't hesitate to add more comments even though it's closed!
Hopefully I'll take the time to update chapter 1 with all these learnings some day.
The section has generated some questions on Reddit and in #performance on the Gophers Slack. Could you please expand on it?