This snippet of code in your set_context subroutine:
pushq %r8
xorl %eax, %eax
ret
should be changed to:
xorl %eax, %eax
jmp *%r8
And likewise with swap_context.
Modern Intel and AMD CPU microarchitectures have a return stack buffer (RSB) that tracks call and ret invocations so they can speculatively execute past a ret instruction. A mispredicted ret will cause a guaranteed pipeline stall, which will seriously hurt your performance. By contrast, jmp *%r8 is speculated using the indirect branch predictor, which is likely to have a non-zero hit rate.
I can confirm that in my tests on an i5 650 (just swapping between two functions on one pinned thread and counting iterations), jmp makes the entire function about 50% faster.
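For context, here is a minimal sketch of what a full swap_context with the jmp tail might look like. The context layout (offsets for rsp, the callee-saved registers, and the resume rip) is my own assumption for illustration, not the layout from the library under discussion:

# Hypothetical swap_context(old, new): %rdi = old context, %rsi = new context.
# Assumed layout: 0:rsp 8:rbp 16:rbx 24:r12 32:r13 40:r14 48:r15 56:rip
swap_context:
    movq  (%rsp), %r8        # caller's return address = where to resume later
    leaq  8(%rsp), %rax      # rsp as it would be after a normal ret
    movq  %rax, 0(%rdi)
    movq  %rbp, 8(%rdi)
    movq  %rbx, 16(%rdi)
    movq  %r12, 24(%rdi)
    movq  %r13, 32(%rdi)
    movq  %r14, 40(%rdi)
    movq  %r15, 48(%rdi)
    movq  %r8,  56(%rdi)

    movq  0(%rsi),  %rsp     # switch to the new coroutine's stack
    movq  8(%rsi),  %rbp
    movq  16(%rsi), %rbx
    movq  24(%rsi), %r12
    movq  32(%rsi), %r13
    movq  40(%rsi), %r14
    movq  48(%rsi), %r15
    movq  56(%rsi), %r8      # resume address of the new coroutine
    xorl  %eax, %eax
    jmp   *%r8               # tail-jump instead of push+ret: avoids the
                             # guaranteed RSB misprediction described above

The key point is only the last two instructions; the rest is just a conventional save/restore of the System V callee-saved registers.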