Closed 5225225 closed 2 years ago
fantastic, thank you for the contribution! sorry for the delay, i wanted to make sure i had time to double-check if there would be performance impact (as unexpected as it would be).
it turns out that there was some; the in-tree benchmark was around 3.5% slower after applying dba409c
. objdump
and diff
got me to what i think was the root of it pretty quickly, and adding #[inline(always)]
on both helpers makes benchmarks build with identical code with or without these changes. i wrote up what i noticed and why i think this makes sense in the commit i added here, e22d87c
.
sweet, thanks!
Interesting, so the codegen was changed despite the functions seemingly being inlined anyways?
yeah, i'm not super familiar with how rustc would compile this but i think with the inline annotation, inlining happens earlier in the process (in MIR?). if it's left to inline later, the call might keep some value alive longer than necessary - maybe related having to pass the index as a distinct argument? so that seems to have left a value alive longer than necessary after inlining, then rdi
remains "used" where it was free before, and llvm is forced to pick rbp
with the slightly larger instructions to encode its use.
i ran into a bunch of this for the AnnotatingDecoder
stuff too, but worse. for perf-sensitive users there's a no-op DescriptionSink
so all those calls should be free, but just having calls present - even though they're removed later - tripped up some inlining thresholds and significantly degraded throughput. i suspect yaxpeax-x86 is quite far off from the shape of code llvm's heuristics anticipate so this kinda just.. happens. but i don't have it in me to ... yak... shave that far down :)
.. actually, this is really strong evidence that i should spend more time thinking about how to conserve code size, if an extra 32 bytes of code was so impactful. neat!
The use of
get_kinda_unchecked
as a name is needed for the slice indexing, since inherent impls seem to shadow trait impls. Which is fine.For the unreachable_unchecked, I just import the safer variant on top of it, which keeps the diff size down.
Closes https://github.com/iximeow/yaxpeax-x86/issues/16 (?)