iximeow / yaxpeax-x86

x86 decoders for the yaxpeax project
BSD Zero Clause License
129 stars 23 forks source link

Wrap unsafe functions to catch errors in debug #17

Closed 5225225 closed 2 years ago

5225225 commented 2 years ago

The use of get_kinda_unchecked as a name is needed for the slice indexing, since inherent impls seem to shadow trait impls. Which is fine.

For the unreachable_unchecked, I just import the safer variant on top of it, which keeps the diff size down.

Closes https://github.com/iximeow/yaxpeax-x86/issues/16 (?)

iximeow commented 2 years ago

fantastic, thank you for the contribution! sorry for the delay, i wanted to make sure i had time to double-check if there would be performance impact (as unexpected as it would be).

it turns out that there was some; the in-tree benchmark was around 3.5% slower after applying dba409c. objdump and diff got me to what i think was the root of it pretty quickly, and adding #[inline(always)] on both helpers makes benchmarks build with identical code with or without these changes. i wrote up what i noticed and why i think this makes sense in the commit i added here, e22d87c.

5225225 commented 2 years ago

sweet, thanks!

Interesting, so the codegen was changed despite the functions seemingly being inlined anyways?

iximeow commented 2 years ago

yeah, i'm not super familiar with how rustc would compile this but i think with the inline annotation, inlining happens earlier in the process (in MIR?). if it's left to inline later, the call might keep some value alive longer than necessary - maybe related having to pass the index as a distinct argument? so that seems to have left a value alive longer than necessary after inlining, then rdi remains "used" where it was free before, and llvm is forced to pick rbp with the slightly larger instructions to encode its use.

i ran into a bunch of this for the AnnotatingDecoder stuff too, but worse. for perf-sensitive users there's a no-op DescriptionSink so all those calls should be free, but just having calls present - even though they're removed later - tripped up some inlining thresholds and significantly degraded throughput. i suspect yaxpeax-x86 is quite far off from the shape of code llvm's heuristics anticipate so this kinda just.. happens. but i don't have it in me to ... yak... shave that far down :)

iximeow commented 2 years ago

.. actually, this is really strong evidence that i should spend more time thinking about how to conserve code size, if an extra 32 bytes of code was so impactful. neat!