PR on github for visibility, but i'll be merging this shortly after clicking Create pull request.
changes described in the changelog:
* optimizations (mostly code motion) for hot codepaths
  - large `match`-based decode tables have been outlined to 256-entry arrays.
    this makes for slightly nicer inlining in `read_with_annotations`.
  - vex/evex decoding in 64-bit decoding now shares more code. this seems to
    aid code cache friendliness when prefixes must be read.
  - added a fast path for operand reading for the more-likely cases of
      [64-bit]: {0x66,rex}{<opcode>,0x0f-<opcode>}
      [32-bit]: {0x66}{<opcode>,0x0f-<opcode>}
      [16-bit]: {0x66}{<opcode>,0x0f-<opcode>}
    in particular, this avoids checking for instruction length overflows and
    some bounds checks when we aren't handling a pessimal case of many-prefixed
    instructions. if an instruction has multiple prefixes, decoders fall back
    to normal read-in-a-loop-until-length-limit-reached decoding.
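
to make the fast-path item concrete, here's a minimal sketch of the shape of the idea. it is not the code in this PR: `Head`, `read_opcode_fast`, `read_opcode_slow`, and `read_opcode` are hypothetical names. it reads `{0x66,rex}` as "at most one of 0x66 or a rex byte", which is an assumption, and it ignores vex/evex and the 0x0f 0x38 / 0x0f 0x3a maps entirely.

```rust
// illustrative only: these names and types are hypothetical, not yaxpeax-x86's.
const MAX_INSTRUCTION_LEN: usize = 15;

#[derive(Debug, Default, PartialEq)]
struct Head {
    opsize: bool,    // saw an 0x66 operand-size prefix
    rex: Option<u8>, // rex byte, if it directly precedes the opcode
    escape_0f: bool, // opcode came from the 0x0f map
    opcode: u8,
    len: usize,      // bytes consumed so far
}

fn is_rex(b: u8) -> bool {
    b & 0xf0 == 0x40
}

fn is_legacy_prefix(b: u8) -> bool {
    matches!(b, 0x66 | 0x67 | 0xf0 | 0xf2 | 0xf3 | 0x2e | 0x36 | 0x3e | 0x26 | 0x64 | 0x65)
}

// fast path: at most one prefix (0x66 or rex), then <opcode> or 0x0f <opcode>.
// one length check up front stands in for per-byte bounds and length checks.
fn read_opcode_fast(bytes: &[u8]) -> Option<Head> {
    let &[b0, b1, b2, ..] = bytes else { return None };
    let mut head = Head::default();
    let (first, second) = if b0 == 0x66 {
        head.opsize = true;
        head.len = 1;
        (b1, b2)
    } else if is_rex(b0) {
        head.rex = Some(b0);
        head.len = 1;
        (b1, b2)
    } else {
        (b0, b1)
    };
    // anything that still looks like a prefix is the many-prefixed case: bail out.
    if is_legacy_prefix(first) || is_rex(first) {
        return None;
    }
    if first == 0x0f {
        head.escape_0f = true;
        head.opcode = second;
        head.len += 2;
    } else {
        head.opcode = first;
        head.len += 1;
    }
    Some(head)
}

// slow path: read one byte at a time, checking the length limit every iteration.
fn read_opcode_slow(bytes: &[u8]) -> Option<Head> {
    let mut head = Head::default();
    loop {
        if head.len >= MAX_INSTRUCTION_LEN {
            return None;
        }
        let b = *bytes.get(head.len)?;
        head.len += 1;
        if is_legacy_prefix(b) {
            head.opsize |= b == 0x66;
            head.rex = None; // a rex byte only counts directly before the opcode
        } else if is_rex(b) {
            head.rex = Some(b);
        } else if b == 0x0f {
            if head.len >= MAX_INSTRUCTION_LEN {
                return None;
            }
            head.opcode = *bytes.get(head.len)?;
            head.len += 1;
            head.escape_0f = true;
            return Some(head);
        } else {
            head.opcode = b;
            return Some(head);
        }
    }
}

// try the fast path first, fall back to the loop for everything else.
fn read_opcode(bytes: &[u8]) -> Option<Head> {
    read_opcode_fast(bytes).or_else(|| read_opcode_slow(bytes))
}
```

in this sketch, something like `48 89 c3` or `66 0f 6f c1` never enters the loop, while `66 48 89 c3` (two prefixes) and very short buffers take the slow path. both paths are meant to agree on any input the fast path accepts.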
i'd actually found these were useful optimizations for the 64-bit decoder early in the year, but they became increasingly encumbered with "one more thing" to the point that i'd never landed them. i'd also, in the process, forgotten to actually publish yaxpeax-x86 1.1.5. so i'm cleaning up this long-outstanding work, will merge, then publish shortly after.
i'm not actually sure if these optimizations help as much in the 32-bit or 16-bit decoders. the LUT-for-bank-lookup change almost certainly does not. others, like a fast path to bypass the decode loop, probably do help a bit. i have not measured these and do not plan to. my priority for 32-bit and 16-bit decoders is to keep them substantially similar to the 64-bit decoder, as i'm optimistic this substantially-similar code can be written with less... almost-duplication..
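
for anyone unfamiliar with the `match`-to-table change mentioned above, this is roughly what it looks like. it's a toy sketch, not the real tables: `OpClass`, `classify_match`, `classify_table`, and `OPCODE_TABLE` are made-up names, and only a handful of opcodes are filled in.

```rust
// toy operand classes; hypothetical, not yaxpeax-x86's real types.
#[derive(Clone, Copy, Debug, PartialEq)]
enum OpClass {
    Invalid,
    ModRm,
    Imm8,
    NoOperands,
}

// the `match` form: how this gets lowered (jump table, branch tree, ...)
// is up to the compiler.
fn classify_match(opcode: u8) -> OpClass {
    match opcode {
        0x00..=0x03 => OpClass::ModRm, // add with a modrm byte
        0x04 => OpClass::Imm8,         // add al, imm8
        0x90 => OpClass::NoOperands,   // nop
        _ => OpClass::Invalid,
    }
}

// the outlined form: a 256-entry array built at compile time.
const fn build_table() -> [OpClass; 256] {
    let mut table = [OpClass::Invalid; 256];
    let mut i = 0;
    while i <= 0x03 {
        table[i] = OpClass::ModRm;
        i += 1;
    }
    table[0x04] = OpClass::Imm8;
    table[0x90] = OpClass::NoOperands;
    table
}

static OPCODE_TABLE: [OpClass; 256] = build_table();

// the lookup is a single indexed load. a u8 index into a 256-entry array
// is always in range, so there is very little here to inline.
#[inline(always)]
fn classify_table(opcode: u8) -> OpClass {
    OPCODE_TABLE[opcode as usize]
}
```

the two functions return the same thing for every byte. the difference is only in how much code ends up at each call site.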