draft PR of a few things that substantially improved decode throughput on both my zen2 machine and a cascade lake machine i compared against.
the important bit is a fast path for the most common opcode decoding route (an optional rex prefix followed by a single opcode byte). one branch also got replaced with a lookup table (LUT), which helped a bit too.
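
for anyone who wants the shape of the idea without reading the diff, here's a rough sketch of a "optional rex + opcode byte" fast path backed by a 256-entry LUT. this is not the code in this PR: `FastDecoded`, `FAST_OPCODE`, `SLOW_BYTES` and `fast_path` are names made up for illustration, the table contents are a simplified guess at what has to go to the slow path in 64-bit mode (legacy prefixes, the `0x0f` escape, vex/evex lead bytes, and rex bytes themselves), and operand decoding is left out entirely.

```rust
/// Hypothetical fast-path result: the REX byte (0 if absent), the opcode
/// byte, and the number of bytes consumed so far.
#[derive(Debug, PartialEq, Eq)]
struct FastDecoded {
    rex: u8,
    opcode: u8,
    len: usize,
}

/// Bytes that are assumed to need the general (slow) decoder in 64-bit mode:
/// the 0x0f escape, legacy prefixes, and VEX/EVEX lead bytes.
/// Illustrative only, not the real decoder's table.
const SLOW_BYTES: [u8; 15] = [
    0x0f, 0x26, 0x2e, 0x36, 0x3e, 0x62, 0x64, 0x65, 0x66, 0x67, 0xc4, 0xc5, 0xf0, 0xf2, 0xf3,
];

/// One table load replaces a chain of "is this a prefix/escape?" branches.
const FAST_OPCODE: [bool; 256] = build_table();

const fn build_table() -> [bool; 256] {
    let mut table = [true; 256];
    let mut i = 0;
    while i < SLOW_BYTES.len() {
        table[SLOW_BYTES[i] as usize] = false;
        i += 1;
    }
    // A REX byte in opcode position means a repeated prefix: slow path.
    let mut b = 0x40;
    while b <= 0x4f {
        table[b] = false;
        b += 1;
    }
    table
}

/// Returns `Some` when the input starts with an optional REX prefix followed
/// by an opcode byte the table marks as fast; `None` means "use the general
/// decoder" (or the input was too short).
fn fast_path(bytes: &[u8]) -> Option<FastDecoded> {
    let mut idx = 0;
    let mut rex = 0u8;
    let mut b = *bytes.get(idx)?;
    if b & 0xf0 == 0x40 {
        // 0x40..=0x4f are REX prefixes in 64-bit mode.
        rex = b;
        idx += 1;
        b = *bytes.get(idx)?;
    }
    if FAST_OPCODE[b as usize] {
        Some(FastDecoded { rex, opcode: b, len: idx + 1 })
    } else {
        None
    }
}
```

with this sketch, `fast_path(&[0x48, 0x89, 0xc3])` (`mov rbx, rax`) takes the fast path with `rex = 0x48` and `opcode = 0x89`, while anything starting with `0x66` or hitting the `0x0f` escape returns `None` and would fall back to the full prefix loop.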
this breaks the annotation reporting code because offsets are wrong. panics in debug mode and everything. i need to port this code to 32- and 16-bit modes before merging it too. but i want to show it off so here it is :)