Inlines `backup2x8`, which is also inlined in the C code; this improves performance slightly.
The `f.a[t.a]` block context reference is constant throughout `decode_b`, but it appears that the
function is too complex for the optimizer to hoist the lookup, so the reference is recomputed at
each use. Making it a local improves performance measurably (~1% on a Ryzen 7700X for 8-bit Chimera).
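
As a rough illustration of the second change, the sketch below contrasts re-indexing on every access with binding the reference to a local once. The types `BlockContext`, `Frame`, and `Tile`, the `mode` field, and both functions are hypothetical stand-ins invented for this example; they are not the real decoder structures, and the real `decode_b` is far larger.

```rust
// Hypothetical stand-ins for illustration only; not the real decoder types.
struct BlockContext {
    mode: [u8; 4],
}

struct Frame {
    a: Vec<BlockContext>,
}

struct Tile {
    a: usize, // index into Frame::a
}

// Before: each access spells out f.a[t.a]. In a small function the optimizer
// will usually hoist this, but in a very large one it may not.
fn sum_indexed(f: &Frame, t: &Tile) -> u32 {
    let mut sum = 0;
    for i in 0..4 {
        sum += u32::from(f.a[t.a].mode[i]);
    }
    sum
}

// After: the lookup (and its bounds check) happens once, and the resulting
// reference is reused for every access.
fn sum_local(f: &Frame, t: &Tile) -> u32 {
    let ctx = &f.a[t.a];
    let mut sum = 0;
    for i in 0..4 {
        sum += u32::from(ctx.mode[i]);
    }
    sum
}
```

Both functions return the same value; the only difference is where the index expression is evaluated. Whether the local makes a measurable difference depends on the surrounding code and the compiler, so any such change should be benchmarked.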