simonowen closed this issue 3 years ago.
> ...or just because they should give the best performance for any environment?
Yes, that was the idea. The standard modules aren't meant to be the best fit for every use; they are a default, generic, reference implementation that is good enough to start with. When users want their own custom layout, they can implement it as we do in, e.g., https://github.com/kosarev/z80/blob/77cf0a74749bcd341e9ef9c87f96ad324e3460a3/z80/machine.inc#L11
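As a rough illustration of what a custom layout might look like, here is a hypothetical sketch (the class `compact_state` and its accessors are invented for this example and are not the z80 library's interface, nor what `machine.inc` does). It stores registers compactly as `uint_least16_t` but hands out `unsigned` values, so callers never do arithmetic on a type narrower than `unsigned`.

```cpp
#include <cstdint>

// Hypothetical custom storage: compact 16-bit slots, "fast" interface type.
class compact_state {
public:
    // Assumption: unsigned is used as the interface type; it is guaranteed
    // to hold at least 16 bits.
    using fast_u16 = unsigned;

    // Values are truncated to 16 bits before narrowing to the storage type.
    constexpr compact_state(fast_u16 hl, fast_u16 pc)
        : hl_(static_cast<std::uint_least16_t>(hl & 0xffffu)),
          pc_(static_cast<std::uint_least16_t>(pc & 0xffffu)) {}

    // Reads widen the stored value back to the interface type.
    constexpr fast_u16 get_hl() const { return hl_; }
    constexpr fast_u16 get_pc() const { return pc_; }

    // Writes mask to 16 bits, then narrow to the storage type.
    void set_hl(fast_u16 n) { hl_ = static_cast<std::uint_least16_t>(n & 0xffffu); }
    void set_pc(fast_u16 n) { pc_ = static_cast<std::uint_least16_t>(n & 0xffffu); }

private:
    std::uint_least16_t hl_;
    std::uint_least16_t pc_;
};
```

The trade-off is an extra widening or narrowing conversion on every access in exchange for a smaller object, which matters mostly when many snapshots are kept.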
Regarding the `fast_uN` types, they were initially the standard `uint_fastN_t` ones. The problem with the standard types was that gcc defines `uint_fast8_t` as `unsigned char`, which required a lot of extra care because of the integer promotions. So we essentially guarantee that the `fast_uN` types are never narrower than `unsigned`. I guess that may explain the warnings for the `pf_ari()` calls?
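A small sketch of the promotion problem and of a "never narrower than `unsigned`" rule. The alias `at_least_unsigned` is hypothetical and only shows the idea; it is not the z80 library's actual definition of `fast_uN`.

```cpp
#include <cstdint>
#include <type_traits>

// Hypothetical alias: use T, but widen it to unsigned when T is narrower.
template <typename T>
using at_least_unsigned =
    typename std::conditional<(sizeof(T) < sizeof(unsigned)), unsigned, T>::type;

using fast_u8 = at_least_unsigned<std::uint_fast8_t>;
using fast_u16 = at_least_unsigned<std::uint_fast16_t>;

static_assert(sizeof(fast_u8) >= sizeof(unsigned), "fast_u8 is never narrower than unsigned");
static_assert(sizeof(fast_u16) >= sizeof(unsigned), "fast_u16 is never narrower than unsigned");
static_assert(std::is_unsigned<fast_u8>::value, "fast_u8 stays unsigned");

// With an unsigned char operand, ~x is evaluated in (signed) int after the
// integer promotions, so complementing 0x0f yields -16 rather than 0xf0.
constexpr int complement_narrow(unsigned char x) { return ~x; }

// With an operand at least as wide as unsigned, the arithmetic stays
// unsigned; the 8-bit result still has to be masked explicitly.
constexpr fast_u8 complement_wide(fast_u8 x) {
    return static_cast<fast_u8>(~x & 0xffu);
}
```

With `unsigned char`, every expression like this needs a cast or mask to avoid signed results; with a type at least as wide as `unsigned`, only the final mask is needed.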
Thanks for the details. I'll stick with the default storage for now, which should give the best performance for each build environment.

I was surprised that only the two `pf_ari` calls complained after my type change. The other `pf_` functions were already being passed three arguments of matching types.
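A hypothetical illustration of why matching argument types matter for a flag template whose three parameters share one type. `add_overflow` is invented for this sketch (it computes the standard two's-complement overflow test for addition) and is not the z80 library's `pf_ari()`.

```cpp
// Hypothetical flag helper: r, a and b share one template parameter T, so a
// call only compiles when all three arguments have the same type (or T is
// named explicitly).
// Returns 1 if adding the width-bit values a and b overflowed as two's
// complement numbers, given their sum r; only bit (width - 1) is inspected.
template <unsigned width, typename T>
constexpr unsigned add_overflow(T r, T a, T b) {
    // Overflow: a and b share a sign bit, and r's sign bit differs from it.
    return static_cast<unsigned>(((a ^ r) & (b ^ r)) >> (width - 1)) & 1u;
}

// add_overflow<8>(0x80u, 0x40u, 0x40u) deduces T = unsigned.
// A call such as add_overflow<16>(std::uint_fast32_t{0}, 0u, 0u) would fail
// to deduce T on platforms where uint_fast32_t and unsigned are different
// types; converting the odd argument (or naming T) resolves it.
```

Changing one alias (say, `fast_u32`) to a different underlying type can therefore surface only at the call sites that mix it with other types, which would match only two calls being affected.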
I'm keeping an execution trace in my emulator debugger, which stores the `z80_state` structure after each instruction to preserve the CPU state. I noticed this is quite a lot bigger than with the old CPU core, needing 76 bytes by default. The extra size seems to come from the 16-bit register pairs being stored as 32-bit values: `fast_u16` is `uint_fast16_t`, which is `unsigned int` in my environment.
I experimented with using `uint_least16_t` instead, to get a true 16-bit value. That shrinks the state to 44 bytes, and I didn't notice any performance difference in either x86 or x64 builds. Were the fast versions chosen because of a performance difference seen in some cases, or just because they should give the best performance for any environment?
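A sketch of how the register type drives the size of each trace entry. `reg_pairs` is a hypothetical, simplified register file; the real `z80_state` has other fields and its own layout, so absolute sizes will differ from the 76 and 44 bytes above.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical register file, parameterized on the 16-bit register type.
template <typename u16>
struct reg_pairs {
    u16 bc, de, hl, af;                  // main register pairs
    u16 alt_bc, alt_de, alt_hl, alt_af;  // alternate set
    u16 ix, iy, sp, pc, ir, wz;          // index, stack pointer, PC, I/R, MEMPTR
};

using fast_pairs = reg_pairs<std::uint_fast16_t>;    // 4-byte slots when uint_fast16_t is unsigned int
using least_pairs = reg_pairs<std::uint_least16_t>;  // 2-byte slots on typical platforms

// Bytes a trace buffer needs for n per-instruction snapshots.
template <typename State>
constexpr std::size_t trace_bytes(std::size_t n) { return n * sizeof(State); }
```

With a 4-byte `uint_fast16_t` and a 2-byte `uint_least16_t`, the 14 pairs above typically take 56 versus 28 bytes, and `trace_bytes` multiplies that difference by every recorded instruction.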
On a related note, while experimenting with changing `fast_u32` to `uint_least32_t`, I noticed that the `pf_ari` template has a small type mismatch in two uses. Changing them from `pf_ari(r32` to `pf_ari(r16` should fix it, I think.