kosarev / z80

Fast and flexible Z80/i8080 emulator with C++ and Python APIs
MIT License

Performance benefit using fast types with registers? #19

Closed simonowen closed 3 years ago

simonowen commented 3 years ago

I'm keeping an execution trace in my emulator debugger, which stores the z80_state structure to preserve the CPU state after each instruction. I noticed this was quite a lot bigger than the old CPU core, with 76 bytes needed by default. It looks like the extra size is due to the 16-bit register pairs being stored as 32-bit values, with the fast_u16 type being uint_fast16_t, which is unsigned int in my environment.

I experimented with using uint_least16_t instead, for a real 16-bit value. That shrinks the state to 44 bytes, and I didn't notice any difference in performance for either x86 or x64 builds. Were the fast versions used because of a performance difference seen in some cases, or just because they should give the best performance for any environment?
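To make the size comparison concrete, here is a minimal sketch of how the two choices affect a stored register file. The structs `fast_regs` and `compact_regs` are hypothetical stand-ins, not the library's z80_state, and the actual byte counts depend on how the toolchain defines the `uint_fast16_t` and `uint_least16_t` types.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical register file using the "fast" type for each 16-bit pair.
// On many toolchains uint_fast16_t is 4 or even 8 bytes wide.
struct fast_regs {
    std::uint_fast16_t bc, de, hl, af, ix, iy, sp, pc;
};

// The same (assumed) layout using the smallest type with at least 16 bits.
struct compact_regs {
    std::uint_least16_t bc, de, hl, af, ix, iy, sp, pc;
};

constexpr std::size_t fast_size = sizeof(fast_regs);
constexpr std::size_t compact_size = sizeof(compact_regs);

// Bytes saved per trace entry by storing the compact layout instead.
constexpr std::size_t saved_per_entry() {
    return fast_size - compact_size;
}
```

For an execution trace that keeps one snapshot per instruction, `saved_per_entry()` multiplied by the trace length gives a rough idea of the memory difference on a given build.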

On a related note, while experimenting with changing fast_u32 to uint_least32_t, I noticed that the pf_ari template has a small type mismatch in two of its uses. Changing them from pf_ari(r32 to pf_ari(r16 should fix it, I think.

kosarev commented 3 years ago

> or just because they should give the best performance for any environment?

Yes, that was the idea. So the standard modules don't try to be the best fit for all uses, but rather a default/generic/reference implementation that is just good enough to start with. Whenever the user wants their own custom layout, they implement it like we do in, e.g., https://github.com/kosarev/z80/blob/77cf0a74749bcd341e9ef9c87f96ad324e3460a3/z80/machine.inc#L11

Regarding the fast_uN types, they initially were the standard uint_fastN_t ones. The problem with the standard types was that gcc defines uint_fast8_t to be unsigned char, which required lots of extra care because of integer promotions. So we essentially guarantee that fast_uN are never narrower than unsigned. I guess that may explain the warnings for the pf_ari() calls?
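The promotion issue can be illustrated with a small sketch. The alias `fast_u8_sketch` below is a hypothetical way to get a type that is never narrower than unsigned; it is an assumption for illustration and not necessarily how the library defines its fast_uN types.

```cpp
#include <cstdint>
#include <type_traits>

// Hypothetical alias: use uint_fast8_t unless it is narrower than unsigned,
// in which case fall back to unsigned.
using fast_u8_sketch = std::conditional<
    (sizeof(std::uint_fast8_t) < sizeof(unsigned)),
    unsigned, std::uint_fast8_t>::type;

static_assert(sizeof(fast_u8_sketch) >= sizeof(unsigned),
              "the sketch type is never narrower than unsigned");

// With unsigned char operands, ~x is computed as int after integer promotion,
// so the result is negative and does not equal the expected 8-bit value.
inline bool narrow_invert_matches(unsigned char x, unsigned char expected) {
    return ~x == expected;
}

// With the wider unsigned type there is no promotion to signed int, but the
// result must be masked explicitly to stay within 8 bits.
inline fast_u8_sketch invert8(fast_u8_sketch v) {
    return ~v & 0xffu;
}
```

For example, `narrow_invert_matches(0x0f, 0xf0)` is false even though 0xf0 is the 8-bit complement of 0x0f, whereas `invert8(0x0f)` yields 0xf0.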

simonowen commented 3 years ago

Thanks for the details. I'll stick with the default storage for now, which should give the best performance for each build environment.

I was surprised it was only the two pf_ari calls that were complaining after my type change. The other pf_ functions were already getting 3 matched types.
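One plausible reading of why only the mismatched calls complained: a function template that takes three arguments of one deduced type compiles only when all three arguments have the same type. With the default fast types, the 16-bit and 32-bit aliases may resolve to the same underlying type, which would hide the mismatch. The sketch below uses a hypothetical helper `mix3`, not the library's pf_ari, whose real signature is not shown in this thread.

```cpp
#include <cstdint>

// Hypothetical stand-in for a helper that expects three values of one type.
template<typename T>
constexpr T mix3(T r, T a, T b) {
    return static_cast<T>(r ^ a ^ b);
}

// Assumed narrow storage types, as in the experiment described above.
using r16 = std::uint_least16_t;
using r32 = std::uint_least32_t;

inline r16 demo(r32 wide_result, r16 a, r16 b) {
    // mix3(wide_result, a, b) would fail to compile when r32 and r16 are
    // different types, because T would be deduced two conflicting ways.
    // Converting the first argument gives three matched types.
    return mix3(static_cast<r16>(wide_result), a, b);
}
```

If both aliases name the same type, as can happen when each is at least as wide as unsigned, the unconverted call compiles and the mismatch goes unnoticed.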