kosarev / z80

Fast and flexible Z80/i8080 emulator with C++ and Python APIs
MIT License

B decremented too late in on_block_out() #8

Closed simonowen closed 3 years ago

simonowen commented 3 years ago

The block out instructions decrement B before the port write, but the current implementation does it afterwards. I was finding that my 16-bit port writes were all offset by one when using OTDR.

It seems to just need the existing:

        self().on_output_cycle(bc, r);
        bc = sub16(bc, 0x0100);
        fast_u8 s = get_high8(bc);

changing to:

        bc = sub16(bc, 0x0100);
        fast_u8 s = get_high8(bc);
        self().on_output_cycle(bc, r);

The on_block_in implementation is already correct in decrementing B before the port access.
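To illustrate the difference, here is a minimal sketch of the port address each ordering would put on the bus. The helper names are hypothetical and not part of the z80 library; only the `0x0100` subtraction mirrors the snippet above.

```cpp
#include <cstdint>

// Hypothetical helper (not library API): port address when B is
// decremented before the write, as the block out instructions require.
constexpr std::uint16_t port_decrement_first(std::uint16_t bc) {
    // Subtracting 0x0100 decrements B (the high byte) and leaves C alone.
    return static_cast<std::uint16_t>(bc - 0x0100);
}

// Hypothetical helper: port address when the write happens first,
// which is the ordering reported in this issue.
constexpr std::uint16_t port_write_first(std::uint16_t bc) {
    return bc;
}

// With BC = 0x03fe the two orderings differ by one in the high byte.
static_assert(port_decrement_first(0x03fe) == 0x02fe, "B decremented first");
static_assert(port_write_first(0x03fe) == 0x03fe, "B still holds the old value");
// B wraps from 0x00 to 0xff; C is unchanged.
static_assert(port_decrement_first(0x00fe) == 0xfffe, "B wraps around");
```

The one-step difference in the high byte is what would show up as every 16-bit port write being offset by one.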

simonowen commented 3 years ago

I'm also a bit suspicious of the self().on_fetch_cycle_extra_1t(); before the (HL) access. Do you have a source for that? I don't have the equivalent in my original CPU core, which matches the timing I see on real hardware.

I'm still debugging the details of my contention, but with that extra access the total timing comes out much too high. I suspect it bumps the contention rounding of each port access to the next alignment point.

kosarev commented 3 years ago

Hi Simon,

Indeed the Z80 User Manual reads:

Register B can be used as a byte counter, and its decremented value is placed on the top half (A8 through A15) of the address bus at this time.

Will fix in a minute. As to the extra tick, that same manual says the second fetch cycle is 5 ticks long, so it looks correct to me. Thanks for reporting!
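As a rough cross-check of the tick count, the sketch below sums the machine-cycle lengths as I recall them from the manual's instruction tables for the block out instructions: 4, 5, 3, 4 for a non-repeating step and an additional 5 when the instruction repeats. The array and function names are made up for this example and are not taken from the emulator.

```cpp
#include <array>
#include <cstddef>

// Hypothetical helper: total ticks for a sequence of machine cycles.
template <std::size_t N>
unsigned total_ticks(const std::array<unsigned, N>& cycles) {
    unsigned total = 0;
    for (std::size_t i = 0; i < N; ++i)
        total += cycles[i];
    return total;
}

// Cycle lengths assumed from the manual's tables: the second fetch
// cycle is the 5-tick one discussed above.
const std::array<unsigned, 4> block_out_once = {{4, 5, 3, 4}};
const std::array<unsigned, 5> block_out_repeat = {{4, 5, 3, 4, 5}};
```

Under these assumptions `total_ticks(block_out_once)` gives 16 and `total_ticks(block_out_repeat)` gives 21, which only add up if the second fetch cycle is 5 ticks long.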

kosarev commented 3 years ago

Addressed in ea2f74d, please take a look.

simonowen commented 3 years ago

Thanks for the quick fix!

It's a great CPU emulation and it took me less than half an hour (and about 100 lines) to switch to using it. I've still got some mismatches in my test suite and some timing-sensitive demos aren't quite right yet, but it's probably my contention calculations. The old CPU core used a different rounding alignment, and that might explain the extra tick seeming expensive.

kosarev commented 3 years ago

Maybe getting traces from https://github.com/kosarev/zx for those demos might be of some help here. Just in case it's a ZX emulator you are working on.

simonowen commented 3 years ago

It's for my SAM Coupé emulator (SimCoupe), which is like a cousin to the ZX Spectrum. It's currently using the Z80 core Ian Collier wrote for xz80, with support for memory and port contention added in, plus various bug fixes. It doesn't include all undocumented flag support, just the commonly needed cases, so it doesn't pass the stricter Z80 test suites. The code also uses a lot of macros and loose typing, so it seemed like it could be a lot of work to get it to pass modern static analysis tools cleanly. Your core looks clean and extensible, and should make unit testing much easier -- particularly via the Python bindings.

Your zx project was very useful in getting the initial access contention overrides hooked up quickly. I'm still using modified versions of my original memory and port contention look-up tables, but may change it to calculate them in-place. Where it applies, contention limits CPU access to 1 cycle in every 4 or 8 cycles. What I have running is close, but some of my timing tests fail and timing-sensitive demos do not display correctly, so I'm still missing something.

I suspect there's an off-by-one in my logic somewhere that's causing it to sometimes go wrong. I don't currently have any "-1" offsets in any of my processing, which is slightly different to your zx project. My t=0 is the tick that the frame interrupt goes low (T3), and my existing contention is applied by rounding up with an OR of 3 or 7 (actually T4 but should probably be T3). There might be a different order of processing somewhere that hides it with the current core. I'll have to debug it in more detail over the weekend.
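For reference, the "round up with an OR of 3 or 7" idea can be sketched as below. This is only an illustration of the arithmetic, not SimCoupe's actual code; the function names and the choice of what counts as tick 0 are assumptions.

```cpp
#include <cstdint>

// Hypothetical helper: round tick t up to the last tick of its aligned
// window. mask = 3 gives 4-tick windows, mask = 7 gives 8-tick windows.
constexpr std::uint32_t round_up_or(std::uint32_t t, std::uint32_t mask) {
    return t | mask;
}

// Extra ticks the access would be delayed by under this rounding.
constexpr std::uint32_t contention_delay(std::uint32_t t, std::uint32_t mask) {
    return round_up_or(t, mask) - t;
}

// An access already on the last tick of a window is not delayed...
static_assert(contention_delay(3, 3) == 0, "aligned access");
// ...but one tick later it waits for almost a whole window.
static_assert(contention_delay(4, 3) == 3, "just missed the slot");
static_assert(round_up_or(8, 7) == 15, "8-tick window");
```

The jump from 0 to 3 ticks between `t = 3` and `t = 4` shows why a one-tick difference in where t=0 is placed, or one extra tick before an access, can change the total timing by much more than one tick.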