sorgelig / ZX_Spectrum-128K_MIST

ZX Spectrum 128K for MIST Board
GNU General Public License v2.0
31 stars 18 forks source link

70908 demo by Scoopex - blank screen after stars effect #35

Closed sirdavid2 closed 4 years ago

sirdavid2 commented 4 years ago

https://www.pouet.net/prod.php?which=80915 Doesn't go blank if "video timings" are set to Pentagon, but the timings are not correct then for the multicolor effects. The demo is written for ZX Spectrum 128k and I've tested it on +2 and +2A. It works well on V6Z80P also.

sorgelig commented 4 years ago

use this with pentagon timings: ftp://ftp.scene.org/pub/parties/2019/forever19/zx_spectrum_demo/70908_pentagon_fix.zip

sirdavid2 commented 4 years ago

But still, it doesn't fix "ULA-128" timings issue. There is a bug somewhere...

sorgelig commented 4 years ago

may be it's not exactly timings but may be some weird port use. So it may be related on non-standard address decoding or similar. If someone will debug the demo from Spectrum side and tell what is the specific issue, then i will try to fix. I'm not going to disassemble and debug this demo myself.

sirdavid2 commented 4 years ago

Ok, I'll try to debug it and find the problem.

sorgelig commented 4 years ago

it will be helpful!

sirdavid2 commented 4 years ago

My debuging is very close to the critical moment and for now it seems that the problem are timings. The code waits for something to start the next part and it never happens. Why timings? When I do a snapshot in emulator too close (in the same frame I think) to that moment, the snapshot behaves like MIST core - the next part doesn't start. I think that the snapshot (.z80) can't start in the same moment of the frame that it was stopped, so it points to timings. At least for now. I will try do find that condition when the demo should go further.

sorgelig commented 4 years ago

While this behavior shows the difference, from other side it's pretty much bad programming. I understand sometimes you need to do some tricks to squeeze max performance, but in this case it's clearly dumb behavior as it's simply transition from one scene to another. Could simply wait for next frame and problem wouldn't exist.

sirdavid2 commented 4 years ago

Waiting for transition to another scene simply by waiting for next frame is not enough when you have to be synchronized with music. I'm still debuging the critical moment, but I already know that there is a global frame counter, increased by the frame interrupt procedure, which is checked between the parts until it reaches specified value. And that value after the stars part is already higher than checked (by 1) when that checking begins, so the code loops almost forever (for 65535 frames). That means the overall speed of hardware is too low (and that's why it doesn't loop with Pentagon timings). The question is still why and I'm going to find the reason.

sirdavid2 commented 4 years ago

After hundreds of tests I still can't figure out what is causing the delay that holds the demo after the stars effect. I know that after this effect there are many calculations (decompression and others) which last many frames (more than 30). After that MIST core is late by about half a frame and misses the frame counter. I was trying to prepare a sample code to catch the difference between MIST and other platforms (real Spectrums and software emulators) but without success (yet). But today I decided to check how the demo works on older cores. And it works well! It works fine up to 180704, then 2 next cores (180804 and 180922) have something broken as the demo crashes right after the start. Since 190112 the demo doesn't crash, but stops after the stars (as on the newest core). The conclusion seems obvious - you have broken something after 180704 and it is not fixed yet.

sorgelig commented 4 years ago

According to log there were many fixes in T80. Actually now T80 passes all instruction tests including timings. So it's kind of confusing. It can be a bug hiding the other bug. And after fixing the one, the other start to affect.. I will test 180704

sorgelig commented 4 years ago

T80 version from 180704 copied to latest core makes this demo work. But it doesn't mean much. Simply older T80 had a lot of incorrect timings and can be simply faster than later more accurate version. May be it depends on fast/slow RAM page?

sirdavid2 commented 4 years ago

All the mentioned code runs in slow RAM (below 32768, close to 24000), so who knows... It seems I will have to keep digging.

sorgelig commented 4 years ago

it's not only about code execution, but data as well. Some pages are slow, some fast. Core should have correct slow/fast pages, but i could be wrong. Half frame difference is a lot.

sirdavid2 commented 4 years ago

This big difference is so strange, because I can't simulate any difference in my sample code. And all the multicolor effects in this demo look correct (but they run in fast pages). Slow/fast pages seem to be ok also. I've seen in the core code that all odd pages are slow and even are fast, just like in ZX 128k (+2). I think the core doesn't support +2A/B/+3 contention model, but it is not the point here.

sorgelig commented 4 years ago

Correct contention is only in 48K and 128K/+2 modes, this is right.

gyurco commented 4 years ago

The demo is broken by this commit: https://github.com/sorgelig/ZX_Spectrum-128K_MIST/commit/3676819f4c3417087e56e83b74bc30f531e1b308 Reverting the T80.vhd and T80pa.vhd parts fix it. But probably leave something wrong behind.

gyurco commented 4 years ago

This is the smallest amount of reverted code which I was able to do: https://github.com/gyurco/ZX_Spectrum-128K_MIST/commit/8bb2d070a48ec927217af5165fe0932b54e1751d Maybe an interrupt latency issue.

sorgelig commented 4 years ago

i'm investigating..

sorgelig commented 4 years ago

the issue comes from this line:

    A <= A_int when NoRead = '0' or Write = '1' else A_last;

to fix, change it to:

    A <= A_int when NoRead = '0' or Write = '1' else (others => '0');

for ZX contention model it's crucial. And i remember the correct way is to retain the last address on bus. By using 0 you change the contention to the one used in ROM.

So, i still think it's not correct fix. It may affect other demos.

gyurco commented 4 years ago

But why the idle address value matters? The ULA doesn't check for contention only when nMREQ or nIORQ asserted? (Also it ignores nRFRSH, that's the reason for the "snow" effect, which is not implemented currently).

gyurco commented 4 years ago

These conditions looks like negated for me: https://github.com/sorgelig/ZX_Spectrum-128K_MIST/blob/master/video.sv#L241 e.g. memconted = 1 when iorq_n = 1 and memrq23 also =1 ? (as it implies mreq_n is 1).

sorgelig commented 4 years ago

i don't remember exact details, but i remember mreq and iorq are shifted against each other when they affect the contention. It was written in ULA book, IIRC. Contention was carefully checked in tests. Otherwise it would seriously affect many demos and some games. Because of how contention work, it needs a stable address on the bus. Probably it's enough to keep the address as long as MREQ or IORQ are active. nRFSH is used in later ULA revisions (even in 48K version IIRC) to remove that snow effect. So, it's just a matter of ULA revision.

sorgelig commented 4 years ago

actually simple:

A <= A_int;

also works with this demo.. But still need to test many other demos, and timing tests.

gyurco commented 4 years ago

Do you remember what demo needed that A_last change? (Btw, the ULA doesn't have a nRFSH pin, thus it couldn't be fixed in a later revision. Only the +2A/+3 GA considers refresh).

sorgelig commented 4 years ago

unfortunately i don't remember.. Probably it was some specific timing test. Also it's better to refer some official or reverse engineered sources of Z80 about address in different stages.

sirdavid2 commented 4 years ago

In my tests of this demo I have made a modification that shows PC (address) of the main code when the interrupt comes, so I can tell if the speed is the same as on the real ZX (or proper software emulator). You could make a core with such modification to test and I can tell if the overall speed is correct with an accuracy of a few cycles. If it turns out to be the same, the fix should not affect other demos.

gyurco commented 4 years ago

Aquaplane (48k) and Shock Megademo(128k) are still perfect, so I think it's ok to revert the A_last change.

gyurco commented 4 years ago

Try this one: https://github.com/gyurco/ZX_Spectrum-128K_MIST/releases/tag/200550

sirdavid2 commented 4 years ago

This test core speed is exactly the same as 180704 and is a bit faster than the real HW. The difference is much smaller, after all the calculations the demo checks frame counter about 5% of frame too early (while the latest official core is about 40% of frame too late). Remember that the difference is due to a very specyfic code of this demo, very specific calculations with many rarely used commands.

sorgelig commented 4 years ago

The main problem in new version is that address bus doesn't follow original HW. As a side effect, now this demo works. Well, can keep this way.

sirdavid2 commented 4 years ago

I think I have just found it. At least a cause. When I register points to slow RAM, the code runs slower no matter if interrupts are on or off (I haven't known about it until now). With such value of I, core speed matches exactly the HW. And that's why some of my tests showed proper core speed. But when I points to fast RAM, the code should run noticeably faster (about 1000 clock cycles per frame). On MiST core the difference is way to small compared to the real HW. At list when we run the code of this demo.

gyurco commented 4 years ago

But "I" normally does not point to contended memory (as it would cause the snow effect), thus if the speed is wrong in this case, then almost everything would be wrong. Also I wonder why it should slow down the CPU, "I" register only affects the refresh address, which doesn't turn on contention in ULA (only the RAM access will scrambled).

sirdavid2 commented 4 years ago

In this demo I points to fast RAM, where the speed sometimes is not as it should be. It has to be contention, as with slow RAM the speed is exactly accurate, so T80 should be ok. At least I know why my tests showed that the core speed is accurete, my own tests had I pointing to slow RAM. Edit: I have just thought that if I points to fast RAM and the code is also in fast RAM, then core speed may be also correct! And that's why all the multicolor demos work properly! I have to check it tomorrow. The only incorrect situation may be when I points to fast RAM and code is in slow RAM. I register value doesn't affect only +2A/B/+3.

sirdavid2 commented 4 years ago

It turned out I was right. I have prepared 4 taps with test code (using 70908 code) for every possible combination: https://www.dropbox.com/s/jdtj6us78u5xw84/ZX_core_speedtest.zip?dl=0 Test code waits for an interrupt, calls one calculation procedure form the demo (which lasts a bit less than a frame). After the return it counts how far is the end of frame (ld hl,0; inc hl; jr $-1) and displays the result. The results are:

  1. "I" in slow mem, code in slow mem - the core speed is OK.
  2. "I" in fast mem, code in fast mem - the core speed is OK - that's why most of the demos/games work ok.
  3. "I" in fast mem, code in slow mem - the core is too slow - this is 70908 demo case when it's doing heavy calculations after the stars effect. That's why it does not make it before the frame counter check.
  4. "I" in slow mem, code in fast mem - this time the core is too fast! - rather not used on ZX 128k because of snow effect. But also buggy anyway. I have tried my tests also on software emulators: Fuse and ZX Spin and both work exactly like the real HW.
sorgelig commented 4 years ago

I will finish with SAM Coupe first. Then will check your test and see if i can fix it.

sorgelig commented 4 years ago

the tests above are uninformative.. I don't know what value should be.

I register basically has no direct relation to speed. It just holds upper byte of address. If it points to slow memory then interrupt will be serviced slowly. If points to fast then faster. It's possible there is difference in contention when slow/fast access is interleaved. Probably instead of register I, you can achieve similar difference if code in fast mem will read the data from slow mem or vice versa.

gyurco commented 4 years ago

"I" shoud be the upper part of the refresh address, too (R is the lower 8 bits). But that shouldn't affect the speed also, just cause the snowing (if it points to slow RAM).

sirdavid2 commented 4 years ago

I don't know how, but the value of "I" affects the speed more than we could have thought, and that's a fact. Even when the interrupts are turned off. I have talked to Mat/ESI about it, he has made his own tests and confirmed my observations. Not the value itself matters, but where it points, on fast or slow RAM, or ROM which is the same as fast RAM. My tests just show that difference. I already explained what the values mean - how many time left until the end of frame after the return from one demo's procedure. So the bigger is value, the faster is code. These values don't mean anything special, every test should just show the same value on the real HW and "emulator". This is just for comparsion. If it differs, then something is wrong. When the value is too big, then "emulator" is too fast and vice versa. Every test should show different value, test 1 should be different from test 2 and so on. Edit: "I" doesn't affect speed of non-ULA Spectrums, so hi-hi test shows the same value as low-hi on +2A/B/+3.

sirdavid2 commented 4 years ago

I found the better way to show what's going on: https://www.dropbox.com/s/pts2l2ihglnmskn/ZX_core_speedtest2.zip?dl=0 Again, there are 4 tests for 4 cases, but this time you can see 4 lines on screen. If the line on border matches one line on screen, then the core speed is accurate. And it is accurate only in tests low-low and high-high. These tests are better because now you can see what are the speed differences between every case, how much the value of "I" affects the speed.

sirdavid2 commented 4 years ago

I have made some photos. This is how it looks with me: https://www.dropbox.com/s/mvf12tfe6j7sj7n/ZX128_speedtest.zip?dl=0 Real HW is on the left. Holes are caused by rain effect. I have also added "one for all" version, you have to choose where is the code and "I" register. Now you can see that even in fast-fast combination there is a (small) difference - the line on +2 begins one char earlier. Differences in slow-fast and fast-slow cases are big and very clearly visible.

Edit: I have just found out that the difference in fast-fast is because of "late timings". So the core is ok in fast-fast and slow-slow, but not good at all in the rest.

gyurco commented 4 years ago

In this page, there's a list with expected address bus contents broke down to machine cycles: https://worldofspectrum.org/faq/reference/48kreference.htm The last_addr patch surely fix LDI(R) and LDD(R), however probably breaks something else.

sorgelig commented 4 years ago

Originally T80 didn't care what is on the address. So it's not that last_addr breaks something as it's impossible to break what wasn't there originally. T80 besides of 99% compatibility, still a functional (or better say clear-room) re-implementation than based on decapped schematics.

It may also different in sub-cycles of contention when fast/slow memories are switched.

gyurco commented 4 years ago

I know it wasn't perfect, but you made it very close. It is possible that something on the bus was accidentally OK.

sorgelig commented 4 years ago

Demo isn't depend on specific time but rather needs to finish that part before a specific frame. It can be one frame before of 10, or whatever as soon as it's not later than the point. So whatever correct or not correct timings are emulated it's fine as it's quick enough. So, saying "i've changed this code and demo has been fixed" is not really correct unless it's real fix of timings.

gyurco commented 4 years ago

I know that the revert "fix" is wrong, e.g. it surely breaks LDI(R) and LD(D)R, and many others. Just have to find the instruction which doesn't need the last_addr patch. Or the problem is at a completely different place, who knows. And since with the revert, LDIR doesn't contend, it runs faster (the demo makes a long loop with it). I admit that's not correct.

gyurco commented 4 years ago

@sirdavid2 Maybe your tests can pinpoint where's the error, but an assembly list would help a lot. With comments - even better.

gyurco commented 4 years ago

Hmm, as I see, the tests which shows the most difference is low/hi - mostly tests LDIR in two situtations:

gyurco commented 4 years ago

Looks like a rather long instruction cycle (fully contented - M1, read, write) LDIR (one iteration) ldir-full-contention

sirdavid2 commented 4 years ago

My tests are based on the demo code, so I don't have the sources. But I can try to make my own code, as simple as possible, which would show the difference between real hardware and the core.

gyurco commented 4 years ago

As I wrote above, the most interesting test case would be a fully contended LDIR. This will run too fast on the current core, and maybe slightly slower on the previous. However single-stepping in FUSE might pinpoint where the slowness comes from.