gyurco / Next186

GNU General Public License v3.0
13 stars 10 forks source link

graphical issues in some early games #14

Closed ghost closed 2 years ago

ghost commented 2 years ago

HI! I'm new user.

Early games like PC-MAN, Jbird... have graphical issues. I discuss with @spark2k06. He confirms that these games works on Zxuno core. It seems that CGA is not fully implemented on mist core. POPCORN game doesn't work at all.

Thank you for your great work.

gyurco commented 2 years ago

What's the problem with J-Bird? It's good for me.

gyurco commented 2 years ago

BTW, CGA is not very complicated, it's graphic modes are fully implemented as far as a VGA card would implement it. No real 6845 of course, but it's very seldom needed (probably only for demos).

gyurco commented 2 years ago

Popcorn also starts, but very slow. Not a CGA issue. If I invest time in developing the core, please invest more time in the bugreport. Upd.: Popspeed can be used to set the speed. The game runs well. Why do you say "doesn't work at all"?

ghost commented 2 years ago

Yes you were right.

I had bad copies for J-Bird and Popcorn. I download better copies and now it works.

Striker ( helicopter game ) runs way too fast and points seem to replace text on screen as same as Pc-man. Burgertime crashes at start. "Rollo and the Brush bros" runs faster than it should. I tried all configuration in the OSD.

From my point of view 186 is needed for very old games from early 80's.

Sure if you want me to do some tests on these old games I could do that.

I don't know anything about FPGA programming.

That's why I didn't want to bother you at first. I have contacted the original author of the core.

Thank you again for the port. I know it demands a lot of work.

Regards, Adrien.

Le mar. 5 avr. 2022 à 11:40, gyurco @.***> a écrit :

Popcorn also starts, but very slow. Not a CGA issue. If I invest time in developing the core, please invest more time in the bugreport.

— Reply to this email directly, view it on GitHub https://github.com/gyurco/Next186/issues/14#issuecomment-1088489054, or unsubscribe https://github.com/notifications/unsubscribe-auth/AYSB3I7UXYYXSR54NSKMPGTVDQDBXANCNFSM5SSBB5HQ . You are receiving this because you authored the thread.Message ID: @.***>

gyurco commented 2 years ago

Original author is Nicolae Dumitrache, and he didn't implement CGA at all. I implemented it in this fork, so it's my original work (with my original bugs...). I did not copy it from ZX-UNO, where the VGA was fully replaced by another CGA implementation, instead I've used the original registers which supposed to work with the CGA modes on the original IBM VGA card. The BIOS missing some functions which draws text on graphic screens, but you can observe it in VGA screens, too (Gateway Corporation for example). CGA games are using it more often, like in Sopwith, Paratrooper, etc... But it's not a hardware issue. It would be good if someone with X86 assembly knowledge could add these functions to the BIOS.

ghost commented 2 years ago

Yes I wanted to say Haitor Gómez García for Zxuno port not original author sorry.

Ok thank you for the informations.

Le mar. 5 avr. 2022 à 22:07, gyurco @.***> a écrit :

Original author is Nicolae Dumitrache, and he didn't implement CGA at all. I implemented it in this fork. The BIOS missing some functions which draws text on graphic screens, but you can observe it in VGA screens, too (Gateway Corporation for example). CGA games are using it more often, like in Sopwith, Paratrooper, etc... But it's not a hardware issue.

— Reply to this email directly, view it on GitHub https://github.com/gyurco/Next186/issues/14#issuecomment-1089276020, or unsubscribe https://github.com/notifications/unsubscribe-auth/AYSB3IZBV6ALLK2XAY52HQLVDSMRVANCNFSM5SSBB5HQ . You are receiving this because you authored the thread.Message ID: @.***>

spark2k06 commented 2 years ago

The only problem I have observed with CGA in the version based on Graphics Gremlin has been the video glitches that occur in certain games that make use of certain hardware-based screen scrolls, such as some dinamic games: Freddy Hardest, Capitan Trueno, Army Moves, etc...

This is due to the cache memory of the Next186, and it also happens in MiSTer with the implementation they do of CGA as you do in your version, @gyurco. I know for sure it's because of the cache because in AO486 core of MiSTer they have the option to disable it, and those glitches disappear. If you manage to implement a cache disable option, you can also fix it... I still haven't been able to get rid of the cache in the ZXUno version.

gyurco commented 2 years ago

Did you actually test my version? I've made changes in the cache towards proper display RAM-SDRAM coherency, so there should be no glitches due to dirty cache lines. Defender of the Crown fencing scene or Golden Axe are using the address register based scrolling, they look good.

spark2k06 commented 2 years ago

I don't have a MiST. Golden Axe is also no problem on the CGA version of ZXUno, DOC neither... if you have the chance, try for example Army Moves, the glitches there are more than obvious:

image

gyurco commented 2 years ago

Maybe OP can try it. I wonder why the video RAM is cached on MiSTer, as I see, it's fully BRAM-based.

ghost commented 2 years ago

In the case of PC-MAN game, it works great on Mister ao486 with cache « L1 » and « L2 » enabled. No glitches and no text problems. However in the case of this game timing seems better on gyurco’s core.

Le mer. 6 avr. 2022 à 07:55, Aitor Gómez @.***> a écrit :

The only problem I have observed with CGA in the version based on Graphics Gremlin has been the video glitches that occur in certain games that make use of certain hardware-based screen scrolls, such as some dinamic games: Freddy Hardest, Capitan Trueno, Army Moves, etc...

This is due to the cache memory of the Next186, and it also happens in MiSTer with the implementation they do of CGA as you do in your version, @gyurco https://github.com/gyurco. I know for sure it's because of the cache because in MiSTer they have the option to disable it, and those glitches disappear. If you manage to implement a cache disable option, you can also fix it... I still haven't been able to get rid of the cache in the ZXUno version.

— Reply to this email directly, view it on GitHub https://github.com/gyurco/Next186/issues/14#issuecomment-1089851147, or unsubscribe https://github.com/notifications/unsubscribe-auth/AYSB3I76FGUBQOJRDG4RKOTVDURMHANCNFSM5SSBB5HQ . You are receiving this because you authored the thread.Message ID: @.***>

gyurco commented 2 years ago

I don't think Army Moves glitch is related to the RAM cache - it's even broken when no movement, already wrong data is written to the video RAM. It's more likely a CPU issue. Maybe a bug, but I'm suspecting the instruction queue (in the BIU code) - if the drawing code is using self-modifying code, then it would easily fail. Would need some debugging.

spark2k06 commented 2 years ago

But it is curious that in the AO486 kernel of MiSTer this also happens, and that it is solved by disabling the cache. Specifically, if the L1 cache is disabled, the glitches disappear.

gyurco commented 2 years ago

I don't know the internals of the cache implementation, but l1cache looks like a module of icache on ao486, thus it's like a similar instruction cache as the instruction queue.

spark2k06 commented 2 years ago

Now that you say that makes sense, I don't know how difficult it would be to have an option to disable this instruction queue in Next186, but it could be a partial solution... just as it is for MiSTer's ao486.

gyurco commented 2 years ago

I don't know - but BIU is not much code, only around 200 lines.

gyurco commented 2 years ago

Partial success...I've added FFLUSH_REQ to a couple of instructions, and Army Moves looks good now.

                                WR = MREQ & !FETCH[0][1];
                                WE = WR | IRQ ? 5'b00000 : &FETCH[0][2:1] ? {2'b00, FETCH[1][4:3] != 2'b01, 2'b00} : {3'b000, WBIT};            // RSSEL, RASEL_HI/RASEL_LO
                                ISIZE = IRQ ? 0 : ISIZES;
+                               FFLUSH_REQ = 1;^M
                        end
 // --------------------------------  mov IMM to R/M  --------------------------------
                        1: begin        
@@ -427,6 +428,7 @@ module Next186_CPU(
                                WR = MREQ;
                                WE[1:0] = WRBIT;                // RASEL_HI/RASEL_LO
                                ISIZE = ISIZEW;
+                               FFLUSH_REQ = 1;^M
                        end
 // --------------------------------  mov IMM to R --------------------------------
                        2: begin        
@@ -703,6 +705,7 @@ module Next186_CPU(
                                        WR = MREQ;
                                        WE[1:0] = WRBIT;                // RASEL_HI, RASEL_LO
                                end
+                               FFLUSH_REQ = 1;^M
                        end
 // --------------------------------  in --------------------------------
                        15:     begin
@@ -813,6 +816,7 @@ module Next186_CPU(
                                        WE = {3'b100, WR | &FETCH[0][5:3] | FETCH[0][2] ? 2'b00 : WBIT};                // flags, RASEL_HI, RASEL_LO
                                        ISIZE = ISIZES;
                                end
+                               FFLUSH_REQ = 1;^M
                        end
 // --------------------------------  add, or, adc, sbb, and, sub, xor, cmp R/M with Imm --------------------------------
                        24: begin
@@ -833,6 +837,7 @@ module Next186_CPU(
                                        WE = {3'b100, WR  | &FETCH[1][5:3]? 2'b00 : WBIT};              // flags, RASEL_HI, RASEL_LO
                                        ISIZE = ISIZEI;
                                end
+                               FFLUSH_REQ = 1;^M
                        end
 // --------------------------------  add, or, adc, sbb, and, sub, xor, cmp, test Acc with Imm --------------------------------
                        25: begin // test
@@ -844,6 +849,7 @@ module Next186_CPU(
                                ISEL = 2'b00;
                                MREQ = 1'b0;
                                ISIZE = WORD ? 3 : 2;
+                               FFLUSH_REQ = 1;^M
                        end
 // --------------------------------  inc/dec R16 --------------------------------
                        26: begin

It has a performance penalty of course, but now there's a proof that the instruction queue doesn't work with self modifying code. There would be really nice if it could be disabled.

gyurco commented 2 years ago

This one is enough for Army Moves:

--- a/rtl/Next186/Next186_CPU.v
+++ b/rtl/Next186/Next186_CPU.v
@@ -415,6 +415,7 @@ module Next186_CPU(
                                WR = MREQ & !FETCH[0][1];
                                WE = WR | IRQ ? 5'b00000 : &FETCH[0][2:1] ? {2'b00, FETCH[1][4:3] != 2'b01, 2'b00} : {3'b000, WBIT};            // RSSEL, RASEL_HI/RASEL_LO
                                ISIZE = IRQ ? 0 : ISIZES;
+                               FFLUSH_REQ = WR;
                        end
 // --------------------------------  mov IMM to R/M  --------------------------------
                        1: begin 

Interestingly, the "mov mem to/from ACC" case is already covered.

spark2k06 commented 2 years ago

Great discovery, thank you very much! Yes, this FFLUSH_REQ solution was implemented by me for my CGA version of ZXUno, and I added it to the "mov mem to/from ACC" case :

https://github.com/spark2k06/next186lite_graphics_gremlin/commit/5a788e6ae7cf5c2bd79ff916633face4fd00f645

The reason at the time was that programs using EXEPACK used precisely that kind of instructions to automodify code, and I saw that it didn't work well with the instruction cache. As I never managed to get rid of that cache, I developed that partial solution of FFLUSH_REQ, and now it has come in handy for this case, and I'm glad about it :-). I was tipped off by misterfpga forum user pgimeno:

https://misterfpga.org/viewtopic.php?p=38630#p38630

Given the complexity of the change they made for MiSTer, I applied a different solution, which is the one I have mentioned.

For me the ideal is to remove that cache completely, please let me know if you can figure out how to do it, I don't mind losing performance for the CGA version of ZXUno.

In the meantime, I'm going to apply this fix to my version, it will not only solve the Army Moves case, but others from Dinamic that use the same programming method: Capitan Trueno, Freddy Hardest, and probably others.

By the way, I never imagined that the reason why these games had these glitches was precisely because of this instruction queue and the fact that it had self-modifying code, I blamed it directly on the cache and my goal was to remove it completely... which I was never able to do. But this partial solution for my version is also very good.

Fixed for ZXUno:

https://github.com/spark2k06/next186lite_graphics_gremlin/commit/41079186a17b9f8e716fa46f2c5c8f8c10a9b30f

Also, I noticed that you have added Tandy sound and several improvements in JTOPL. I will also take advantage of this feature for my CGA version (along with the JTOPL improvements), I'm very interested in it, as long as it fits in ZXUno, which is already very tight on space. I would gain a lot of space by completely removing the data and code caches (L1 and L2), and even more by replacing Next186 with MCL86, but both are very complex tasks that require a lot of analysis and debugging time, maybe someday.

gyurco commented 2 years ago

Ah, so the existing FFLUSH_REQ wasn't there originally. However I think it's enough to activate it for the write case then (FFLUSH_REQ=WR).

I also added Tandy video, but partially, as it should have bank switching in the video RAM (PCJr/Tandy 1000 has 128KB video RAM), and I don't know yet how to merge it with the original core's segment handling.

spark2k06 commented 2 years ago

Yes, I have thought about putting WR instead of 1, I'm sure it will work. I'll make the change in the next iteration. Thanks!

spark2k06 commented 2 years ago

Another game I have identified for review is PAKU PAKU:

https://deathshadow.com/pakuPaku

Its original screen looks like this:

pakuMenu

However, in the CGA version of ZXUno's Next186, the pixels seem to be doubled, and the full screen of the game is not displayed:

ZXUno Next186 CGA

This is not the case with the Graphics Gremlin card on which it is based. If this card is connected to a real PC, the screen displays fine. Which leads me to think that something in the core is wrong. Interestingly, on MiSTer's ao486 core, although the full screen is visible, graphical glitches are also visible:

MiSTer AO486

I wonder what it would look like on MiST's Next186, can you confirm that this glitch also happens... if so, can you think of a reason?

gyurco commented 2 years ago

AFAIK the video mode here is a hack of the text mode, where the character height is set to 2. It's currently hardcoded to 8 in Next186. Your ZX-UNO version probably has an internal scandoubler for VGA displays?

gyurco commented 2 years ago

Yepp, it's just the matter of implementing the programmable character row counter. paku

I have to check if it possible to make use the linedoubler, too, then it'll be full screen.

gyurco commented 2 years ago

Unfortunately it doesn't turn on the scandoubler bit (reg[9][7] in CRTC). Should it work on a real VGA? Or is it CGA only?

ghost commented 2 years ago

I don’t know if it could help but classicdosgames.com shows System Requirements :

4.77 MHz or faster 8088 128K of RAM (66,560 bytes free in DOS) DOS 2.11 or higher CGA, Tandy/PcJr, EGA or VGA video

Le ven. 8 avr. 2022 à 22:55, gyurco @.***> a écrit :

Unfortunately it doesn't turn on the scandoubler bit (reg[9][7] in CRTC). Should it work on a real VGA? Or is it CGA only?

— Reply to this email directly, view it on GitHub https://github.com/gyurco/Next186/issues/14#issuecomment-1093355793, or unsubscribe https://github.com/notifications/unsubscribe-auth/AYSB3I4BBL6XADYIA7KUTWTVECMKXANCNFSM5SSBB5HQ . You are receiving this because you authored the thread.Message ID: @.***>

gyurco commented 2 years ago

Luckily the source code is available.

It calls the BIOS function $12 to set the text mode to 200 scanlines:

    BL = 30  select scan lines for alphanumeric modes
       AL = 0  200 scan lines
          = 1  350 scan lines
          = 2  400 scan lines

    on return:
    AL = 12

It's not implemented in the BIOS, and also there's no built-in font for 200 lines alpha mode. I have no real intention to implement it, too, thus a coder is welcome to do it. It doesn't need FPGA skills.

spark2k06 commented 2 years ago

In MiST, is this image as originally displayed, or has it been after implementing the programmable character row counter?

Yepp, it's just the matter of implementing the programmable character row counter. paku

If so, where exactly did you make the change? Where did you implement it? Because it already looks much better than it does in my version of CGA, at least it is playable.

spark2k06 commented 2 years ago

Reviewing the game code, I see that a number of changes are made to CGA port $3D4:

    CGA_Regs:regData=(
        register:$3D4;
        data:#04+#$7F + #06+#$64 + #07+#$70 + #09+#$01
    );

Which is the equivalent of:

    mov dx, 3d4 ; select CRTC registers
    mov ax, 7F04
    out dx, ax ; set CRTC 04 to 7F
    mov ax, 6406
    out dx, ax ; set CRTC 06 to 64
    mov ax, 7007
    out dx, ax ; set CRTC 07 to 70
    mov ax, 0109
    out dx, ax ; set CRTC 09 to 01

I wonder if it is exactly this sequence that indicates that the character height should be 2 pixels... so, from here, I can check why the change is not happening in my version of CGA.

spark2k06 commented 2 years ago

I have hacked into the paku.exe executable, to replace these OUT DX, AX by NOPs... I just checked with PCEM and a CGA card that it is so:

image

So, the problem in my version is that it is ignoring this CRTC registry change, I will now try to find out why.

spark2k06 commented 2 years ago

it seems that it is because the register is being accessed in 16-bit mode with OUT DX, AX, and I don't have it contemplated in cga module:

image

I will try to solve it.

spark2k06 commented 2 years ago

Now I access the bus in 16-bit mode, it has improved but now part of the top screen is not visible:

image

I keep looking...

gyurco commented 2 years ago

mov ax, 0109 out dx, ax ; set CRTC 09 to 01

This sets the char height to 2 pixels. I just pushed my changes. Originally I wanted to finish the Tandy mode before I do that... But here's the relevant change for the CRTC reg 09: https://github.com/gyurco/Next186/commit/b36e084857291a4cefdef34b088ca46ba007b4fc

spark2k06 commented 2 years ago

Finally, I got it for CGA

image

I will soon update my version of CGA as well. Thank you very much for the tips provided!

spark2k06 commented 2 years ago

Tired of dealing with Next186's problems 😃:

https://twitter.com/spark2k06/status/1517806776472805377?t=qXEZI_uvNMMoM-XeYntsAA&s=19

gyurco commented 2 years ago

Instead you should do a 8237 DMA controller, and a Sound Blaster DSP :)

spark2k06 commented 2 years ago

Yes, well... we'll start by adapting it to what we can take advantage of the Next186, and from there, yes, we'll have to improve it... but the first tests I'm doing are proving to be a marvellous adaptation, everything makes more sense in terms of buses and signals.

gyurco commented 2 years ago

I have "only" one issue with MCL86: it's very inefficient, as it requires ~100 MHz clock for achieving ~4.7MHz 8086 performance. It cannot be speed up too much. Even a "Turbo XT" cannot be really achieved.

spark2k06 commented 2 years ago

Up to 3.4x times the performance of an 8088 at 4.77Mhz seems significant to me.:

https://youtu.be/Nr0nVI584PU

And according to this other link, it is possible to achieve the performance of an IBM AT at 8Mhz:

https://microcorelabs.wordpress.com/2020/11/12/ibm-pcjr-accelerator-mcl86jr/

We'll see how it behaves when I finish fixing some problems, however, in the Graphics Gremlin based project I'm not aiming for more speed. Besides, for ZXUno it's a great advantage that it only takes ~300 LUTs.

gyurco commented 2 years ago
Up to 3.4x times the performance of an 8088 at 4.77Mhz seems significant to me.:
Cycle compatibility is disabled and 128KB of RAM is instantiated inside of the FPGA which is accessed at the core speed of 100Mhz.

It's because of fast RAM. BRAM and CPU at 100 MHz, and only 3.4x faster than an XT - well, not impressed.

spark2k06 commented 2 years ago

Even so, in this core for me the important thing is cycle accuracy, 8088 behaviour and the absence of cache.

For speed and VGA features, we would already have Next186 and ao486 on MiSTer.

spark2k06 commented 2 years ago

Luckily, we have a DMA module available in the ao486 repository, as well as others like IDE that we can surely take advantage of. These will allow us to load a .VHD image from the OSD itself with the help of Universal XTIDE BIOS, and hopefully use the original IBM PCXT BIOS or a cloned version:

https://github.com/MiSTer-devel/ao486_MiSTer/tree/master/rtl/soc

I'm still having problems with the keyboard, I'm using the same keyboard module that next186 uses, but I've already prepared a MiSTer version and I've added a Splash screen:

https://twitter.com/spark2k06/status/1521851295065333760?t=72iY31KJgw21tzte22JuWA&s=19

gyurco commented 2 years ago

Using the original BIOS would be a good achievement - even better if the original Tandy 1000 BIOS could be used.

gyurco commented 2 years ago

I would say IDE or SD Card SPI is not something which will make much difference - only if ATAPI is also supported, then CD-ROM hardware can be added.

spark2k06 commented 2 years ago

Even with issues, I have made it public:

https://github.com/spark2k06/PCXT_MiSTer

Mainly, that hardware interruptions are not thrown. Several modules are inherited from Next186, and many things may not make sense in this project, so some cleanup will have to be done.

Any help is welcome ;-)

spark2k06 commented 2 years ago

I have downgraded the module version to the original version as well as the INT08 call in the much more basic Next186 project BIOS, and the IRQs are already running. However, it now seems to hang often during the execution of these IRQs. It's time to investigate why, but in the meantime, I've already updated the sources in the GitHub repository with these changes:

https://github.com/spark2k06/PCXT_MiSTer/commit/7b54d1ecda4d444d06a6dd9b0ed622a27c26eccd

I will not SPAM any more about this core in this thread :-)

gyurco commented 2 years ago

If you discover a bug, I'm happy to hear about it. I just added the disc motor counter decreasing to the Int 08 handler, it was required by some games.

gyurco commented 2 years ago

The Next186 modules are not handling CS, WR, etc. in an edge triggered way, but expects these signals as a single cycle pulse. If a CS & WR lasts for more cycles, it might have unwanted consequences (for example, the PIC acks all interrupts, not just the last one in service).

spark2k06 commented 2 years ago

This makes a lot of sense, I will take it into account in my next code review.