Open decod81 opened 3 years ago
Thanks for investigating. I'm aware the core is not perfect. However, it's just a straight port of the original, which is already 7 years old, and nobody besides the original author has bothered to fix anything in it (even though it has been ported to several other boards). So it's a bit orphaned.
I investigated a bit more and found a fix. Here's code that improves cache coherency with a few modifications to ddr_186.v:
REPLACE this: .flush(auto_flush == 3'b110) WITH THIS: .flush((auto_flush == 3'b110)|(auto_flush_v == 3'b110))
REPLACE THIS: auto_flush[1:0] <= {auto_flush[0], vblnk}; WITH THESE: auto_flush[1:0] <= {auto_flush[0], hblnk}; auto_flush_v[1:0] <= {auto_flush_v[0], vblnk};
ADD SOMEWHERE: reg [2:0]auto_flush_v = 3'b000;
This gets rid of 99% of the funny behavior by flushing the cache during each horizontal sync as well as vertical sync. The behavior now matches real hardware much more closely and makes games look nicer.
One minor issue remains: the line test still shows the vertical sync location one pixel off compared to real hardware. I couldn't find a fix for that yet, but I might look again later if I have time.
Nice fix decod81! It's very noticeable in Monkey Island 2 when the monkeys come on to dance in the introduction. This fix cleaned it up for me.
If this fix works, then the question is why the games are writing to VRAM during active display. Or is this the "proper" fix?
I think many games just utilized the most compatible mode 13h which didn't use pages and some video cards didn't even have more memory than just enough for one page, so they were basically just "racing the beam" with vsync. In some cases you also want to save vmem for other purposes. Of course some software may have followed poor programming practices or done intentional speed tricks. Some demos sync with hblnk to create copper demos so these require the fix. However, in general I think all this was normal practice back in the day and as such perfectly expected behavior and the fix also being logical.
One could clean up the code a bit by flushing on hblnk changes alone, as that appears to be sufficient, and by using a 2-bit register instead of a 3-bit one.
There is still some minor cache coherence or timing issue remaining which in most cases doesn't appear at all so let's say the fix is 99%. Most people probably won't notice problems after applying this fix. If I run into the 100% fix, I will let you know.
Ok, then I'll add it. Maybe you can also find out why Golden Axe hangs on the title screen, why Lotus III's speed is erratic (sometimes the music is slow, sometimes insanely fast, and rarely normal), and some other little issues. It would be enough just to find out whether it's the partially implemented PIC, the PIT, the CPU, or something completely different.
As far as I can see, auto_flush[2] is only set by writing to AUX (port 0001, done by the BIOS). So auto_flush_v == 3'b110 will always be false, which means the patch effectively moves the auto flush from vblank to hblank only. Here's the BIOS init code that enables auto flush on vblank: https://github.com/gyurco/Next186/blob/master/186Code/BIOS/BIOS_Next186.asm#L397
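To make the bit roles concrete, here is a minimal standalone sketch of the edge detector as it is described in this thread. The module name and the ports `aux_wr` and `aux_enable` are hypothetical and are not taken from ddr_186.v; only the layout of `auto_flush` follows the snippets quoted above.

```verilog
// Hypothetical, simplified model; not the actual ddr_186.v code.
module auto_flush_sketch(
    input  wire clk,
    input  wire aux_wr,      // assumed: strobe for the CPU write to the AUX port
    input  wire aux_enable,  // assumed: data bit that enables auto flush
    input  wire hblnk,       // blanking signal being sampled
    output wire flush
);
    // [2] = enable (written by the CPU), [1] = previous sample, [0] = newest sample
    reg [2:0] auto_flush = 3'b000;

    always @(posedge clk) begin
        if (aux_wr) auto_flush[2] <= aux_enable;
        auto_flush[1:0] <= {auto_flush[0], hblnk};
    end

    // 3'b110: enabled, previous sample 1, newest sample 0 (a 1 -> 0 transition).
    // A second register whose bit [2] is never written can never equal 3'b110.
    assign flush = (auto_flush == 3'b110);
endmodule
```

In this model a register like `auto_flush_v`, whose top bit stays at its reset value of 0, can only ever hold 3'b000 to 3'b011, so its comparison against 3'b110 never fires.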
Hi, thank you for this amazing core! BUG 1 is much more evident in Double Dragon 3 (only when enemies are on screen), while maybe BUG 2 makes Catacomb 3-D games appear at half screen. I also really hope to see Golden Axe running properly, one of my fav! Please keep it up! Thanks so much ;D
Pushed the commit which moves auto flushing to hblank instead of vblank. Also pushed some timer fixes, which make Lotus 3 work (at least).
Awesome! Thank you!!! Can't wait to test it on my SIDI!
@decod81
> There is still some minor cache coherence or timing issue remaining which in most cases doesn't appear at all so let's say the fix is 99%. Most people probably won't notice problems after applying this fix. If I run into the 100% fix, I will let you know.
I think the 100% fix would be to make the VGA read its data through the cache controller; then flushing could be avoided altogether. A less elegant fix would be to trigger the flush from the cache when it detects cached VGA data.
> Nice fix decod81! It's very noticeable in Monkey Island 2 when the monkeys come on to dance in the introduction. This fix cleaned it up for me.
@squidrpi were you able to get Monkey Island 1 to work? It freezes at start for me. Thanks
@gyurco
> I think the 100% fix would be to make the VGA read its data through the cache controller; then flushing could be avoided altogether. A less elegant fix would be to trigger the flush from the cache when it detects cached VGA data.

Yes, you are probably correct. The issue, as far as I can see, is that the VGA FIFO reads data too early with respect to vblnk. It should fill the FIFO near the end of vblnk instead of at the beginning. This way software has time to finish drawing. A combined copper and vsync test program (https://github.com/decod81/Next186_MIST/tree/main/TEST) shows that the topmost line of pixels is one frame behind the second line of pixels (attached pics). This is obviously not a problem if the software does double buffering or begins drawing earlier, but in some cases it might result in small visible artifacts on the first line of pixels.
I noticed that, if the Verilog comments are correct, the VGA FIFO is filled directly from SDRAM 32 bytes at a time, whereas the cache is flushed 256 bytes at a time. Reading VGA data through the cache controller in a naive way might therefore hurt performance compared to some other approach, but I don't know. According to the Landmark test, going from one cache flush per vsync to one cache flush per hsync drops the effective CPU frequency from 139 MHz to 116 MHz and video speed from 52000 chr/ms to 32000 chr/ms. Obviously, mostly correctly displayed graphics are preferred, but that's one data point as well.
I tried to modify the code such that filling of the fifo would wait until a particular invisible scanline near the end of vblnk before reading the first visible scanline, but somehow this crashed the whole core so I must have done something wrong and didn't have time to iterate enough to fix it.
I also noticed a small issue related to the copper effect (which changes the palette during hsync): when using "auto_flush == 3'b110", the very end of the copper line is somehow garbled (seen in the attachment), no doubt a timing issue related to the cache flush. This seems to be fixed when using "auto_flush==2'b01" (which doesn't exactly make sense, but works nonetheless). However, in some cases, like the starting screen of wolf3d, this slows down palette animation a bit, but maybe this is how it should be. I did not notice any impact on other games so far. This is a very minor thing and probably limited mainly to some scene demos that do copper effects.
BTW, @alessioscand, Monkey Island 1 works for me. I didn't do anything to fix it. It just worked, so I can't comment any more on that.
Latest fixes don't appear to cause any problems with games for me. Image centring looks perfect now on my monitor. Freezing with games still the same for me on Monkey 1, Civilisation, Underworld 2.
"Secret of Monkey Island, The (VGA) (1990)(Lucasfilm Games LLC) [Adventure]" from Total DOS Collection release 13 starts OK but I can't get any adlib audio. The other version I've tried with a blank screen works in a DOS emulator but starts with some cracked copy protection screen so perhaps it's using some funny video mode. I can send it if that highlights the other blank screen issues.
UPDATE: Weird, adlib started working on Monkey 1.
I have the un-cracked Italian version (VGA 256, floppy) and I get a black screen at the beginning. This is with the first release of the core (so without the fixes).
Here's the rbf with these latest fixes. Next186_MiST.rbf.zip
I've created a new issue #2 for discussing the freeze issue as it's a different problem to the OP.
> I think the 100% fix would be to make the VGA read its data through the cache controller; then flushing could be avoided altogether. A less elegant fix would be to trigger the flush from the cache when it detects cached VGA data.

> Yes, you are probably correct. The issue, as far as I can see, is that the VGA FIFO reads data too early with respect to vblnk. It should fill the FIFO near the end of vblnk instead of at the beginning. This way software has time to finish drawing. A combined copper and vsync test program (https://github.com/decod81/Next186_MIST/tree/main/TEST) shows that the topmost line of pixels is one frame behind the second line of pixels (attached pics). This is obviously not a problem if the software does double buffering or begins drawing earlier, but in some cases it might result in small visible artifacts on the first line of pixels.
Maybe the FIFO filling should start 1-2 lines before the active area. I think it can be done.
> I also noticed a small issue related to the copper effect (which changes the palette during hsync): when using "auto_flush == 3'b110", the very end of the copper line is somehow garbled (seen in the attachment), no doubt a timing issue related to the cache flush. This seems to be fixed when using "auto_flush==2'b01" (which doesn't exactly make sense, but works nonetheless). However, in some cases, like the starting screen of wolf3d, this slows down palette animation a bit, but maybe this is how it should be. I did not notice any impact on other games so far. This is a very minor thing and probably limited mainly to some scene demos that do copper effects.
You can use even 3'b101, as the auto_flush[2] is set by the CPU (by a specific OUT in the BIOS). Maybe "10" is a bit early? As it's just when hblank starts. A longer auto_flush shift register can be used to delay it a little.
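The "longer shift register" idea can be sketched as follows. This is only an illustration under assumptions: the module name, the `enable` port (standing in for auto_flush[2]) and the `DELAY` value are hypothetical and not taken from the core.

```verilog
// Hypothetical sketch: see the same 1 -> 0 transition DELAY clocks later.
module delayed_flush_sketch #(
    parameter DELAY = 4   // assumed value, for illustration only
)(
    input  wire clk,
    input  wire enable,   // plays the role of auto_flush[2]
    input  wire hblnk,
    output wire flush
);
    // hist[0] is the newest sample; hist[DELAY+1] is the oldest.
    reg [DELAY+1:0] hist = 0;

    always @(posedge clk)
        hist <= {hist[DELAY:0], hblnk};

    // Compare the two oldest samples instead of the two newest,
    // so the transition is reported DELAY clocks after it happened.
    assign flush = enable && (hist[DELAY+1:DELAY] == 2'b10);
endmodule
```

With `DELAY = 0` this reduces to the original two-sample comparison, so the delay can be tuned without changing the rest of the logic.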
> UPDATE: Weird, adlib started working on Monkey 1.
Monkey Island has audio problems on "fast" CPUs also on real hardware. Try the core option to reduce the CPU speed /3.
@gyurco
I found a fix for the remaining 1-pixel vsync/vgafifo issue now (along with the end of scanline copper issue). Here's what was needed:
// .flush(auto_flush == 3'b110)
.flush(auto_flush == 3'b101)

// reg nop;
reg nop;
reg fifo_fill = 1;

// sdraddr <= BIOS_WR ? BIOS_BASE + (BIOS_ADDR >> 1) : s_prog_empty || !(s_ddr_wr || s_ddr_rd) ? {6'b000001, vga_ddr_row_col + vga_lnbytecount} : {memmap_mux[8:0], cache_hi_addr[9:0], 4'b0000};
sdraddr <= BIOS_WR ? BIOS_BASE + (BIOS_ADDR >> 1) : (s_prog_empty && fifo_fill) || !(s_ddr_wr || s_ddr_rd) ? {6'b000001, vga_ddr_row_col + vga_lnbytecount} : {memmap_mux[8:0], cache_hi_addr[9:0], 4'b0000};

// else if(s_prog_empty) cntrl0_user_command_register <= 2'b10; // read 32 bytes VGA
else if(s_prog_empty && fifo_fill) cntrl0_user_command_register <= 2'b10; // read 32 bytes VGA

// else if(~s_prog_full) cntrl0_user_command_register <= 2'b10; // read 32 bytes VGA
else if(~s_prog_full && fifo_fill) cntrl0_user_command_register <= 2'b10; // read 32 bytes VGA

// if(s_vga_endscanline)
if(vcount==443) fifo_fill <= 1;
if(s_vga_endscanline)

// vga_ddr_row_count <= 0;
vga_ddr_row_count <= 0;
fifo_fill <= 0;
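Stripped of the surrounding SDRAM logic, the gating idea in this patch can be sketched as a small standalone module. The module name, the `frame_done` port (standing in for the point where `vga_ddr_row_count` is reset) and the 10-bit `vcount` width are assumptions made for illustration; the line number 443 is the value used in the patch.

```verilog
// Hypothetical sketch of the fifo_fill gate; not the actual ddr_186.v code.
module fifo_fill_gate #(
    parameter [9:0] FILL_START_LINE = 10'd443  // value used in the patch above
)(
    input  wire       clk,
    input  wire [9:0] vcount,      // assumed width of the line counter
    input  wire       frame_done,  // assumed: pulses when the row counter is reset
    output reg        fifo_fill
);
    initial fifo_fill = 1'b1;

    always @(posedge clk) begin
        // Stop prefetching once the frame has been read out...
        if (frame_done)
            fifo_fill <= 1'b0;
        // ...and resume shortly before the first visible line.
        else if (vcount == FILL_START_LINE)
            fifo_fill <= 1'b1;
    end
endmodule
```

The VGA read requests are then qualified with `fifo_fill`, so the first scanline is fetched late in the blanking interval rather than right at its start.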
I also changed the CPU/SDRAM clocks to 70/140 MHz, this fixed the minor palette animation slowdown in some cases that was present with 50/100 MHz when utilizing 01-autoflush.
I also noticed that in some games which do double buffering, it is possible to use vblnk-only auto_flush (with 3'b110) without any ill effect and get a major speed improvement. Along with the increased CPU/SDRAM frequency, Wolf3D gets about double the frame rate at 70/140 MHz with vsync flushing compared to 50/100 MHz with hsync flushing (of course many other games look bad then, so it depends on the game whether it works without artifacts). Landmark shows video speed actually more than doubling.
The rbfs included here have pc speaker disabled, because pc speaker doesn't work well for me in skyroads, but otherwise these should be fine. These have the CPU clocked at 70 MHz and SDRAM at 140 MHz for anyone wishing to test.
Maybe some kind of intelligent flushing would be possible, e.g. if the cache controller sees a VGA RAM write, it would enable flushing on hblank. After the flush, it should reset this flag. A more complicated but better solution would be to serve the VGA FIFO directly from the cache if a cache hit occurs on the currently scanned VGA address.
I don't know if that's relevant to the PC speaker issue, but it seems that Adlib is not recognized at certain CPU speeds. This core is amazing because it correctly handles many speed-sensitive games (I have several retro PCs and none can handle the Stellar 7 intro properly, for example), so IMHO we should maintain and value the possibility of slowing the CPU for better compatibility. When I saw the perfect scrolling of Crystal Caves and Commander Keen on this core I was speechless (and yes, I have an S3 Trio on my retro rig).
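A minimal sketch of the first idea (flush on hblank only if VGA RAM was written since the last flush) might look like this. All names are hypothetical, and `vga_ram_write` is an assumed signal that the cache controller would have to provide.

```verilog
// Hypothetical sketch of "flush only when needed"; signal names are assumptions.
module smart_flush_sketch(
    input  wire clk,
    input  wire vga_ram_write,  // assumed: pulses when the cache accepts a write to VGA RAM
    input  wire hblnk,
    output wire flush
);
    reg [1:0] hblnk_hist = 2'b00;  // [1] = previous sample, [0] = newest sample
    reg       vga_dirty  = 1'b0;   // set by a VGA RAM write, cleared by the flush

    // Flush on the 1 -> 0 transition of hblnk, but only if something changed.
    assign flush = vga_dirty && (hblnk_hist == 2'b10);

    always @(posedge clk) begin
        hblnk_hist <= {hblnk_hist[0], hblnk};
        // A write arriving in the same cycle as the flush keeps the flag set,
        // so that write is picked up by the next flush instead of being lost.
        if (vga_ram_write) vga_dirty <= 1'b1;
        else if (flush)    vga_dirty <= 1'b0;
    end
endmodule
```

Software that only draws during vertical blanking would then pay for roughly one flush per frame, while code that races the beam would still get a flush on every line it touches.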
BUG2 is fixed now (using the expected bit to switch normal/half pixel clock).
Thanks for an amazing core. I've tested quite a few games, including wolf3d and they all seem to work surprisingly well. This core also works better than ao486 at least in one respect, and that is that it doesn't have the missing last few pixels bug in per pixel scrolling games such as keen4. However, I have discovered a few other bugs listed below.
BUG1) Fast moving things on the screen are not always drawn in consistent order making animation look "noisy". For example running in prince of persia exhibits this behavior every bunch of frames. Similar issues are present in other games as well, but they are not always particularly visible. Looks like some vertical sync/vmem cache coherence issue to me, but don't know exactly. It is definitely there with Next186 and not with any real machines.
I've attached two photos which may help to understand it, a piece of code and how it looks when run. This code behaves very differently with next186 and a real machine. On a real machine there is always a moving vertical white line whereas with next186 the line has significant random discontinuities. This is rather strange for deterministic code and deterministic clocks. It is independent of cpu speed so probably a fundamental bug in verilog rather than any timing issue.
This bug is somewhat annoying to me, the other two I don't care too much about.
BUG2) The graphics mode/vga registers used by supaplex (dos game) in gameplay result in half-width pixels on next186. This occurs when writing outp(0x3c0, 0x30); outp(0x3c0, 0x21); The fix is to write instead outp(0x3c0, 0x30); outp(0x3c0, 0x31); However, this is nonstandard for vga. As far as I understand, the nonstandard fix writes the variable "half" in vga.v.
BUG3) PC-speaker screws with adlib sound causing noise and crackling, sometimes permanently. This is evident for example in skyroads (dos game). Eliminating pc-speaker gets rid of the issue.