jotego / jtcores

FPGA cores compatible with multiple arcade game machines and KiCAD schematics of arcade games. Working on MiSTer FPGA/Analogue Pocket
https://patreon.com/jotego
GNU General Public License v3.0

s18: VDP connection #655

Closed: jotego closed this issue 6 months ago

jotego commented 6 months ago

The VDP connection does not work. astorm's test menu can run a VDP memory test and it always fails. This can be reproduced in simulation:

> swcore s18 ver/astormj
> jtframe mra s18
> sim.sh

The simulation takes a long time to run, and the relevant output appears a few frames before frame 800.

[Image: frame_00778]

MegaDrive's VDP connection can be seen in the schematics. The VDP is only connected to the M68000 and there is no external chip select signal. The VDP decodes the full address bus.

Using NukeYKT's netlist for the VDP (ym7101) seems to require a 2x clock on one input and the nominal clock on another. The memory is connected as in Nuke's MegaDrive netlist.

Still, the memory test does not pass.

Setting JTFRAME_CLK96 (in order to get the 2x clock) seems to break the video output, at least on sidi128. It also creates many timing errors. It is unknown whether the video breaks because of the timing errors or because of a JTFRAME bug that appears when JTFRAME_CLK96 is set.

Some games (shdancer) seem to run fine without the VDP, at least as far as tested. The games that do not boot may be failing because of the VDP.

Another VDP failure mode appears around frame 3 of astorm: the CPU reads from the VDP but no DTACK signal is produced, so the CPU hangs. To prevent the hang, the VDP DTACK signal currently does not halt the bus.

The VDP dtack signal goes into the mapper. The mapper is not likely to add extra cycles to it.

jotego commented 6 months ago

The VDP is read in Shadow Dancer when accessing the IO controller:

[image]

This seems to be the reason why the tilemap banks are wrong for the S16B scroll chip.

jotego commented 6 months ago

Check out the s18_vdp96mhz branch for the VDP running at the right clock.

jotego commented 6 months ago

Apparently the VDP netlist originally ran at a x1 clock, using adjusted frequency scalers inside. However, some games had timing problems, so the author moved to a x2 clock with the original frequency scalers and a second x1 clock input.

For the x1 clock input, we can use clk48 or a register; it does not seem to make a difference. If I remember correctly, the x1 clock is only used in the scalers.

gyurco commented 6 months ago

What's the latest code I should use as a base? As far as I can see, the Nuked VDP was removed in the latest commits.

jotego commented 6 months ago

The master branch has the VDP removed and is synthesized at 48MHz. Use the s18_vdp96mhz branch instead.

jotego commented 6 months ago

I just pushed some local changes to that branch. I leave it for you from here.

gyurco commented 6 months ago

There is some progress; however, I see the current VRAM module is very inefficient: it uses a full M9K for a 256-byte block. Is the original version with byte enables not allowed? (I know the ALTSYNCRAM IP must be used, since inference doesn't work for byte enables.) Another approach could be to implement the serial port as the second port of a dual-port RAM.
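
For scale, 256 bytes is 2,048 bits, about 22% of an M9K's 9,216 bits. Byte enables let one wider RAM accept 8-bit writes to either half of a word. The Python sketch below models only that write behaviour for a 16-bit word with a 2-bit byte enable; the function name and the bit ordering are assumptions for illustration, not the ALTSYNCRAM port definition.

```python
def byte_enable_write(old_word: int, new_word: int, byte_enable: int) -> int:
    """Return the stored 16-bit word after a write with a 2-bit byte enable.

    Assumed (hypothetical) convention: bit 0 enables the low byte and
    bit 1 enables the high byte. Disabled bytes keep their old value.
    """
    mask = 0
    if byte_enable & 0b01:
        mask |= 0x00FF
    if byte_enable & 0b10:
        mask |= 0xFF00
    # Keep the old bits outside the mask, take the new bits inside it
    return (old_word & ~mask & 0xFFFF) | (new_word & mask)


print(hex(byte_enable_write(0x1234, 0xABCD, 0b10)))  # upper byte only: 0xab34
```

With a single enable bit set, only that byte changes, so one 16-bit block can serve both byte and word accesses.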

gyurco commented 6 months ago

More observations: the S16 video and the VDP video are out of sync. The VDP has a register that turns the VSYNC output into a pixel clock output. Alien Storm sets it, so maybe, when it is enabled, the VDP generates the pixel clock. I have no idea what syncs the frames, as there is no valid VSYNC from the VDP in this case.

gyurco commented 6 months ago

Actually, the VDP and S16 pixel clocks are identical: the VDP uses EDCLK/2, where EDCLK is 12 MHz. The VDP can then be hacked to output the real VSYNC even if the test bit is set (I am not sure how the original hardware does it; maybe it recovers VSYNC from CSYNC, as the latter is not affected by the test bit). The bigger challenge is to sync the S16 video to the VDP video.

jotego commented 6 months ago

> I see the current VRAM module is very inefficient: it uses a full M9K for a 256-byte block. Is the original version with byte enables not allowed? (I know the ALTSYNCRAM IP must be used, since inference doesn't work for byte enables.) Another approach could be to implement the serial port as the second port of a dual-port RAM.

Use an ifdef SIMULATION preprocessor directive so that the IP is only instantiated during synthesis. The problem with the Altera IP is its simulation model: I think those models are not Verilator-friendly. We can have one model for simulation and one for synthesis.

astorm has a convenient memory test screen in service mode. You can step through the pages until you reach the VDP RAM test. Passing that test is a first milestone.

Video Synchronization

The schematics we made for this in s18/sch are easy to navigate if you open them with KiCAD rather than looking at the PDF available in JTBIN, which is updated weekly at most. On page 26 you will see that the direction of the sync signals and some clock signals is not very clear. A driver for the PRE_SYNC signal is also missing. We should also extract the PAL equations and place them on the schematic sheet to make things clearer.

EDCK is floating in the Moonwalker PCB. I should measure it on Alien Storm.

Video Mixing

Schematics page 15 shows how the video is merged. It is done in the analogue domain. There are analogue switches and either the VDP drives the output or the color encoder (i.e. the arcade video) does it. Shadow Dancer and Bloxeed do not seem to use the VDP to draw graphics at all and just execute dummy writes and reads on it. Alien Storm does use the VDP to draw many things.

Even if the VDP is a netlist, outputting VSYNC and HSYNC should not be hard. The JTS16 video module can be modified to take those inputs in a special operating mode. Indeed, that kind of difference must be what distinguishes the chips used in S18 from those used in S16B.

gyurco commented 6 months ago

The RAM test passes now, and the VDP video looks good. CRAM dots should probably be disabled. F9 can be used to switch between the two video outputs until the merging is done.

gyurco commented 6 months ago

About CRAM dots: they are probably visible now because VINT still comes from the S16 video, so the CRAM writes in the VINT handler fall into the active area.

jotego commented 6 months ago

Congratulations on getting this far so quickly. I can see the pink enemy in Alien Storm now when I select the VDP output.

What are these CRAM dots? Are you referring to visual artifacts caused by the color RAM? The VDP output should be sampled at the pxl2_cen rate at a convenient place. Part of this could also be due to the timing violations: there are over 1000 violations of more than 5ns inside the VDP, although they seem to share a handful of source/target nets, so it may be simpler than it looks.

Currently the 96MHz branch produces blurry video for the S16B video subsystem too. The reason is that the target subsystem is operating at 96MHz but the video works at 48MHz. We need appropriate synchronizers there. I am not too worried about the S16B part.

I was expecting more games booting up after fixing the VDP memory access but the situation seems to be the same. So the culprit for those games must be elsewhere.

Awesome progress, thank you!

gyurco commented 6 months ago

The VDP has a quirk: writing to CRAM during active display is visible on the screen, so games usually write to CRAM during blanking. https://segaretro.org/Sega_Mega_Drive/Palettes_and_CRAM#CRAM_dots As VINT does not currently come from the VDP, the vertical interrupt handler runs at random places (the running pixels appear exactly at the CPU read/write access slots).
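
A rough Python model of why the dots wander: if the VINT handler starts on an arbitrary line instead of at the start of vertical blanking, some of its CRAM writes land in the active area. It assumes 262 lines per frame, 224 active lines and one CRAM write per line while the handler runs; those numbers and the function name are illustrative assumptions, not measurements from the core.

```python
TOTAL_LINES = 262   # assumed lines per frame
ACTIVE_LINES = 224  # assumed visible lines (0..223)


def visible_cram_writes(vint_line: int, write_lines: int) -> int:
    """Count CRAM writes that land inside active display.

    Assumes the VINT handler starts at vint_line and performs one CRAM
    write per line for write_lines consecutive lines (hypothetical model).
    """
    visible = 0
    for k in range(write_lines):
        line = (vint_line + k) % TOTAL_LINES  # wrap into the next frame
        if line < ACTIVE_LINES:
            visible += 1
    return visible


print(visible_cram_writes(224, 20))  # VINT at the start of vblank: no dots
print(visible_cram_writes(100, 20))  # VINT mid-screen: every write shows
```

Only a VINT aligned with the VDP's own blanking keeps every write hidden, which is why taking VINT from the VDP should make the dots disappear.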

As most of the VDP works at 48 MHz, defining a multicycle path for it is probably enough. The 96 MHz clock is only needed because the netlist samples MCLK_e (48 MHz) in the 96 MHz clock domain (rewriting the whole thing as a clock-enable design instead would be a huge job).
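
The need for a 2x clock can be illustrated with a small behavioural model: a 48 MHz square wave only shows edges when it is sampled at twice its rate. This Python sketch assumes the netlist detects MCLK edges in the 96 MHz domain as described above; the function names are hypothetical and it does not reproduce the YM7101 netlist.

```python
def sample_square_wave(clock_hz: int, sample_hz: int, n_samples: int) -> list[int]:
    """Level (1/0) of a clock_hz square wave seen at each sample_hz tick.

    Integer arithmetic: count the half periods of the sampled clock that
    have elapsed at each sampling instant; an even count means "high".
    """
    return [1 if (i * clock_hz * 2 // sample_hz) % 2 == 0 else 0
            for i in range(n_samples)]


def rising_edges(levels: list[int]) -> list[int]:
    """Sample indices where the level goes 0 -> 1 (a one-tick edge detector)."""
    return [i for i in range(1, len(levels)) if levels[i] and not levels[i - 1]]


MCLK_HZ = 48_000_000
# At 96 MHz the 48 MHz clock reads 1, 0, 1, 0 ... so every second tick
# carries a rising edge that can act as a clock enable.
fast = sample_square_wave(MCLK_HZ, 96_000_000, 8)
# At 48 MHz it looks constant, so no edge is ever detected.
slow = sample_square_wave(MCLK_HZ, 48_000_000, 8)
print(rising_edges(fast), rising_edges(slow))
```

The same edge information could instead come from a clock enable generated at 48 MHz, which is what a full clock-enable rewrite would amount to.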

jotego commented 6 months ago

Thank you for the CRAM explanation. That makes sense. Indeed, we need different video counters for S16B video in this context so they are synchronized.

It might be easier to adjust the scalers in the VDP to operate from 48MHz instead of 96MHz, so we do not need to care about 96MHz problems. I do not think we will run into the odd problems the author saw in some MegaDrive games. I wonder if the original repository still holds a version configured for x1 clock.

The multicycle path may be easy to set. But then you're left to fix the 48MHz to 96MHz conversion for the S16B video.

gyurco commented 6 months ago

What about using the 48MHz clock everywhere except the VDP? It's isolated and doesn't use the SDRAM, so it should work.

jotego commented 6 months ago

That's good too. The framework currently does not support that use case as clk96 always plays a role either in the SDRAM or as clk_sys. That made sense 4 or 5 years ago, but as the framework matured and ports became handled automatically, we can do it in a different way.

I have started to untangle that. I have pushed a commit, 03c319ec, that works with the clk96 used only in the VDP. The framework conversion is not finished yet and simulations will not work on clk96 cores. I need to check that it didn't break other 96MHz cores either.

Maybe if you add the multicycle path to the internal VDP nets after this, it will pass timing correctly.

I will need to make one or two more commits to the branch later to finish the framework update.

jotego commented 6 months ago

I have merged s18_vdp96mhz into master because the changes in the framework were too large to keep separate, there were edits for the s18 core in both branches, and the s18_vdp96mhz branch had reached a significant milestone. @gyurco, please continue from the master branch.

gyurco commented 6 months ago

Do you get any video in master? I only see black, for both S16 and VDP, in every game.

jotego commented 6 months ago

Somehow a line got committed with a blank in a terrible place. I just fixed it.

gyurco commented 6 months ago

These rules are for the VDP. Is it enough to add them to hdl/jts18.sdc? Or how do I apply them to all targets?

set_multicycle_path -from {jts18_game_sdram:u_game|jts18_game:u_game|jts18_video:u_video|jts18_vdp:u_vdp|ym7101:u_vdp|*} -setup 2
set_multicycle_path -from {jts18_game_sdram:u_game|jts18_game:u_game|jts18_video:u_video|jts18_vdp:u_vdp|ym7101:u_vdp|*} -hold 1

set_multicycle_path -from {jts18_game_sdram:u_game|jts18_game:u_game|jts18_video:u_video|jts18_vdp:u_vdp|clk2} -setup 2
set_multicycle_path -from {jts18_game_sdram:u_game|jts18_game:u_game|jts18_video:u_video|jts18_vdp:u_vdp|clk2} -hold 1

set_multicycle_path -from {jts18_game_sdram:u_game|jts18_game:u_game|jts18_video:u_video|jts18_vdp:u_vdp|rst_n} -setup 2
set_multicycle_path -from {jts18_game_sdram:u_game|jts18_game:u_game|jts18_video:u_video|jts18_vdp:u_vdp|rst_n} -hold 1

set_multicycle_path -from {jts18_game_sdram:u_game|jts18_game:u_game|jts18_video:u_video|jts18_vdp:u_vdp|edclk_l} -setup 2
set_multicycle_path -from {jts18_game_sdram:u_game|jts18_game:u_game|jts18_video:u_video|jts18_vdp:u_vdp|edclk_l} -hold 1
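
The arithmetic behind these constraints, as a Python sketch (not part of the project's scripts): with -setup 2, a path launched by the 96 MHz clock may take two 96 MHz periods, which equals one 48 MHz period. The accompanying -hold 1 is the usual companion to a setup multiplier of 2, moving the hold check back to the launch edge.

```python
from fractions import Fraction


def setup_budget_ns(clock_hz: int, setup_multiplier: int = 1) -> Fraction:
    """Time allowed for a path when the setup check is moved setup_multiplier edges out."""
    return Fraction(10**9, clock_hz) * setup_multiplier


single = setup_budget_ns(96_000_000)     # default single-cycle check at 96 MHz
multi = setup_budget_ns(96_000_000, 2)   # "-setup 2" on the 96 MHz clock
# Two 96 MHz periods equal one 48 MHz period, the rate at which most of
# the VDP logic actually changes.
print(f"{float(single):.3f} ns -> {float(multi):.3f} ns")  # 10.417 ns -> 20.833 ns
```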

jotego commented 6 months ago

Almost. Check out s16b for an example. It basically consists of:

I used timing.sdc for s16b, which is the only core doing this. Let's keep the same syn/timing.sdc name for now.

The syn folder may be a location for other constraints in the future, like placement or partitions. For now, I think the SDC is the only use case.

jotego commented 6 months ago

Thank you for the timing.sdc. I added one more exception to it.

It looks like the VDP is producing a 14.28kHz HS frequency rather than 15kHz. 14.28kHz is what you would expect from the VDP when operated at 48MHz rather than 53MHz (MegaDrive). But the PCB produces 15kHz. I am looking at ways it could synchronize to something else. Reversing the EDCLK polarity (i.e., undoing your "reverse"), letting CSYNC from the S16B chips merge in, inputting 10 MHz through CLK1... nothing seems to work. Let me know if you have any ideas, please.

jotego commented 6 months ago

The hardware seems to do some kind of EDCK throttling, switching it between 12 and 16MHz. The logic is in the buffers sheet of the schematics. I think the VS pulse gets synchronized this way too.
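
How much of each line would need the faster clock can be estimated with a simple model. This Python sketch takes the 12 and 16MHz rates and the 15kHz target from the thread, but the 840 EDCLK cycles per line (420 pixel periods of 2 EDCLK cycles each) is a hypothetical figure, and it assumes the switch happens on whole EDCLK cycles; the real PAL logic may behave differently.

```python
from fractions import Fraction

F_SLOW_HZ = 12_000_000  # EDCLK during active video (from the thread)
F_FAST_HZ = 16_000_000  # EDCLK while it is sped up (from the thread)


def line_rate_hz(cycles_per_line, fast_cycles):
    """Line rate when fast_cycles of the line's EDCLK cycles run at F_FAST_HZ."""
    period_s = (Fraction(cycles_per_line - fast_cycles, F_SLOW_HZ)
                + Fraction(fast_cycles, F_FAST_HZ))
    return 1 / period_s


def fast_cycles_needed(target_hz, cycles_per_line):
    """Solve (N - n)/f_slow + n/f_fast = 1/target for n."""
    excess_s = Fraction(cycles_per_line, F_SLOW_HZ) - Fraction(1, target_hz)
    saving_per_cycle_s = Fraction(1, F_SLOW_HZ) - Fraction(1, F_FAST_HZ)
    return excess_s / saving_per_cycle_s


CYCLES_PER_LINE = 840  # hypothetical: 420 pixel periods x 2 EDCLK cycles
print(float(line_rate_hz(CYCLES_PER_LINE, 0)))     # constant 12 MHz
n = fast_cycles_needed(15_000, CYCLES_PER_LINE)
print(n, float(line_rate_hz(CYCLES_PER_LINE, n)))  # 160 fast cycles -> 15 kHz
```

Under these assumptions, running 160 of the 840 cycles at 16MHz brings the line to exactly 15kHz while the remaining cycles stay at 12MHz.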

Anyway, the original goal of this issue is achieved. @jtmiki this one is completed.

[Image: s18 EDCK]

gyurco commented 6 months ago

Probably it speeds up EDCLK during blanking? As I see, the PAL has been reverse engineered; it would be interesting to run it in a simulator. I wonder which VSYNC I3 is. E.g., in Alien Storm, VSYNC is actually turned into a pixel clock output in the VDP.

jotego commented 6 months ago

It is speeding it up during horizontal blanking. The signals are actually quite poorly formed, despite using 74F chips wherever possible. It still seems to be enough for the VDP to detect them, which is the same as saying the VDP has poor noise rejection.

I can see 8MHz at the output of VDP's VSYNC pin, so it is outputting the pixel clock indeed.

I do not know how the actual VSYNC pulse is generated or even who controls it. The equation for the relevant PAL pin is sadly not produced by the jedutil tool. It looks like both chips can pull the node down.

The image in the core right now is stable. The VDP part is not centered, but it does not drift away over time.

All the VDP related hardware on this board is so messy that it makes little sense other than to put off bootleggers. Indeed, the Alien Storm bootlegs do not seem to produce the VDP graphics at all and they modified the program code to skip VDP-drawn enemies.

[Images: HB controls EDCK; EDCK transition]

gyurco commented 6 months ago

It's also interesting how that HB signal is generated. T10 counts frames (with VS as its clock input), and the inputs to IC72 are VS, the frame count and sprpal8, which result in HB? Probably sprpal8 is what's really used?

jotego commented 6 months ago

Indeed. That HB signal is named like that by analogy, but it is not the actual H blank, though it must be very close. I have renamed sprpal8 to HS; again, it is not the actual one (it is too short), but it serves that function. In Moonwalker, that VS signal is indeed moving at the VS rate, but in Alien Storm it looks like a sine wave at 150Hz. This VDP has been plugged in there without mercy.

gyurco commented 6 months ago

BTW I don't know if averaging EDCLK will work. The pixel clock is directly derived from EDCLK, so if VDP and S16 don't match, there will be a skew.

gyurco commented 6 months ago

That's why they use 12 MHz in the active area, and speed it up in the blanks (where the skew doesn't matter).
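
The skew gyurco describes grows linearly along the line whenever the two pixel clocks differ. This Python sketch takes the 6 MHz S16 pixel clock from the thread (12 MHz EDCLK / 2), assumes 320 visible pixels, and uses a hypothetical constant "averaged" VDP pixel clock of 6.3 MHz purely as an example.

```python
from fractions import Fraction


def pixel_skew(n_pixels, f_vdp_hz, f_s16_hz):
    """Pixels of drift after n_pixels S16 pixels, both clocks aligned at line start."""
    return n_pixels * (Fraction(f_vdp_hz, f_s16_hz) - 1)


S16_PIXEL_HZ = 6_000_000  # 12 MHz EDCLK / 2, per the thread
ACTIVE_PIXELS = 320       # assumed visible width

print(pixel_skew(ACTIVE_PIXELS, 6_300_000, S16_PIXEL_HZ))     # hypothetical 6.3 MHz: 16 pixels
print(pixel_skew(ACTIVE_PIXELS, S16_PIXEL_HZ, S16_PIXEL_HZ))  # matched clocks: 0
```

Even a 5% mismatch moves the VDP layer 16 pixels by the end of the line in this example, which is why keeping the clocks equal during active video and adjusting only in blanking avoids visible skew.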

jotego commented 6 months ago

> BTW I don't know if averaging EDCLK will work. The pixel clock is directly derived from EDCLK, so if VDP and S16 don't match, there will be a skew.

If the averaged pixel clock did not have jitter, it could work. But generating it using a fractional divider makes the period irregular and it can be noticed in the image (not much, though). I am going to try their scheme next.
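
The irregular period is easy to see in a phase-accumulator divider, a common way to build a fractional clock enable. This Python sketch assumes a hypothetical 2/7 ratio on a 48 MHz base (about 13.71 MHz on average); it is a generic illustration, not the JTFRAME implementation. The enables arrive alternately 3 and 4 base cycles apart (62.5 ns and 83.3 ns), so individual pixel periods differ even though the average is right.

```python
def frac_cen_pulses(num, den, n_ticks):
    """Base-clock ticks on which a num/den fractional clock enable fires."""
    acc = 0
    pulses = []
    for tick in range(n_ticks):
        acc += num
        if acc >= den:  # overflow: emit an enable and keep the remainder
            acc -= den
            pulses.append(tick)
    return pulses


def periods(pulses):
    """Distance in base-clock cycles between consecutive enables."""
    return [b - a for a, b in zip(pulses, pulses[1:])]


pulses = frac_cen_pulses(2, 7, 70)
print(len(pulses), sorted(set(periods(pulses))))  # 20 [3, 4]
```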

gyurco commented 6 months ago

I mean jitter to the S16 pixel clock, if they don't match, then related pixels will shift. Or I don't understand the mixing.

jotego commented 6 months ago

> I mean jitter to the S16 pixel clock, if they don't match, then related pixels will shift. Or I don't understand the mixing.

Yes, the right way to do this is to match the clocks. That's what they did with the 12MHz input during the active time.

I have it working in 4359942a. The VDP can synchronize to external signals, as long as you provide them. The HSYNC pin connection is not currently identified in the schematics, but it must exist, because omitting it prevents synchronization (at least in the YM7101 netlist we have).

So the VDP takes the HS and CSYNC pulses from the counter/PAL output. I am currently using the S16B signals as they are, so the image appears a bit shifted to the right and bottom. But, it appears consistently at the same location.
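
The general idea of a counter locking to an external pulse can be shown with a generic Python sketch: a free-running horizontal counter that is also cleared by an external HS pulse. This is an assumption about the general mechanism, with hypothetical names and periods (the internal period is chosen longer than the external one so the external pulse always wins); it does not reproduce the YM7101's actual sync logic.

```python
def h_counter_wraps(n_ticks, internal_period, ext_period=None, start=3):
    """Ticks on which a horizontal counter is at 0.

    The counter free-runs modulo internal_period; if ext_period is given,
    an external HS pulse every ext_period ticks clears it. start is an
    arbitrary power-up value. All names are hypothetical.
    """
    h = start
    wraps = []
    for tick in range(n_ticks):
        ext_pulse = ext_period is not None and tick % ext_period == 0
        if ext_pulse or h == internal_period - 1:
            h = 0
        else:
            h += 1
        if h == 0:
            wraps.append(tick)
    return wraps


print(h_counter_wraps(40, 12))      # free-running: [8, 20, 32]
print(h_counter_wraps(40, 12, 10))  # locked to external HS: [0, 10, 20, 30]
```

Once locked, the counter's phase is set by the external pulse, so any offset between the pulse and the real sync shows up as a fixed shift of the image rather than a drift.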

That YM7101 netlist has proven to be convenient after all. I guess normal MegaDrive VDP implementations do not care to implement external sync pulses.

[image]

gyurco commented 6 months ago

Interesting if they're used as inputs. Probably that's why the red enemy blob sprites are not visible, as the line is cut during the sprite rendering slots?

jotego commented 6 months ago

> Interesting if they're used as inputs. Probably that's why the red enemy blob sprites are not visible, as the line is cut during the sprite rendering slots?

That's actually because the priority is done in a crude way right now. If you manage to cast your shadow on the blob sprite, you can see it.