phoboz / fpgagen

SEGA Genesis/Megadrive FPGA core
13 stars 4 forks source link

Gated clock violation (C101) #14

Open jotego opened 7 years ago

jotego commented 7 years ago

Hi Alastair,

The following gate clock definitions are causing problems in synthesis. This is marked as critical by the Design Assistant task.

Could you fix them, please?

Thank you, Pepe

Rule A101: Design should not contain combinational loops - Combinational loop 1     1
Combinational loop 1    user_io:user_io_d|spi_sck~0 1
Rule C101: Gated clock should be implemented according to the Altera standard scheme    user_io:user_io_d|spi_sck~0 2
Gated clock destination node(s) list    user_io:user_io_d|but_sw[1] 1
Gated clock destination node(s) list    user_io:user_io_d|cmd[0]    2
Gated clock destination node(s) list    user_io:user_io_d|byte_cnt[1]   3
Gated clock destination node(s) list    user_io:user_io_d|byte_cnt[4]   4
Gated clock destination node(s) list    user_io:user_io_d|byte_cnt[0]   5
Gated clock destination node(s) list    user_io:user_io_d|byte_cnt[2]   6
Gated clock destination node(s) list    user_io:user_io_d|byte_cnt[3]   7
Gated clock destination node(s) list    user_io:user_io_d|bit_cnt[1]    8
Gated clock destination node(s) list    user_io:user_io_d|bit_cnt[0]    9
Gated clock destination node(s) list    user_io:user_io_d|bit_cnt[2]    10
Rule C101: Gated clock should be implemented according to the Altera standard scheme    sd_card:sd_card_d|buffer_read_strobe    3
Gated clock destination node(s) list    sd_card:sd_card_d|buffer_rptr[3]    1
Gated clock destination node(s) list    sd_card:sd_card_d|buffer_rptr[1]    2
Gated clock destination node(s) list    sd_card:sd_card_d|buffer_rptr[0]    3
Gated clock destination node(s) list    sd_card:sd_card_d|always10~7_OTERM275   4
Gated clock destination node(s) list    sd_card:sd_card_d|buffer_rptr[4]    5
Gated clock destination node(s) list    sd_card:sd_card_d|buffer_rptr[5]    6
Gated clock destination node(s) list    sd_card:sd_card_d|buffer_rptr[6]    7
Gated clock destination node(s) list    sd_card:sd_card_d|buffer_rptr[7]    8
Gated clock destination node(s) list    sd_card:sd_card_d|buffer_rptr[8]    9
Gated clock destination node(s) list    sd_card:sd_card_d|buffer_rptr[2]    10
Rule C101: Gated clock should be implemented according to the Altera standard scheme    sd_card:sd_card_d|buffer_din_strobe 4
Gated clock destination node(s) list    sd_card:sd_card_d|buffer_wptr[5]    1
Gated clock destination node(s) list    sd_card:sd_card_d|buffer_wptr[0]    2
Gated clock destination node(s) list    sd_card:sd_card_d|buffer_wptr[1]    3
Gated clock destination node(s) list    sd_card:sd_card_d|buffer_wptr[2]    4
Gated clock destination node(s) list    sd_card:sd_card_d|buffer_wptr[3]    5
Gated clock destination node(s) list    sd_card:sd_card_d|buffer_wptr[4]    6
Gated clock destination node(s) list    sd_card:sd_card_d|buffer_wptr[7]    7
Gated clock destination node(s) list    sd_card:sd_card_d|buffer_wptr[6]    8
Gated clock destination node(s) list    sd_card:sd_card_d|buffer_wptr[8]    9
Gated clock destination node(s) list    sd_card:sd_card_d|altsyncram:buffer_rtl_0|altsyncram_vuc1:auto_generated|ram_block1a0~porta_we_reg  10
Rule C101: Gated clock should be implemented according to the Altera standard scheme    sd_card:sd_card_d|buffer_read_latch 5
Gated clock destination node(s) list    sd_card:sd_card_d|altsyncram:cid_rtl_0|altsyncram_jrc1:auto_generated|ram_block1a0~portb_address_reg0   1
Gated clock destination node(s) list    sd_card:sd_card_d|altsyncram:csd_rtl_0|altsyncram_jrc1:auto_generated|ram_block1a0~portb_address_reg0   2
Gated clock destination node(s) list    sd_card:sd_card_d|altsyncram:buffer_rtl_0|altsyncram_vuc1:auto_generated|ram_block1a0~portb_address_reg0    3
robinsonb5 commented 7 years ago

The first one refers to a somewhat hacky filter which is applied to the external SPI clock generated by the microcontroller. The correct method of conditioning this clock would be to feed it through a PLL, but we can't do that because the clock's speed isn't fixed. It's the same clock as is fed to SD cards, so it has to run at a low speed while the SD card is initialised, for example. I will see if I can find a better method to filter this signal.

I will see if I can improve the situation with the other signals. The problems are being reported in the fake SD card module, but probably originate from the control module's SPI access, which generates a (relatively low frequency) clock which would normally go directly off-chip to an external SD card.

robinsonb5 commented 7 years ago

I've just pushed a change which replaces the combinational loop clock filtering with a very shallow synchronous filter running at a 432MHz. This solves the combinational loop / delay chain critical issue in the design assistant.

Please test this and make sure it's at least as reliable as previous versions. (There's a hold violation on the new filter logic which I will need to solve, but I'd like it tested as is.)

(I've also pushed a change to the reset logic which syncronises signals coming out user_io to MCLK, which solves a good number of timing violations.)

robinsonb5 commented 7 years ago

Fixing the other two Design Assistant complaints is going to be complicated - it's going to need major changes to how the SD card module works, and since it wasn't my work in the first place it's going to take me a while to figure out.

As far as I can make out, the module currently uses a double-port RAM, one read port and one write port. Each port is currently driven by a multiplexed clock which is driven either by strobe signals generated in the domain of the external SPI clock, or by the Control Module's own SPI clock - switched according to whether a read or a write operation is in progress.

In order to win the Design Assistant's approval, this will need to be changed so that:

It will be a few days before I can put any time into this.

jotego commented 7 years ago

Thank you very much for looking into this deep changes. I have been fighting against another implementation issue since last Sunday and I have not been able to move forward yet. I still have not got another clean build... I am working on the Next branch. Hopefully I'll sort it tomorrow. Then I'll look at your changes and try to merge the two branches.

jotego commented 7 years ago

I have tried the new version. I did not get any sound out of it. When looked at timing errors I found many related to the interface between JT12 and the rest of the system. Like the FM_SEL signal. Of course, if that signal is not captured correctly then the whole interface gets broken...

The list is so long that overwhelms me. I have commented out all the sound system and tried synthesis again. The list of timing errors is still large. And it includes some possibly critical nets such as those related to OPCODE decoding in the 68k. I have started a new branch timing_closure without the sound system. I think that needs to get clean before trying to complicate things more. Adding more blocks to a system with timing problems is just going to make things worse for the FPGA synthesizer.

In my Next branch I experimented creating a separated clock domain for the JT12 FM synthesis module using the PLL and with a very clean separation between clock domains inside JT12. Using the PLL has the advantage of giving one of the global clock trees inside the FPGA to the clock. In the JT12 project there is a folder called quartus where I have tried synthesizing just JT12 on its own, including digital filters and a second-order sigma delta. It synthesizes very quickly (less than 2 minutes including fitting) and without errors. Then when I go to FPGAgen the whole thing gets complicated. Synthesis is very long, I get tons of errors and ultimately, sound does not get produced on the actual device or it sounds weird.

At some stage we had a clean implementation (or at least it looked like one). But for some reason the synthesis tool is not producing clean results anymore. I am a bit on despair now. I think I am going to take some time away from FPGAgen and try to build the core of a different system in order to learn and understand what is failing here.

As for JT12 there are really only two things missing:

I want to eventually fix those but that is not what is holding FPGAgen back now. Although the timing issues of FPGAgen may be behind the Thunder Force IV bug as the Flag A/B implementation actually seems fine in JT12...

In summary, we need to have the timing_closure branch clean before adding the sound system. And I need to take a break from FPGAgen in order to learn the skills I am missing to help here.

robinsonb5 commented 7 years ago

I've experimented with using a separate PLL-generated clock to replace VCLK, and it solves pretty much all the timing violations except the ones related to the TG68 being clocked too fast. However, I haven't got the phase right, because while the PSG works fine, I get random sounds out of the FM part. (Note, because of the way the TG68's designed, it's almost certainly OK despite these errors - it's just that putting multicycles in the constraints file for every path through the TG68 isn't practical.)

I will continue to try and solve the timing issues and Design Assistant violations, and report any successes.

jotego commented 7 years ago

I am glad to hear you tried the PLL clock. I think the solution must be along those lines: PLL generated clocks and use of synchronizers between blocks.

PLL clocks are guaranteed to run through global clock trees, which give meet timing constraints much more easily than signal routing. However you should not trust that two clocks out of the PLL have the right phase. You just have to use synchronizers.

By the way, that recommendation of using the same clock plus a clock enable signal for lower frequency clocks has some traps:

  1. You have to generate the clock enable signal so it toggles on the unused clock edge
  2. The clock enable signal has the same frequency as the original clock, so it needs to run on a dedicated clock tree too
  3. Clock gating is a technique to reduce dynamic power, not to reduce clock frequency. When it is used at high speed it looks like having two clocks.
  4. All the logic running on a clock-enabled version of the main clock still has to meet the original clock constraints, so it loses the advantage of lower frequency and makes it more difficult for the fitter.

I think running each block independently at its own real speed and using the clock tree is the way to go. There are 20 global clock trees. That is plenty as we only need an independent clock for large blocks (I think it is ok to use the clock gating technique for a few flops). At least we need clock domains for Z80, M68k, YM2612 and VDP. With synchronizers between them if needed.

Now, the difficulty here is that the glue logic to do this is completely different to what is there now. All blocks become mutually asynchronous, although each one must be synchronous on its own. Particularly I am not sure how that affects the RAM controller.

As for the RAM and sprite flickering... I know you talked about a sprite cache. But shouldn't it rather be a CPU cache? I mean, the highest priority should go to video hardware because that is the one that need to produce a result in real time. If you make a sprite cache but the cache gets flushed in the middle of an image update, then it will cause a visual void in the image. However, if the CPU is running off the cache and gets a flush event it will probably not be noticeable.

Do not worry about random sounds. Just comment out all the sound system. If the M68k/Z80/VDP get the timing right, then the sound system will work out of the box. Random sounds come from timing problems in the FM_SEL signal, the FM data lines and the FM busy signal. All these prevent clear communication of commands to and from JT12.

2017-03-19 20:28 GMT+01:00 robinsonb5 notifications@github.com:

I've experimented with using a separate PLL-generated clock to replace VCLK, and it solves pretty much all the timing violations expect the ones related to the TG68 being clocked too fast. However, I haven't got the phase right, because while the PSG works fine, I get random sounds out of the FM part. (Note, because of the way the TG68's designed, it's almost certainly OK despite these errors - it's just that putting multicycles in the constraints file for every path through the TG68 isn't practical.)

I will continue to try and solve the timing issues and Design Assistant violations, and report any successes.

— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub https://github.com/phoboz/fpgagen/issues/14#issuecomment-287640580, or mute the thread https://github.com/notifications/unsubscribe-auth/ABxtfI1NTezvsJxFjVU7MzO2cN7fpllxks5rnYHkgaJpZM4Mae9H .

jotego commented 7 years ago

By the way, this is an useful document about clocks in our device.

robinsonb5 commented 7 years ago

As you say, PLL clocks are guaranteed to go through the global clock trees, so much better than ripple clocks. I would disagree about the phases and need for synchronizers, though. As long as multiple clocks coming out of the PLL are simple multiples of each other (such as 108MHz, /2 = 54MHz, /14 = 7.714Mhz) then they will be edge aligned and synchronizers aren't necessary. You have quite fine-grained control of the phase on each output, too - we use that, for instance, to phase shift the external clock that reaches the SDRAM chip.

I think I've confused things by calling the enable signal a "clkena" signal - the scheme I was suggesting looks like this: always @(posedge clk) begin if ena=1'b1 begin ... end end

That's not a gated clock, so the considerations you listed don't apply. A gated clock looks like this: wire gated_clk; assign gated_clk = clk & ena; always @(posedge gated_clk) begin ... end

The latter passes the clock itself through some logic, whereas the former adds the enable signal as an extra term to the logic, but the clock itself isn't affected. My instinct would be to generate VCLK using the PLL, but to use the first of these two schemes instead of the lower-frequency ripple clock you're currently using.

The reason I talked about sprite caches is simply that apparently, the actual Megadrive chipset has them!

jotego commented 7 years ago

Yes, I think you are right about the phase. If the PLL produce well aligned phases and all FF work on the same edge then you can skip synchronizers and just use multi-cycle and false path constraints.

I see the difference between the two clock enable schemes. The one you propose is called synchronous clock gating according to a book I bought last week (well, I actually got my company to pay for it. My supervisor was surprised about my asking for a digital book as I am an analog designer). Nonetheless, it looks like the tools can actually take the freedom to put that signal through the clock enable input of the FPGA FF. Check out the example here http://quartushelp.altera.com/15.0/mergedProjects/hdl/vlog/vlog_file_dir_direct_enable.htm. I guess if you have the ena signal after the reset part of the FF it will not take it through the clock enable input.

All this is very interesting...

2017-03-20 16:29 GMT+01:00 robinsonb5 notifications@github.com:

As you say, PLL clocks are guaranteed to go through the global clock trees, so much better than ripple clocks. I would disagree about the phases and need for synchronizers, though. As long as multiple clocks coming out of the PLL are simple multiples of each other (such as 108MHz, /2 = 54MHz, /14 = 7.714Mhz) then they will be edge aligned and synchronizers aren't necessary. You have quite fine-grained control of the phase on each output, too - we use that, for instance, to phase shift the external clock that reaches the SDRAM chip.

I think I've confused things by calling the enable signal a "clkena" signal - the scheme I was suggesting looks like this: always @(posedge clk) begin if ena=1'b1 begin ... end end

That's not a gated clock, so the considerations you listed don't apply. A gated clock looks like this: wire gated_clk; assign gated_clk = clk & ena; always @(posedge gated_clk) begin ... end

The latter passes the clock itself through some logic, whereas the former adds the enable signal as an extra term to the logic, but the clock itself isn't affected. My instinct would be to generate VCLK using the PLL, but to use the first of these two schemes instead of the lower-frequency ripple clock you're currently using.

The reason I talked about sprite caches is simply that apparently, the actual Megadrive chipset has them!

— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub https://github.com/phoboz/fpgagen/issues/14#issuecomment-287795450, or mute the thread https://github.com/notifications/unsubscribe-auth/ABxtfJKJeQAm-iHB9GPWquh-zG4-gY5lks5rnps8gaJpZM4Mae9H .

robinsonb5 commented 7 years ago

I've just pushed an experimental branch, NewSD, which contains a modified version of the fake SD Card module. This version uses real PLL-generated clocks for both sides of the fake SD card interface, and treats both the incoming SPI clock as data. This removes all the critical violations in the design assistant, and I'm hopeful that it will solve the hang-during-ROM-loading problem, which I think was due to metastability causing a state machine within the fake SD card module to get stuck..

I hope it will help, but the FPGA is getting quite full now, and the fuller it gets the worse any build-to-build stability issues will get.

jotego commented 7 years ago

Thanks for the update. I have checkout the branch and synthesized. Indeed there are no critical violations pointed out by the design assistant. That is great!

I have seen that there are still high violations related to the SD, particularly synchronizing of signals between clock domains and synchronizing the reset (you have the reset as FIXME in the code). These could also be part of the hang-up problem.

Still, loads of timing violations :-( but at least this is moving forward.

phoboz commented 7 years ago

Hello, Yes the game loading is much more stable now. I have tried it as well as some other people on the forum ` Re: Genesis / Megadrive core ported to MiST Unread postby breiztiger » Thu May 18, 2017 6:56 pm hi

yes loading game is much strong

thanks for your work

Unread postby DanyPPC » Thu May 18, 2017 8:41 pm Good Job ! You are on the right way ;) `