MIST core: Timing constraints not met

jotego commented 7 years ago

The MIST core has timing issues and does not get synthesized correctly. Some implementations do seem to work nicely but others fail from the very beginning.

Constraints and operating conditions need some redefinition, and the design probably needs some refinement too, in order to avoid these timing problems.

TimeQuest Timing Analyzer Summary

Type : Slow 1200mV 85C Model Setup 'clock|altpll_component|auto_generated|pll1|clk[1]' Slack : -9.438 TNS : -1947.729

Type : Slow 1200mV 85C Model Setup 'clock|altpll_component|auto_generated|pll1|clk[0]' Slack : -5.792 TNS : -380.770

Type : Slow 1200mV 85C Model Setup 'sdclk_pin' Slack : -4.515 TNS : -70.551

Type : Slow 1200mV 85C Model Hold 'clock|altpll_component|auto_generated|pll1|clk[1]' Slack : -3.855 TNS : -246.855

Type : Slow 1200mV 85C Model Hold 'sdclk_pin' Slack : -1.071 TNS : -1.071

Type : Slow 1200mV 85C Model Hold 'clock|altpll_component|auto_generated|pll1|clk[0]' Slack : 0.454 TNS : 0.000

Type : Slow 1200mV 85C Model Recovery 'clock|altpll_component|auto_generated|pll1|clk[1]' Slack : 11.782 TNS : 0.000

Type : Slow 1200mV 85C Model Removal 'clock|altpll_component|auto_generated|pll1|clk[1]' Slack : 18.677 TNS : 0.000

Type : Slow 1200mV 85C Model Minimum Pulse Width 'clock|altpll_component|auto_generated|pll1|clk[2]' Slack : -3.952 TNS : -3.952

Type : Slow 1200mV 85C Model Minimum Pulse Width 'sdclk_pin' Slack : -3.952 TNS : -3.952

Type : Slow 1200mV 85C Model Minimum Pulse Width 'clock|altpll_component|auto_generated|pll1|clk[0]' Slack : 3.498 TNS : 0.000

Type : Slow 1200mV 85C Model Minimum Pulse Width 'clock|altpll_component|auto_generated|pll1|clk[1]' Slack : 15.084 TNS : 0.000

Type : Slow 1200mV 85C Model Minimum Pulse Width 'clk_27' Slack : 18.384 TNS : 0.000

Type : Slow 1200mV 0C Model Setup 'clock|altpll_component|auto_generated|pll1|clk[1]' Slack : -8.486 TNS : -1542.736

Type : Slow 1200mV 0C Model Setup 'clock|altpll_component|auto_generated|pll1|clk[0]' Slack : -4.993 TNS : -269.462

Type : Slow 1200mV 0C Model Setup 'sdclk_pin' Slack : -3.484 TNS : -54.185

Type : Slow 1200mV 0C Model Hold 'clock|altpll_component|auto_generated|pll1|clk[1]' Slack : -3.620 TNS : -234.595

Type : Slow 1200mV 0C Model Hold 'sdclk_pin' Slack : -1.053 TNS : -1.053

Type : Slow 1200mV 0C Model Hold 'clock|altpll_component|auto_generated|pll1|clk[0]' Slack : 0.402 TNS : 0.000

Type : Slow 1200mV 0C Model Recovery 'clock|altpll_component|auto_generated|pll1|clk[1]' Slack : 12.003 TNS : 0.000

Type : Slow 1200mV 0C Model Removal 'clock|altpll_component|auto_generated|pll1|clk[1]' Slack : 18.360 TNS : 0.000

Type : Slow 1200mV 0C Model Minimum Pulse Width 'clock|altpll_component|auto_generated|pll1|clk[2]' Slack : -3.952 TNS : -3.952

Type : Slow 1200mV 0C Model Minimum Pulse Width 'sdclk_pin' Slack : -3.952 TNS : -3.952

Type : Slow 1200mV 0C Model Minimum Pulse Width 'clock|altpll_component|auto_generated|pll1|clk[0]' Slack : 3.361 TNS : 0.000

Type : Slow 1200mV 0C Model Minimum Pulse Width 'clock|altpll_component|auto_generated|pll1|clk[1]' Slack : 14.984 TNS : 0.000

Type : Slow 1200mV 0C Model Minimum Pulse Width 'clk_27' Slack : 18.369 TNS : 0.000

Type : Fast 1200mV 0C Model Setup 'clock|altpll_component|auto_generated|pll1|clk[1]' Slack : -0.666 TNS : -1.178

Type : Fast 1200mV 0C Model Setup 'sdclk_pin' Slack : -0.618 TNS : -9.021

Type : Fast 1200mV 0C Model Setup 'clock|altpll_component|auto_generated|pll1|clk[0]' Slack : -0.491 TNS : -0.828

Type : Fast 1200mV 0C Model Hold 'clock|altpll_component|auto_generated|pll1|clk[1]' Slack : -1.697 TNS : -112.795

Type : Fast 1200mV 0C Model Hold 'sdclk_pin' Slack : -0.987 TNS : -0.987

Type : Fast 1200mV 0C Model Hold 'clock|altpll_component|auto_generated|pll1|clk[0]' Slack : 0.185 TNS : 0.000

Type : Fast 1200mV 0C Model Recovery 'clock|altpll_component|auto_generated|pll1|clk[1]' Slack : 13.793 TNS : 0.000

Type : Fast 1200mV 0C Model Removal 'clock|altpll_component|auto_generated|pll1|clk[1]' Slack : 16.989 TNS : 0.000

Type : Fast 1200mV 0C Model Minimum Pulse Width 'clock|altpll_component|auto_generated|pll1|clk[2]' Slack : -0.031 TNS : -0.031

Type : Fast 1200mV 0C Model Minimum Pulse Width 'sdclk_pin' Slack : -0.031 TNS : -0.031

Type : Fast 1200mV 0C Model Minimum Pulse Width 'clock|altpll_component|auto_generated|pll1|clk[0]' Slack : 3.554 TNS : 0.000

Type : Fast 1200mV 0C Model Minimum Pulse Width 'clock|altpll_component|auto_generated|pll1|clk[1]' Slack : 15.283 TNS : 0.000

Type : Fast 1200mV 0C Model Minimum Pulse Width 'clk_27' Slack : 17.946 TNS : 0.000

renaudhelias commented 7 years ago

Did you apply mist.sdc? What does the generated clocks table in the final timing report show? It's a more meaningful report...

jotego commented 7 years ago

I do not find a nice way of pasting results here and I am not sure about which report you are referring to but I think this summarizes the situation:

As you see, the device is missing the target by quite a lot. I am using the exact setup available on GitHub, which already applies the mist.sdc file. Sometimes the implementation seems to work on MIST, which makes me think that some of the data paths are actually false paths or multicycle paths and are not relevant. But because the tool tries to optimize false paths too, the true paths fall out of spec for some implementations. I do not know the MIST/Atari ST architecture well enough to start adding false/multicycle paths to the SDC file. The current SDC file is too simple; a more detailed one is needed.
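To put the worst setup slacks in perspective, here is a minimal Python sketch that converts a negative setup slack into the clock frequency the failing path could sustain. It assumes that clk[1] is the 32MHz PLL output and clk[0] the 128MHz one, and that the worst path is launched and captured by the same clock one full period apart. Neither assumption is confirmed by the summary above, so the numbers are only a rough guide.

```python
def implied_fmax_mhz(target_mhz: float, setup_slack_ns: float) -> float:
    """Frequency at which a path with this setup slack would just meet timing.

    Assumes a single-cycle path launched and captured by the same clock.
    """
    period_ns = 1000.0 / target_mhz
    required_period_ns = period_ns - setup_slack_ns  # negative slack lengthens it
    return 1000.0 / required_period_ns


# Worst setup slacks from the "Slow 1200mV 85C" rows of the summary.
# The clock-to-frequency mapping is an assumption, not taken from the report.
worst = {
    "clk[1] (assumed 32MHz)": (32.0, -9.438),
    "clk[0] (assumed 128MHz)": (128.0, -5.792),
}

for name, (target, slack) in worst.items():
    fmax = implied_fmax_mhz(target, slack)
    print(f"{name}: target {target:.0f} MHz, implied Fmax {fmax:.1f} MHz")
```

Under those assumptions the 32MHz domain would only close at roughly 24.6MHz and the 128MHz domain at roughly 73.5MHz. Paths that cross between clock domains do not follow this simple relation.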

I am trying to add a new device for the ST in the MIST core but synthesis fails to produce a working MIST too often. Trying to understand what happens, I found that the original MIST core has these timing issues.

renaudhelias commented 7 years ago

The PLL is sometimes hacked to reach low arcade clocks by lying about the clock input in the SDC file, which gives a nicer timing equation. But this is infrequent (and is about 27/2 or 27/4, not 27/1.342...). Normally 27MHz should appear as 27MHz in the Generated Clocks table; otherwise mist.sdc is not being taken into account correctly.

I don't remember exactly, I think you have to add mist.sdc by using right click on "TimeQuest task">configure (just between Synthesis task and Assemble task ?)

jotego commented 7 years ago

I have made some screen shots. The 27MHz clock frequency is observed in the clock table. The PLL is used to generate 32MHz and 128MHz and the RTL code is then using the 32MHz to generate 8MHz. There are many clocks unconstrained, though.

I am going to see if there are more complete mist.sdc files in prior versions. The current one is minimal. Time analysis shows issues in the video module and that is one of the problems I often found: corrupted video and MIST not powering up properly.

(Again, I am always referring to the MIST core, i.e. the Atari ST core)

screenshot-1 screenshot-2 screenshot-3 screenshot-4

renaudhelias commented 7 years ago

In CoreAmstrad I use:

--signal c0 : std_logic;--27MHz 20/135 =4MHz Z80/bootloader
--signal c1 : std_logic;--27MHz 125/135=25MHz VGA
--signal c2 : std_logic;--27MHz 572/135=114.4MHz SDRAM
--signal c3 : std_logic;--27MHz 1/2250 =12kHz keyboard

The keyboard's 12kHz is made in RTL, but the 4MHz is still pure PLL. Perhaps 40/135 is tolerated... Can I have the equations (multiplier/divider) used here (in the MIST core)?

*My SDRAM controller is adapted for 114.4MHz; normally you just have to replace the RASCAS_DELAY and CAS_LATENCY parameters in sdram.v with the ones in https://github.com/renaudhelias/CoreAmstrad/blob/master/BuildYourOwnZ80Computer/zsdram.v

Normally you can reach any MHz frequency using the PLL by solving an equation: a common divider, and multipliers that share as many small prime factors as possible (20 = 5 * 2 * 2; 125 = 5 * 5 * 5; 572 = 2 * 2 * 11 * 13). For kHz, do use RTL... Adding a not(clock) does break the timing constraints; it is best to switch between rising_edge/falling_edge instead of adding not(clock). Personally I use a main "clock component" that wires all "clock" and "not clock" signals. It makes timing constraint problems easier to solve (not(not(clock))).

renaudhelias commented 7 years ago

Another equation set (last CoreAmstrad), with common multipliers this time:

c0 27MHz 89/600 = 4MHz (Z80) => perhaps 89/300 is fine for our project
c1 27MHz 89/48 = 50MHz (VGA)
c2 27MHz 89/21 = 114.4MHz (SDRAM)
c3 27MHz 89/150 = 16MHz (PWM)

You can make an Excel table with 27 at the top left, the multipliers along the top (1 2 3 4...), and the divisors down the left (1 2 3 4...), and put =$A$1*B$1/$A2 in B2. If you find a single column or a single row that contains all your clocks, you win.

600 = 5 * 5 * 3 * 2 * 2 * 2
48 = 3 * 2 * 2 * 2 * 2
21 = 3 * 7
150 = 5 * 5 * 3 * 2
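The spreadsheet search described above can also be sketched in a few lines of Python. This is only an illustration: the function names are made up for this example, and it ignores real PLL limits (VCO range, maximum multiplier and divider values), which the device and Quartus impose on top of the arithmetic.

```python
from fractions import Fraction


def pll_outputs(ref_mhz, multiplier, dividers):
    """Output frequencies (MHz) for one shared multiplier and per-output dividers."""
    return [float(Fraction(ref_mhz) * multiplier / d) for d in dividers]


def best_dividers(ref_mhz, multiplier, targets_mhz):
    """For a given multiplier, pick the integer divider closest to each target."""
    vco = ref_mhz * multiplier
    return [max(1, round(vco / t)) for t in targets_mhz]


def search(ref_mhz, targets_mhz, max_mult=100):
    """Scan multipliers (one spreadsheet column each), ranked by worst relative error."""
    ranked = []
    for m in range(1, max_mult + 1):
        divs = best_dividers(ref_mhz, m, targets_mhz)
        outs = pll_outputs(ref_mhz, m, divs)
        err = max(abs(o - t) / t for o, t in zip(outs, targets_mhz))
        ranked.append((err, m, divs))
    return sorted(ranked)


# The CoreAmstrad set quoted above: 27MHz reference, multiplier 89.
print(pll_outputs(27, 89, [600, 48, 21, 150]))

# Best three multipliers for the same targets (hardware limits not checked).
for err, m, divs in search(27, [4, 50, 114.4, 16])[:3]:
    print(m, divs, f"worst error {err:.4%}")
```

With multiplier 89 the four outputs come out very close to, but not exactly at, 4, 50, 114.4 and 16MHz, which is the compromise the common multiplier buys.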

jotego commented 7 years ago

(I didn't know you were the author of Amstrad core. Let me thank you for that contribution. I really enjoy it!)

These are the PLL settings in MIST:

c0 128MHz -> 27/27*128
c1 32MHz -> 27/27*32
c2 128MHz (used as SDRAM_CLK) -> 27/27*128 with -2500 phase shift

I am not sure what the phase shift is about. Then they have a lot of clock dividers using RTL in the logic, like this one:

//// 8MHz clock ////
reg [1:0] clk_cnt;

always @ (posedge clk_32, negedge pll_locked) begin
    if (!pll_locked) begin
        clk_cnt <= 2'd2;
    end else begin 
        clk_cnt <= clk_cnt + 2'd1;
    end
end

assign clk_8 = clk_cnt[1];

I think that the tools may be failing to recognize signals like clk_8 as clocks, and they are definitely not deriving the right frequency constraint for them. If such signals are not recognized as clocks, they are not routed on the FPGA's dedicated clock tree either. Are you using any clocks of this sort?
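As a sanity check of the ratio involved, here is a small Python model of the 2-bit counter above. It is not part of the MIST sources; it only confirms that clk_cnt[1] toggles with a period of four clk_32 cycles, which is the divide-by-4 relationship a generated-clock constraint for clk_8 (for example via the SDC `create_generated_clock` command with `-divide_by`) would have to describe.

```python
def simulate_divider(cycles: int, reset_value: int = 2):
    """Model the 2-bit counter clocked by clk_32 and return clk_cnt[1] per cycle."""
    clk_cnt = reset_value  # value loaded while pll_locked is low
    clk_8 = []
    for _ in range(cycles):
        clk_8.append((clk_cnt >> 1) & 1)  # assign clk_8 = clk_cnt[1]
        clk_cnt = (clk_cnt + 1) & 0b11    # wraps like a 2-bit register
    return clk_8


def rising_edges(wave):
    """Indices where the waveform goes from 0 to 1."""
    return [i for i in range(1, len(wave)) if wave[i - 1] == 0 and wave[i] == 1]


wave = simulate_divider(16)
edges = rising_edges(wave)
period_in_clk32_cycles = edges[1] - edges[0]
print(wave)
print("clk_8 period:", period_in_clk32_cycles, "clk_32 cycles ->",
      32 / period_in_clk32_cycles, "MHz")
```

The model says nothing about routing or skew; it only fixes the frequency ratio that the constraint needs.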

Your comment about negated clocks causing problems also worries me. But you say that using posedge and negedge in the always statement is fine, isn't it?

renaudhelias commented 7 years ago

Respecting timing constraints is good practice. For complex components (like a processor clock) it is better to respect them.

phase_shift here seems to be just a hack to start the SDRAM before the core (a reset not implemented somewhere?); normally the reset signal is delayed by the ARM. I don't use phase_shift in CoreAmstrad, and I don't use pll_locked either (it is about the PLL stabilising at start-up); the "negedge pll_locked" seems to be used here as another hack to start the processor after the SDRAM...

So if you try adding a c3 at 8MHz (27 / 27 * 8) you will perhaps have to remove the pll_locked and phase_shift hacks... or else connect pll_locked as the processor reset (as a plain value, not as a clock-like stimulus in the process sensitivity list).

In my version of sdram.v (zsdram.v) I added a synchronization algorithm (commented "some synchro by here")... perhaps it shall help to stabilize sdram in case of problems...

Your comment about negated clocks causing problems also worries me. But you say that using posedge and negedge in the always statement is fine, isn't it?

Yes it's fine, but a lot of big components come from OpenCores and should not be changed... (any patch has to be commented)

robinsonb5 commented 7 years ago

The phase shift is on the external clock signal that goes to the SDRAM chip itself and it's there to help make sure the timing requirements of the chip are met, so that control and data signals are stable before the SDRAM sees the clock edge.

renaudhelias commented 7 years ago

It's the reverse: the SDRAM controller has a slow, standard full re-init/reset sequence at boot, sending several init commands to the SDRAM, and only becomes ready to receive data after a certain time.

harbaum commented 7 years ago

I think you are confusing a few things.

First there's the hardware startup. When a fresh core has been loaded the FPGA and its PLLs need some time before the clocks are all stable at the right frequency. This is what the pll locked signal is about. A core should not do anything before the clocks are stable. Thus the CPU reset as well as e.g. the sdram controller usually waits for the locked signal to become true.

Then there's the sdram. Unlike dram the modern sdram needs to be initialized before it can be used. So sdram init happens before the cpu can be started but after the pll has locked. Then the cpu can finally be started by releasing its reset.

So the sequence should look like: Everything waits for the pll to lock. Then the sdram initializes. Once that is done the CPU can start. In most cores the CPU's reset doesn't wait for the sdram controller. Instead the reset is simply applied for a few milliseconds, while the SDRAM init only needs a few microseconds. This makes sure the sdram is fully operational before the cpu starts. The minimig core is one of the few where the sdram controller actually generates a reset signal for the cpu so the sdram is ready when the cpu starts.

This all isn't a hack. This is how it's supposed to be done.

The clock shift is something else. The signals driving the sdram need a few nanoseconds to leave the FPGA and reach the SDRAM chip. All signals are synchronous to the clock (that's what the S in SDRAM stands for). So the clock signal basically says "dear sdram chip, when this clock rises please look at all the other signals". But that means the FPGA must have set up all those other signals beforehand. So there are two clocks: one used internally by the FPGA to generate "all those signals", and one going to the SDRAM telling it to have a look at "all those signals". And to make sure that "all those signals" have sufficient time to leave the FPGA and reach the SDRAM, the two clocks are slightly phase shifted. So the FPGA has had enough time to set up "all those signals", and the signals have had enough time to reach the SDRAM, before the SDRAM's clock tells it to have a look at them.

This is also fine. At lower clocks this is more relaxed and the SDRAM may still have enough time to evaluate the signals before the FPGA changes them again. But if things get close to the SDRAM's 133MHz limit then every nanosecond counts and some fine tuning by shifting e.g. 2.5ns is needed.

This also isn't a hack but needed to fine tune the overall timing to a few nanoseconds.

harbaum commented 7 years ago

But i fully agree that some more complete timing constraints are really needed. And i'd be happy about any contribution here as my experiences with the timing constraints are actually quite limited.

renaudhelias commented 7 years ago

The reset from the ARM is status[0] (status comes from user_io.v); you can combine it with pll_locked to get a nicer processor reset signal. The reset from the ARM waits for the RAM to be filled with the ROM before it is released, so when it ends you can end the processor reset...

This also isn't a hack but needed to fine tune the overall timing to a few nanoseconds.

*a patch

sorgelig commented 7 years ago

I didn't look at the ST core, but the first thing you need to be sure of is that the whole project is synchronous. That means everything is synchronized by one global clock (well, except external clocks like SPI). If you use always(posedge/negedge some_my_signal) then it may break the whole project's functionality. Even a single such "always" may break a lot. You may get many strange side effects if something in the project is asynchronous. When i was working on my ZX core and it was asynchronous, it was very hard to add any new module and i couldn't reach a CPU speed of more than 7MHz. When i converted the core to synchronous style, i could easily reach the theoretical maximum CPU speed of 56MHz, and i could easily add as many new modules as i wanted. It looked like magic.

harbaum commented 7 years ago

The TG68K CPU in the Atari ST core runs pretty stable at 32MHz. This is actually not bad and i haven't seen it running faster anywhere else.

sorgelig commented 7 years ago

Maybe in the Apollo accelerator board?

jotego commented 7 years ago

I take the term "synchronous" as Sorgelig uses it as having only one clock domain in the design. And I agree that all hell breaks loose on a design with multiple clock domains if the time constraints are not set correctly and the inter-domain transfers have not been dealt with in the RTL. But, when things are well done in the RTL and the SDC, then a design with multiple clock domains can actually be faster than a single clock counterpart.

As my only experience with digital design is in ASICs, I can assure you that a design with timing issues is not taped out (sent to the foundry). In fact, when I started dealing with FPGAs it was a shock to me to see that the tools produce an output file regardless of the STA (static timing analysis) results. It seems to make sense to be able to make some quick and dirty tests but, upon release, STA must be met. If STA fails it means that some devices, under some conditions, will fail. And you have no control over when and where they will fail.

I spent some time adding more detailed constraints yesterday and finally got one core that worked well with my FPGA. It still had quite a long list of timing violations but it was a bit better than the older ones. Then this morning, I turned on MIST and found the attached screen. The timing violations around the video module were showing up again. I reset MIST and it worked, and it has been working for the rest of the day. Will it fail again? Definitely.

It is difficult for me to add constraints to a design that I do not understand. Till Harbaum has shared some architectural aspects that are critical to writing the SDC. On top of that, things like false paths or multicycle paths can only be written with an understanding of the design. And if worse comes to worst and RTL redesign is needed, then understanding of the architecture is critical.

FPGA vendors recommend design approaches like incremental compilation: have a set of the design done and implemented perfectly and then keep it fixed in the FPGA as new modules are added using the rest of the space. Ideally, the MIST core should be clean on its own so new functionality can be added without having to worry about issues in the previous system.

img_20161001_074105

ghost commented 7 years ago

FWIW i got the ao68000 core up to 80MHz on my Digilent Atlys. However, the performance isn't great because it's a microcoded core with no pipelines.

S.

Stephen Leary

jotego commented 7 years ago

The timing problems are not in the microprocessor but in the peripherals.

By the way, the ao68000 uses half the logic gates of the TG68K core. It probably compiles faster too because of the microcode.

jotego commented 7 years ago

By the way, what is the frequency of the input port SPI_SCK?

jotego commented 7 years ago

Sorry for the third comment in a row. I found an example of an asynchronous module: mfp_srff16. These are latches with edge set and reset signals. TimeQuest considers the driving signals as clocks and implementation gets messy. I have checked these signals and they are actually generated using an 8MHz clock. I think mfp_srff16 should be synchronous: i.e. regular D-type flip flops using the same 8MHz clock.

If I make that change, the isr_latch signals will be delayed by one 8MHz clock cycle compared with the current design. Is that ok? (I am going to try...)

harbaum commented 7 years ago

This part of the mfp was redesigned several times to match the behaviour of the real mfp to allow for e.g. flawless midi playback using cubase.

I'd strongly suggest not making changes that alter the behaviour. And of course such changes will require extensive testing. The ST will boot with a pretty broken mfp. But you'd see all kinds of strange and hard to debug issues in games and demos. These particular parts control the interrupt behaviour and the symptoms will be stack overflow and irq priority problems and the like. E.g. cubase may crash while you move the mouse. I spent plenty of time finding and fixing mfp problems. You can see that from the commits.

Have a look at early versions. The mfp once was synchronous but didn't work satisfactorily.

harbaum commented 7 years ago

SPI clock is up to 24 MHz

ghost commented 7 years ago

I agree with this. The MFP was why I gave up on my ST core and let Till pick at the bones of what I'd got done

jotego commented 7 years ago

Thanks! I will then constrain each set/reset pin individually to 8MHz without touching the RTL.

harbaum commented 7 years ago

Just to make it clear: I have nothing against experiments with other CPU cores and synchronous designs etc. Actually whatever you work on is great! But i'd prefer such extensive changes to happen in separate branches as the chances are that you are not 100% satisfied with the results.

ghost commented 7 years ago

I would agree with this approach.

jotego commented 7 years ago

Yes, that makes sense. If the RTL is modified it needs testing. Notice however that the current system is not stable either because of timing.

I spent more time on this on Sunday. There are too many signals and blocks that I don't know enough of to progress quickly. To be honest, this is quite a diversion from my objective which was to add FM sound to the ST (see a preview here). So I have decided to put this aside for a better time.

I found today that the version of Quartus we have at work has a full license and I can make partitions with it, which would help with timing closure. However, I do not expect to spend time on this for at least a few months.

If in the meantime someone does a proper timing closure on this, I will be happy to add FM with MIDI to the ST, so we can have MIDI sound without external devices.

sebdel commented 7 years ago

Hey, I just wanted to say that as someone who doesn't really understand these issues, I'm following this conversation with great interest. If you ever try to fix it, could you push it to a branch even if it's not working? I expect to learn a lot from this patch :)

jotego commented 7 years ago

The clock shift to the SDRAM is indeed a hack. You can imagine that 2.5ns of delay will only work with some FPGAs and with some SDRAMs under some voltage/temperature conditions. The correct way of doing this is using self-synchronization techniques. You shouldn't need to write a specific time delay in the RTL whatsoever.

Here is a good, if lengthy, application note from Altera that goes through this. The general idea is that the circuit measures the time delay to the external device and uses it to compensate the internal clock. If that document is too long, there is a nice diagram on this other one. Check out figure 40 (page 47). You can see there how the clock is fed back into the DCM (PLL) to compensate for the network delay.

rkrajnc commented 7 years ago

The clock shift is a pretty standard way of handling SDRAM memories. I guess you can call it a hack, but there's no other simple way of handling this (other than significantly lowering SDRAM clock). Taking into account relatively slow SDRAM clocks (compared to DDR), it should be OK, even taking into account FPGA and SDRAM tolerances in 'normal' temperature conditions. It is affected more by PCB layout than by anything else, really.

I only skimmed over the Altera doc you linked, but that seems to deal only with DDR memories, which pose a whole different set of problems from 'normal' SDRAM memories, with its relatively slow clock frequency and high I/O voltages. The DDR interfaces, on the other hand, require proper balancing of clock delays, and that is indeed usually handled in an automatic way.

The other document - the Xilinx one - only mentions DCM clock compensation, which I believe is internal logic delay only, it doesn't take into account external delays. Besides, the Xilinx devices have DCM (sort of a DLL, with added stuff), which is a different thing than Altera's PLLs, so I don't think that document is applicable here.

There is one document that describes how to handle SDRAM memories with Altera FPGAs: https://www.altera.com.cn/zh_CN/pdfs/literature/hb/nios2/n2cpu_nii51005.pdf

While I don't think it is necessary to worry about it (more problems would be solved by switching to a 4-layer PCB, than worrying about SDRAM clock delay), if you do have any more info, or any simple way to do auto-delay with SDRAM, I'd be interested to hear about it.

jotego commented 7 years ago

Actually... Now that you mention it, it does seem to be pretty common. Up to the point of being in an appnote! It is shocking to me as an ASIC designer!

harbaum commented 7 years ago

I am fully aware that the MIST contains a lot of compromises, small hacks and imperfections. But without that it would have never seen the light of day. And i understand that it can be frustrating to face some of its shortcomings. The main design goal was "fun" which imho was achieved. Now it may be time to focus a little more on perfection, indeed. Hopefully that will not stop you from contributing ...

Did you take a look at other projects? E.g. the suska or the firebee may be what you are looking for.

renaudhelias commented 7 years ago

The phase shift is in degrees (modulo 360), and here the phase shift value is -2500; that's why I called it a hack: it inserts a boot delay. But it is a nice hack (no problem with timing constraints here). The timing constraint problem here is around the RTL that builds the 8MHz clock, and perhaps also the "negedge pll_locked" (pll_locked is not a clock, it's a reset signal).

Found a lot of talks around that (clock/reset) : http://electronics.stackexchange.com/questions/163018/asynchronous-reset-in-verilog ...

In vhdl I do use reset like this (asynchronous reset (it is not a clock)) :

ctrcConfig_process:process(reset,nCLK4_1) is
if reset='1' then
    Dout<=(others=>'1');
elsif rising_edge(nCLK4_1) then

But it can be also (synchronous reset) :

ctrcConfig_process:process(nCLK4_1) is
if rising_edge(nCLK4_1) then
if reset='1' then
else

http://github.com/renaudhelias/CoreAmstrad/blob/master/BuildYourOwnZ80Computer/simple_GateArrayInterrupt.vhd

In http://github.com/renaudhelias/CoreAmstrad/blob/master/BuildYourOwnZ80Computer/zsdram.v I play with a captured clock (clkref_i): a clock that is not used as a clock. I put a lot of effort into zsdram.v; perhaps you can merge it if you want.

harbaum commented 7 years ago

phase shift is in degree (modulo 360), here phase shift value is -2500, that's why I called it a hack : it does insert a boot delay.

Quartus allows the phase shift to be specified as an absolute time (in ns or ps) or as an angle. The -2500 are picoseconds.

And that's not a boot delay. You wouldn't notice a 2.5 nanosecond boot delay. Instead it's the delay between the edges of the two clocks.
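To relate the two ways of specifying the shift, here is a minimal Python sketch that converts a time shift into an angle. It assumes the shift applies to the 128MHz SDRAM clock quoted earlier in the thread; the function name is made up for this example.

```python
def phase_shift_degrees(shift_ps: float, clock_mhz: float) -> float:
    """Convert a time shift in picoseconds into degrees of one clock period."""
    period_ps = 1e6 / clock_mhz
    return 360.0 * shift_ps / period_ps


# -2500 ps on an assumed 128MHz clock (period 7812.5 ps).
shift = phase_shift_degrees(-2500, 128.0)
print(f"{shift:.1f} degrees, equivalent to {shift % 360:.1f} degrees")
```

So -2500 ps is a little under a third of a 128MHz period, which fits the "few nanoseconds" of fine tuning described above.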

jotego commented 7 years ago

I am fully aware that the MIST contains a lot of compromises, small hacks and imperfections. But without that it would have never seen the light of day. And i understand that it can be frustrating to face some of its shortcomings. The main design goal was "fun" which imho was achieved. Now it may be time to focus a little more on perfection, indeed. Hopefully that will not stop you from contributing ...

I like the word perfection. I will continue working on sound chip cores for a while before I return to trying to add a full set of functions to an existing core.

gyurco commented 5 years ago

Hi guys, I've had a side project for some time where I'm trying to make this core synchronous. I've finished the DMA (it was a tough one) and the shifter, and improved FMax with some tweaks in some code that is not timing sensitive (like the OSD, etc...). I didn't even dare to touch the MFP yet. There's sometimes a problem right after booting where the low and mid resolutions are garbled, but I don't think it is because of code changes, since it appeared right after I enabled timing-driven synthesis. Here's the current state: https://github.com/gyurco/mist-board/commits/mist-experiments My final goal is to replace the CPU with FX68K (or at least make it optional).

jotego commented 5 years ago

Hi Gyurco,

I like it so much that you are doing this! Let me share a couple of things I've learnt:

You can simulate your design using the memory model from the manufacturer. There is a copy of that file in my jt_gng repository. When you do that you can see how phase shifting affects it and the right value for CL in the SDRAM. Notice that I have a positive value for the SDRAM CLK time shift in my cores. I have no timing errors.
The 1943 core in MiSTer has severe timing errors in all corners. It should not work at all. If it does, it is because there is a lucky combination of clock delay inside the FPGA, a negative time shift for the SDRAM clock and a CL value that happens to work. I want to fix it at some stage but I do not have so much control over the MiSTer framework.
A few weeks ago I made a number of changes to my 1942 core very quickly without testing them on the FPGA. When I finally tested them I found that it didn't work. Trying to fix the problem drove me to making other changes but it still didn't work. I fell into a loop and spent 6 days trying to fix it. Eventually, I had to go back to a secure place in the repo and apply changes one step at a time while trying it. Since then, I commit small steps when I am about to make a big change. Maybe that advice can be helpful to you too.

gyurco commented 5 years ago

Hi Jotego, thanks for sharing your thoughts.

For the SDRAM, I guess I'm using the TimeQuest report to find the correct shift. Interestingly, if you're using the dedicated PLL output (c0 or c3 of PLL1), the clock is delayed much less. I had a problem with the mt48lc16m16a2.v module for simulations: it uses the # notation for delays, which Verilator doesn't support. Maybe that was the reason why the Archie had bad timings in its SDRAM controller: the simulation returned good data with the bad CL setting.
For the MiSTer SDRAM, I think you cannot know the real delay, because the use of the GPIO pins adds an unknown factor, so the SDRAM datasheet delay values are probably not usable in the SDC.
Yepp, I also like small atomic commits :) git bisect is a great tool to find regressions.
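As a rough illustration of why small commits help here: git bisect is essentially a binary search over the commit history. The sketch below is not git's implementation, just the idea, and it assumes the first commit is known good, the last one is known bad, and that once a commit is bad all later ones are bad too. The names `first_bad` and `is_bad` are hypothetical.

```python
def first_bad(commits, is_bad):
    """Return the index of the first bad commit.

    Assumes commits[0] is good, commits[-1] is bad, and that every
    commit after the first bad one is also bad.
    """
    lo, hi = 0, len(commits) - 1      # lo is known good, hi is known bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_bad(commits[mid]):      # e.g. build it and test on the FPGA
            hi = mid
        else:
            lo = mid
    return hi

# Ten commits, with the regression introduced in commit 6.
history = list(range(10))
print(first_bad(history, lambda c: c >= 6))
```

With N commits this needs about log2(N) test builds, and the smaller each commit is, the less there is to inspect once the first bad one is found.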

I've progressed a bit more, almost all generated clocks are eliminated, except in MFP. Now I'll try to change the CPU to the cycle-exact one.

gyurco commented 5 years ago

Connected the FX68k CPU, it starts, but hangs after a while at 41fffe. In the code, it's some kind of IO for hbi, would be good to have some info what's done there.

harbaum commented 5 years ago

41fffe sounds like the first time the ramtest hits a bus error on a 4mb machine.

My guess: bus error does not work and the cpu waits forever for dtack.

gyurco commented 5 years ago

Yes, I checked: the bus error is generated, but not cleared. The tg68 has a bus error clear output, but the original CPU doesn't. How is the GLUE supposed to clear BERR? Or is it supposed to jump to the bus error handler first? Seems Genesis knowledge is not enough here :)

gyurco commented 5 years ago

Seems clearing BERR after some cycles allows it to continue. Now I guess I have memory write problems, probably because the original CPU signals are timed differently than expected (e.g. during a write, UDS and LDS are delayed by one cycle relative to AS and RW).

harbaum commented 5 years ago

I think you would apply BERR in parallel with DTACK just like any other CPU input signal. For the tg68k I had to generate a latched signal and thus there's a separate clear signal. With the fx68 this should imho all not be needed.

The tg68k was designed for the Amiga which does not use BERR at all.
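One way to picture BERR as an ordinary bus-terminating input is a watchdog that counts cycles while AS is asserted without DTACK, raises BERR at a timeout, and drops it again once the CPU releases AS. The Python model below is only a behavioural sketch under that assumption, not the GLUE or the core's actual logic; the class name and the default timeout of 64 cycles are hypothetical.

```python
class BusWatchdog:
    """Behavioural model of a bus timeout that drives BERR."""

    def __init__(self, timeout_cycles=64):   # hypothetical timeout value
        self.timeout = timeout_cycles
        self.count = 0
        self.berr = False

    def tick(self, as_asserted, dtack_asserted):
        """Advance one clock; return True while BERR is asserted."""
        if not as_asserted:
            # No bus cycle in progress: reset the counter and release BERR.
            self.count = 0
            self.berr = False
        elif dtack_asserted:
            # The cycle is being acknowledged normally: stop counting.
            self.count = 0
        elif not self.berr:
            self.count += 1
            if self.count >= self.timeout:
                self.berr = True
        return self.berr

wd = BusWatchdog(timeout_cycles=4)
print([wd.tick(True, False) for _ in range(5)])  # BERR rises on the 4th tick
print(wd.tick(False, False))                     # released once AS is negated
```

In this model no separate clear signal is needed, because BERR follows the end of the bus cycle.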

harbaum commented 5 years ago

BTW: If the BERR is not deasserted fast enough the CPU would detect a double BERR and would go into HALT state.

gyurco commented 5 years ago

As I read in the 68000 manual, it's not critical how much time the BERR remains asserted (Figure 5-25). "As long as BERR remains asserted, the data and address buses are in the high-impedance state."

Upd: now it's in an endless loop in TOS, around E013f6

harbaum commented 5 years ago

But you need to do something to get out of the berr state. Isn't the CPU still waiting for the dtack?

I can have a look what that loop waits for. I remember such issues when writing the core in the beginning.

gyurco commented 5 years ago

No, BERR seems to be OK, it's implemented as the 68000 manual says now (clear after some cycles). I strongly believe writes to the SDRAM are the problem now.

harbaum commented 5 years ago

But writes are generally working as e.g. the ram test passes.

gyurco commented 5 years ago

Forgot that the TG68k is clocked at 2MHz to have the effect of an 8MHz 68000. Clocking the FX68 at 8MHz and making sure that the established bus cycles are OK is a challenge now. Maybe if dtack_n generation remains the same, the CPU will align itself to the existing cycles. Upd: seems it's going further now that the clock is corrected. Now it loops at e02438, writing constantly to fffa22 (MFP?). I wonder if the existing MFP is compatible. So is there a quick memtest before the Atari logo?
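The 2MHz versus 8MHz relation can be checked with a little arithmetic: a 68000 read or write bus cycle takes at least 4 clocks, so an 8MHz part performs at most 2 million bus cycles per second. The sketch below assumes that a core clocked at 2MHz matches this by doing roughly one bus access per clock; the function name is hypothetical.

```python
def bus_cycles_per_second(cpu_mhz, clocks_per_bus_cycle=4):
    """Upper bound on bus cycles per second for a given CPU clock.

    The default of 4 clocks is the minimum 68000 bus cycle length
    (no wait states).
    """
    return cpu_mhz * 1e6 / clocks_per_bus_cycle

# 8 MHz 68000 with 4-clock bus cycles versus a core doing one access per clock.
print(bus_cycles_per_second(8))        # 68000 at 8 MHz
print(bus_cycles_per_second(2, 1))     # one access per clock at 2 MHz
```

Both give the same bus rate, which is why the existing bus timing has to be re-established when a cycle-exact CPU runs at the full 8MHz.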

mist-devel / mist-board

MIST core: Timing constraints not met #38

TimeQuest Timing Analyzer Summary