paulscherrerinstitute / vivadoIP_mem_test

Memory tester for memories connected via AXI (e.g. external DDR memories)

Timing Problems at 300 MHz #1

Closed Paebbels closed 3 years ago

Paebbels commented 4 years ago

Use Case:
I would like to test a Xilinx MPSoC PS-DDR4 memory controller and the underlying PCB design.

While simple software-based memory tests executed from within the MPSoC's OCRAM show no errors at DDR4-2400, our application is not stable at DDR4-2400. Running our application at DDR4-1600, however, works fine. That's why I am now trying to test the memory controller / PCB with the mem_test IP core from PSI.

Implementation:
I instantiated this IP core 4 times, each with a 128-bit AXI interface @300MHz, and connected the instances to the MPSoC's high-performance slave ports.
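
For a rough sense of the load this setup can generate, the peak numbers can be sketched as below. This is only a back-of-the-envelope estimate: it assumes one AXI data beat per clock cycle, and the 64-bit DDR4 data bus width is an assumption that is not stated in this thread. The function names are made up for illustration.

```python
def axi_peak_gbytes(width_bits: int, clock_mhz: float) -> float:
    """Peak throughput of one AXI data channel in GB/s.

    Assumes one data beat per clock cycle (no stalls), which is an upper bound.
    """
    return width_bits / 8 * clock_mhz * 1e6 / 1e9


def ddr_peak_gbytes(transfer_rate_mts: float, bus_width_bits: int) -> float:
    """Theoretical peak DDR bandwidth in GB/s for a given transfer rate (MT/s)."""
    return transfer_rate_mts * 1e6 * bus_width_bits / 8 / 1e9


# Setup from the post: 4 mem_test instances, 128-bit AXI @ 300 MHz each.
per_port = axi_peak_gbytes(128, 300)
total = 4 * per_port

# ASSUMPTION: 64-bit DDR4 data bus (not stated in the thread).
ddr4_2400 = ddr_peak_gbytes(2400, 64)
ddr4_1600 = ddr_peak_gbytes(1600, 64)

print(f"per AXI port : {per_port:.1f} GB/s")
print(f"4 AXI ports  : {total:.1f} GB/s")
print(f"DDR4-2400 x64: {ddr4_2400:.1f} GB/s")
print(f"DDR4-1600 x64: {ddr4_1600:.1f} GB/s")
```

Under these assumptions, the four ports together could in theory request about as much bandwidth as a 64-bit DDR4-2400 interface can deliver at its peak.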

Timing Reports:
After implementation I see timing errors in the mem_test instance(s).
Here is one example path:
image

Needed optimizations:

Questions:

obruendl commented 4 years ago

Timing strongly depends on placement and routing, so there is no such thing as an FMAX for a specific device type.

I see three possible solutions:

  1. Check if the critical path really contains too many logic levels or if it is a placement problem. If it is a placement problem, consider hand placement.
  2. If the path contains too many logic levels, add more pipelining. You may also request this from Jonas Purtschert, who is currently maintaining the core.
  3. Run the core at half the frequency and twice the interface width (256 bit). Use an AXI interconnect to change width/frequency as required to 128 bit wide PS ports.
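
The trade-off in option 3 can be checked with a little arithmetic. The sketch below assumes one data beat per clock cycle and ignores any overhead of the width/clock conversion. The helper names are hypothetical.

```python
def clock_period_ns(freq_mhz: float) -> float:
    """Clock period in nanoseconds, i.e. the timing budget per register stage."""
    return 1000.0 / freq_mhz


def peak_mbit_per_s(width_bits: int, freq_mhz: int) -> int:
    """Peak data rate in Mbit/s, assuming one beat per clock cycle."""
    return width_bits * freq_mhz


original = (128, 300)  # interface width in bits, clock in MHz
relaxed = (256, 150)   # twice the width, half the frequency

for name, (width, freq) in (("original", original), ("relaxed", relaxed)):
    print(
        f"{name}: {width} bit @ {freq} MHz, "
        f"period {clock_period_ns(freq):.2f} ns, "
        f"peak {peak_mbit_per_s(width, freq)} Mbit/s"
    )
```

Both configurations have the same peak data rate, but the relaxed one has twice the clock period available for each logic path inside the core.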

obruendl commented 4 years ago

BTW: Are you using only this IP Core or is the PSI library used at PLC2 in general? Just for personal interest :-)

jonasppsi commented 4 years ago

Hi Patrick

Thanks for your comment! I currently run the memory tester at 300 MHz, but only with 64-bit AXI (Zynq UltraScale+). I'll have a look at it.

Paebbels commented 4 years ago

Hello Oliver,

here is the design for a Zynq MPSoC ZU11EG: image

The design is almost empty. I know that the Zynq UltraScale+ architecture is bad for high-performance master ports, so I added AXI Data FIFOs in between your mem_test cores and the Zynq input. See the block design screenshot above. The power_sink is currently only a very small dummy.

Yes, I could use a DataWidthConverter, but I fear it would not deliver the full performance... I wouldn't suggest an interconnect, because this involves a 1-to-1 crossbar, which is not needed in this case.


Can you forward this question also to Jonas Purtschert?

jonasppsi commented 4 years ago

@Paebbels, I've observed similar timing issues in my design when using 128-bit AXI @300MHz.

I tried to relax the failing paths by adding pipelining. The change is pushed to the branch #improveTiming, commit 9dea902. In my design it solves the timing issues. Can you check whether it helps?

obruendl commented 4 years ago

@jonasppsi Thank you for resolving the issue

@Paebbels Is the issue resolved at your end?

Paebbels commented 4 years ago

@obruendl I need some more time for testing. The design with mem_test and power_sink is quite big. I use this test design to improve our CI scripts to speed up compilation on GitLab runners.

Moreover, I need to change the submodule structure of my Git repository, because I want to avoid cloning the PSI cores on every CI run. Thus I mirror the PSI cores to our company-internal GitLab instance.

To summarize, I need ~1 more week to get results for you.


I'll present this setup and how to speed up compilation with CI and massively parallel synthesis at FPGA Conference Europe 2020.

Paebbels commented 4 years ago

I did further experiments. To get the MemoryTester implemented I needed these steps on a Xilinx ZU11 MPSoC:

jonasppsi commented 4 years ago

@Paebbels Thanks for sharing your additional constraints!

At the moment I don't know if I can improve the timing in the memory tester further... If you have suggestions, let me know. Otherwise, can you handle the timing with your constraints, so that I can close this issue?