openXC7 / nextpnr-xilinx

Experimental flows using nextpnr for Xilinx devices
ISC License

Limited support for Distributed Memory / LUTRAM #20

Open hansemro opened 8 months ago

hansemro commented 8 months ago

Issue Description

The memory_libmap pass in Yosys 0.18 and newer can synthesize LUTRAMs that nextpnr does not support, including:

Part of the issue stems from nextpnr not fully supporting all LUTRAMs in the Distributed RAM packer in xilinx/pack_dram.cc.

Resolving this should also address https://github.com/openXC7/demo-projects/issues/6.

Tasks/Status

TODO: rewrite tasks

Development Branches

References

See 018-clb-ram minitest. Build and view design checkpoint in Vivado.

https://f4pga.readthedocs.io/projects/prjxray/en/latest/architecture/dram_configuration.html

https://docs.xilinx.com/v/u/en-US/ug474_7Series_CLB

https://docs.xilinx.com/v/u/en-US/ug574-ultrascale-clb

https://docs.xilinx.com/r/en-US/ug953-vivado-7series-libraries

https://docs.xilinx.com/r/en-US/ug974-vivado-ultrascale-libraries

https://www.xilinx.com/content/dam/xilinx/support/documents/sw_manuals/xilinx14_7/7series_hdl.pdf

https://docs.amd.com/v/u/en-US/7series_hdl

https://github.com/Xilinx/XilinxUnisimLibrary/tree/master/verilog/src/unisims

hansfbaier commented 8 months ago

Very good, thanks!

hansemro commented 8 months ago

Added support for RAM128X1S and RAM256X1S though not sufficiently tested. I was able to build litex-ddr-kc705 after latest changes. However, I am now running into DDR memtest issues it seems: https://gist.github.com/hansemro/5f48f4098e59f9db2e34ae25cb0b6ecd

hansfbaier commented 8 months ago

@hansemro Wow, that was quick! We might want to write some basic tests here: https://github.com/openXC7/primitive-tests Debugging the issue inside that complex design is probably too cumbersome.

hansemro commented 8 months ago

@hansfbaier Sounds good. I'll try to write some tests soon. DDR issue could be totally unrelated to this.

hansfbaier commented 8 months ago

@hansemro Yes, very likely your code works. I never got 8 modules working nice with OpenXC7. At some point the timing just falls apart, because of congestion, I suppose. See my comment on your gist.

hansemro commented 8 months ago

Rebased experimental branch on https://github.com/openXC7/nextpnr-xilinx/commit/8120acd7c036cd87af2f1169e9c555546ad7bded (current stable-backports)

hansfbaier commented 8 months ago

@hansemro great, thanks

hansemro commented 8 months ago

Fixed up RAM32X1D not creating RAMD32 instances (was creating RAMD64 instances previously) and made sure RAM32X1S is handled in pack_dram (forgot this one apparently).

hansemro commented 8 months ago

Made initial tests: https://github.com/hansemro/primitive-tests/commits/lutram-tests/

Notably nextpnr-xilinx hangs with RAM256X1S. Seems #10 made this observation months ago. I'll try to spend some time debugging this.

hansemro commented 8 months ago

RAM256X1S and RAM256X1D were not handled correctly since their address pins are in an array A[N:0] rather than specified individually (A0, A1, A2, ... ). I'll need to validate every transform rule anyway...
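
A minimal sketch of the naming mismatch described above, not the actual nextpnr code. The helper name `address_pin_names` and the `A[i]` spelling for individual bus bits are assumptions for illustration; the set of bus-style cells follows the Xilinx library guides.

```python
from __future__ import annotations

# Cells whose address input is declared as a bus A[N:0] (assumption based on
# the Xilinx library guides); the other LUTRAM cells expose scalar pins
# A0, A1, A2, ...
BUS_ADDRESS_CELLS = {"RAM128X1D", "RAM256X1S", "RAM256X1D", "RAM512X1S"}


def address_pin_names(cell_type: str, width: int, base: str = "A") -> list[str]:
    """Return the address pin names a transform rule would have to look up.

    Hypothetical helper: bus bits are spelled "A[0]", scalar pins "A0".
    """
    if cell_type in BUS_ADDRESS_CELLS:
        return [f"{base}[{i}]" for i in range(width)]
    return [f"{base}{i}" for i in range(width)]
```

A rule written only for the scalar spelling would find no `A0` pin on a RAM256X1S, which matches the symptom of the address pins not being handled.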

hansfbaier commented 8 months ago

@hansemro I pushed an MMCM-fix to stable-backports. The MMCM should work now, if you have time to try.

hansemro commented 8 months ago

@hansfbaier Thanks for the heads up. I will get to it at some point. I decided it is better for me to fix dual-port LUTRAMs before moving forward, because everything is broken (for at least xc7, not sure about ultrascale).

~For example, while tracing how RAM128X1D is handled, twice as many RAMD64Es are created with z/height decremented for each one. We end up with negative z?! Not sure what the intention was but this does not seem right to me.~

hansfbaier commented 8 months ago

Yes definitely, that is more important. Thanks for working on this.

hansemro commented 8 months ago

I misspoke since I was testing RAM256X1D (thought I was testing RAM256X1S) which does not fit in xc7 anyhow. Still, yosys should not allow unsupported LUTRAMs to be synthesized!

hansemro commented 8 months ago

Resolved two issues:

hansfbaier commented 7 months ago

@hansemro How are things going? Have you been able to test the changes?

hansemro commented 7 months ago

@hansfbaier

> Have you been able to test the changes?

MMCM is confirmed working with multiple clock outputs on KC705 though with a BUFG on all clock outputs. fasm2frames would throw segment DB errors if I didn't have them. Test branch: https://github.com/hansemro/primitive-tests/commits/mmcm-blinky-kc705-db-error/

Interestingly, the first clock output didn't require me to place BUFG buffer, though it should probably have one.

> How are things going?

Things were going well until I had to handle each LUTRAM as an edge case. Initially, I was less bothered to write things down, but now I feel it is appropriate to actually spend time documenting the port/parameter transformations for all cells (including ultrascale-only cells). I intend to resume work on validation and get xc7 cells covered.

Anyhow, it turns out I made some incorrect assumptions about some things. Here are some TODOs:

  1. create_dram32_lut should be able to map LUT5 or LUT6 BEL site, but currently doesn't
  2. check whether address ports are connected to both A{6:1} and WA{8:1} (WA{9:1} for ultrascale) ports in LUT6/LUT5 BEL for single-port LUTRAM

I will elaborate in a follow-up post on how LUTRAMs transform to RAMS/RAMD primitive cells and then to LUT5/LUT6 BELs.

hansfbaier commented 7 months ago

Yes, I had similar observations. CLKOUT 1-3 had missing pips. CLKOUT 0,5,6 worked fine out of the box: https://github.com/openXC7/primitive-tests/blob/main/mmcm-blinky-kintex/blinky.v But only on Kintex. On other series all CLKOUT ports were fine.

hansfbaier commented 7 months ago

Thanks for the update!

hansemro commented 7 months ago

Naming Notation:

I'll define the following name notation to fold port/parameter names:
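
As a rough illustration of the folding idea, here is a small sketch that expands a folded name into individual names. Reading `X{hi:lo}` as shorthand for separately named scalar ports (`WA{8:1}` for WA8 down to WA1, `DI{D:A}` for DID down to DIA) while `X[hi:lo]` stays a bus is an assumption drawn from how the notation is used in this thread; `expand_fold` is a hypothetical helper.

```python
from __future__ import annotations

import re

# Matches one folded range such as {8:1} or {D:A}.
_FOLD = re.compile(r"\{(\w+):(\w+)\}")


def expand_fold(name: str) -> list[str]:
    """Expand the first {hi:lo} range in a folded port/parameter name.

    Numeric ranges count down from hi to lo; letter ranges use single
    uppercase letters. Bus suffixes like [1:0] are left untouched.
    """
    m = _FOLD.search(name)
    if m is None:
        return [name]
    hi, lo = m.group(1), m.group(2)
    if hi.isdigit() and lo.isdigit():
        keys = [str(i) for i in range(int(hi), int(lo) - 1, -1)]
    else:
        keys = [chr(c) for c in range(ord(hi), ord(lo) - 1, -1)]
    return [name[: m.start()] + key + name[m.end():] for key in keys]
```

For example, `expand_fold("INIT_{D:A}[63:0]")` yields the four parameter names `INIT_D[63:0]` through `INIT_A[63:0]`.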

LUTRAM Cell Table:

Additional notes:

| Cell | Cell Type | US-only | `*INIT*` Parameter | `*CLK_INVERTED` Parameter | Clock Input | Write Enable Input | Write Data Input | Write Select Input | Write Address Input | Read Address Input | Read Data Output |
|---|---|---|---|---|---|---|---|---|---|---|---|
| LUT_OR_MEM5 | BEL | false | INIT[31:0] | N/A | CLK | WE | DI1 | N/A | WA{5:1} | A{5:1} | O5 |
| LUT_OR_MEM6 | BEL | false | INIT[63:0] | N/A | CLK | WE | DI2 | N/A | US ? WA{9:1} : WA{8:1} | A{6:1} | O6 |
| RAMS32 | LUTRAM Primitive | false | INIT[31:0] | IS_CLK_INVERTED | CLK | WE | I | N/A | ADR{4:0} | ADR{4:0} | O |
| RAMD32 | LUTRAM Primitive | false | INIT[31:0] | IS_CLK_INVERTED | CLK | WE | I | N/A | WADR{4:0} | RADR{4:0} | O |
| RAMD32M64 | LUTRAM Primitive | true | INIT[63:0] | IS_CLK_INVERTED | CLK | WE | I | N/A | WADR{4:0} | RADR{5:0} | O |
| RAM32X1S | LUTRAM | false | INIT[31:0] | IS_WCLK_INVERTED | WCLK | WE | D | N/A | A{4:0} | A{4:0} | O |
| RAM32X1D | LUTRAM | false | INIT[31:0] | IS_WCLK_INVERTED | WCLK | WE | D | N/A | A{4:0} | A{4:0}; DPRA{4:0} | SPO; DPO |
| RAM32X16DR8 | Asymmetric LUTRAM | true | N/A? | IS_WCLK_INVERTED | WCLK | WE | DI{H:A}[1:0] | N/A | ADDRH[4:0]; ADDR{G:A}[5:0] | ADDRH[4:0]; ADDR{G:A}[5:0] | DOH[1:0]; DO{G:A} |
| RAM32M | SelectRAM | false | INIT_{D:A}[63:0] | IS_WCLK_INVERTED | WCLK | WE | DI{D:A}[1:0] | N/A | ADDR{D:A}[4:0] | ADDR{D:A}[4:0] | DO{D:A}[1:0] |
| RAM32M16 | SelectRAM | true | INIT_{H:A}[63:0] | IS_WCLK_INVERTED | WCLK | WE | DI{H:A}[1:0] | N/A | ADDR{H:A}[4:0] | ADDR{H:A}[4:0] | DO{H:A}[1:0] |
| RAMS64E | LUTRAM Primitive | false | INIT[63:0] | IS_CLK_INVERTED | CLK | WE | I | N/A | (WADR{7:6}), ADR{5:0} | ADR{5:0} | O |
| RAMS64E1 | LUTRAM Primitive | true? | INIT[63:0] | IS_CLK_INVERTED | CLK | WE | I | N/A | (WADR{8:6}), ADR{5:0} | ADR{5:0} | O |
| RAMD64E | LUTRAM Primitive | false | INIT[63:0] | IS_CLK_INVERTED | CLK | WE | I | N/A | WADR{7:0} | RADR{5:0} | O |
| RAM64X1S | LUTRAM | false | INIT[63:0] | IS_WCLK_INVERTED | WCLK | WE | D | N/A | A{5:0} | A{5:0} | O |
| RAM64X1D | LUTRAM | false | INIT[63:0] | IS_WCLK_INVERTED | WCLK | WE | D | N/A | A{5:0} | A{5:0}; DPRA{5:0} | SPO; DPO |
| RAM64X8SW | SelectRAM | true | INIT_{H:A}[63:0] | IS_WCLK_INVERTED | WCLK | WE | D | WSEL[2:0] | A[5:0] | A[5:0] | O[7:0] |
| RAM64M | SelectRAM | false | INIT_{D:A}[63:0] | IS_WCLK_INVERTED | WCLK | WE | DI{D:A} | N/A | ADDR{D:A}[5:0] | ADDR{D:A}[5:0] | DO{D:A} |
| RAM64M8 | SelectRAM | true | INIT_{H:A}[63:0] | IS_WCLK_INVERTED | WCLK | WE | DI{H:A} | N/A | ADDR{H:A}[5:0] | ADDR{H:A}[5:0] | DO{H:A} |
| RAM128X1S | LUTRAM | false | INIT[127:0] | IS_WCLK_INVERTED | WCLK | WE | D | N/A | A{6:0} | A{6:0} | O |
| RAM128X1D | LUTRAM | false | INIT[127:0] | IS_WCLK_INVERTED | WCLK | WE | D | N/A | A[6:0] | A[6:0]; DPRA[6:0] | SPO; DPO |
| RAM256X1S | LUTRAM | false | INIT[255:0] | IS_WCLK_INVERTED | WCLK | WE | D | N/A | A[7:0] | A[7:0] | O |
| RAM256X1D | LUTRAM | true | INIT[255:0] | IS_WCLK_INVERTED | WCLK | WE | D | N/A | A[7:0] | A[7:0]; DPRA[7:0] | SPO; DPO |
| RAM512X1S | LUTRAM | true | INIT[511:0] | IS_WCLK_INVERTED | WCLK | WE | D | N/A | A[8:0] | A[8:0] | O |

hansemro commented 7 months ago

XC7 LUTRAM to LUTRAM Primitive Transformations:

LUTRAMs are broken down to primitive cell(s) that will eventually map to SLICEM LUT_OR_MEM6/LUT_OR_MEM5 BEL site(s) once placed.

Convention:

RAM32X1S -> 1x RAMS32

Cell Rules:

Parameter Rules:

Port Rules:

RAM32X1D -> 2x RAMD32

Cell Rules:

Parameter Rules:

Port Rules:

RAM32M -> 2x RAMS32 + 6x RAMD32

Cell Rules:

Parameter Rules:

Port Rules:

RAM64X1S -> RAMS64E

Cell Rules:

Parameter Rules:

Port Rules:

RAM64X1D -> 2x RAMD64E

Cell Rules:

Parameter Rules:

Port Rules:

RAM64M -> 4x RAMD64E

Cell Rules:

Parameter Rules:

Port Rules:

RAM128X1S -> 2x RAMS64E

Cell Rules:

Parameter Rules:

Port Rules:

Cell Rules:

Parameter Rules:

Port Rules:

RAM256X1S -> 4x RAMS64E

Cell Rules:

Parameter Rules:

Port Rules:
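
The decompositions named in the headings above can be collected into a lookup table. This is only a sketch: the counts are copied from those headings, while `primitive_cells` and its instance naming scheme are hypothetical and do not reflect how pack_dram.cc names cells.

```python
from __future__ import annotations

# XC7 LUTRAM cell -> number of each LUTRAM primitive it is broken into,
# taken from the transformation headings in this comment.
XC7_LUTRAM_PRIMITIVES = {
    "RAM32X1S": {"RAMS32": 1},
    "RAM32X1D": {"RAMD32": 2},
    "RAM32M": {"RAMS32": 2, "RAMD32": 6},
    "RAM64X1S": {"RAMS64E": 1},
    "RAM64X1D": {"RAMD64E": 2},
    "RAM64M": {"RAMD64E": 4},
    "RAM128X1S": {"RAMS64E": 2},
    "RAM256X1S": {"RAMS64E": 4},
}


def primitive_cells(cell_type: str, inst_name: str) -> list[tuple[str, str]]:
    """Return (instance name, primitive type) pairs for one LUTRAM cell.

    The "<inst>/<PRIM>_<i>" naming is made up for illustration.
    """
    try:
        counts = XC7_LUTRAM_PRIMITIVES[cell_type]
    except KeyError:
        raise ValueError(f"no XC7 transform listed for {cell_type}") from None
    cells = []
    for prim, count in counts.items():
        for i in range(count):
            cells.append((f"{inst_name}/{prim}_{i}", prim))
    return cells
```

Raising on an unlisted cell type mirrors the earlier point that unsupported LUTRAMs should be rejected rather than silently passed through.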

hansemro commented 7 months ago

LUTRAM Primitive to BEL Transformations:

Additional notes:

RAMD64E -> LUT_OR_MEM6 BEL

RAMS64E -> LUT_OR_MEM6 BEL

RAMD32 -> LUT_OR_MEM6 BEL

RAMD32 -> LUT_OR_MEM5 BEL

RAMS32 -> LUT_OR_MEM6 BEL

RAMS32 -> LUT_OR_MEM5 BEL

hansfbaier commented 7 months ago

Thanks for the effort! I am looking forward to what you will come up with!

hansemro commented 6 months ago

Issue: nextpnr checks INIT{A:D} instead of INIT_{A:D} parameters for RAM32M/RAM64M

While working on an INIT parameter test, I noticed that the INIT parameters for RAM32M/RAM64M were not being set with correct values in the FASM result.

https://github.com/openXC7/nextpnr-xilinx/blob/1c57f511f80945a709d1d43478d39ad0b6cd51d2/xilinx/pack_dram.cc#L455-L469

Merely adding underscores does not immediately solve the issue, so I will need to look more into this later.

WIP INIT Parameter Test: https://github.com/hansemro/primitive-tests/tree/xc7-lutram-tests/lutram-tests/init-test

hansemro commented 6 months ago

Issue: nextpnr checks INIT{A:D} instead of INIT_{A:D} parameters for RAM32M/RAM64M

This should be fixed in this branch: https://github.com/hansemro/nextpnr-xilinx/tree/fix-ram32m-ram64m-init

However, I am noticing some discrepancies compared to Vivado that I will need to verify. Notice how the upper and lower 32-bits of the ?LUT.INIT pattern are swapped.

RAM32M NextPNR fasm result:

CLBLM_L_X84Y126.SLICEM_X0.ALUT.INIT[63:0] = 64'b0000000000000000111011100100010000000000000000000101000001010000
CLBLM_L_X84Y126.SLICEM_X0.ALUT.DI1MUX.AI
CLBLM_L_X84Y126.SLICEM_X0.ALUT.SMALL
CLBLM_L_X84Y126.SLICEM_X0.ALUT.RAM
CLBLM_L_X84Y126.SLICEM_X0.AOUTMUX.O5
CLBLM_L_X84Y126.SLICEM_X0.BLUT.INIT[63:0] = 64'b0000000000000000111011100100010000000000000000001111101011111010
CLBLM_L_X84Y126.SLICEM_X0.BLUT.DI1MUX.BI
CLBLM_L_X84Y126.SLICEM_X0.BLUT.SMALL
CLBLM_L_X84Y126.SLICEM_X0.BLUT.RAM
CLBLM_L_X84Y126.SLICEM_X0.BOUTMUX.O5
CLBLM_L_X84Y126.SLICEM_X0.CLUT.INIT[63:0] = 64'b0000000000000000000100011011101100000000000000001010111110101111
CLBLM_L_X84Y126.SLICEM_X0.CLUT.DI1MUX.CI
CLBLM_L_X84Y126.SLICEM_X0.CLUT.SMALL
CLBLM_L_X84Y126.SLICEM_X0.CLUT.RAM
CLBLM_L_X84Y126.SLICEM_X0.COUTMUX.O5
CLBLM_L_X84Y126.SLICEM_X0.DLUT.INIT[63:0] = 64'b0000000000000000000100011011101100000000000000000000010100000101
CLBLM_L_X84Y126.SLICEM_X0.DLUT.SMALL
CLBLM_L_X84Y126.SLICEM_X0.DLUT.RAM
CLBLM_L_X84Y126.SLICEM_X0.DOUTMUX.O5
CLBLM_L_X84Y126.SLICEM_X0.WEMUX.CE
CLBLM_L_X84Y126.SLICEM_X0.NOCLKINV
CLBLM_L_X84Y126.SLICEL_X1.NOCLKINV

RAM32M Vivado bit2fasm result:

CLBLM_L_X48Y95.SLICEL_X1.NOCLKINV
CLBLM_L_X48Y95.SLICEL_X1.PRECYINIT.C0
CLBLM_L_X48Y95.SLICEM_X0.ALUT.DI1MUX.BDI1_BMC31
CLBLM_L_X48Y95.SLICEM_X0.ALUT.INIT[46:0] = 47'b10100000101000000000000000000001110111001000100
CLBLM_L_X48Y95.SLICEM_X0.ALUT.RAM
CLBLM_L_X48Y95.SLICEM_X0.ALUT.SMALL
CLBLM_L_X48Y95.SLICEM_X0.AOUTMUX.O5
CLBLM_L_X48Y95.SLICEM_X0.BLUT.DI1MUX.DI_CMC31
CLBLM_L_X48Y95.SLICEM_X0.BLUT.INIT[47:0] = 48'b111110101111101000000000000000001110111001000100
CLBLM_L_X48Y95.SLICEM_X0.BLUT.RAM
CLBLM_L_X48Y95.SLICEM_X0.BLUT.SMALL
CLBLM_L_X48Y95.SLICEM_X0.BOUTMUX.O5
CLBLM_L_X48Y95.SLICEM_X0.CLUT.DI1MUX.DI_DMC31
CLBLM_L_X48Y95.SLICEM_X0.CLUT.INIT[47:0] = 48'b101011111010111100000000000000000001000110111011
CLBLM_L_X48Y95.SLICEM_X0.CLUT.RAM
CLBLM_L_X48Y95.SLICEM_X0.CLUT.SMALL
CLBLM_L_X48Y95.SLICEM_X0.COUTMUX.O5
CLBLM_L_X48Y95.SLICEM_X0.DLUT.INIT[42:0] = 43'b1010000010100000000000000000001000110111011
CLBLM_L_X48Y95.SLICEM_X0.DLUT.RAM
CLBLM_L_X48Y95.SLICEM_X0.DLUT.SMALL
CLBLM_L_X48Y95.SLICEM_X0.DOUTMUX.O5
CLBLM_L_X48Y95.SLICEM_X0.NOCLKINV
CLBLM_L_X48Y95.SLICEM_X0.PRECYINIT.C0
CLBLM_L_X48Y95.SLICEM_X0.WEMUX.CE
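
The swap can be checked mechanically. The sketch below parses the two ALUT INIT lines from the dumps above and compares them after exchanging the 32-bit halves; `parse_init` and `swap_halves` are hypothetical helpers, and the regular expression only covers the `INIT[N:0] = W'b...` form seen here.

```python
import re

# Matches FASM lines of the form "...INIT[N:0] = W'b0101...".
_INIT = re.compile(r"INIT\[(\d+):0\]\s*=\s*(\d+)'b([01]+)")


def parse_init(line: str) -> int:
    """Extract the INIT value from a FASM line as an integer."""
    m = _INIT.search(line)
    if m is None:
        raise ValueError(f"not an INIT line: {line!r}")
    return int(m.group(3), 2)


def swap_halves(init: int) -> int:
    """Exchange the upper and lower 32 bits of a 64-bit INIT value."""
    lower = init & 0xFFFFFFFF
    upper = (init >> 32) & 0xFFFFFFFF
    return (lower << 32) | upper


nextpnr_alut = (
    "CLBLM_L_X84Y126.SLICEM_X0.ALUT.INIT[63:0] = "
    "64'b0000000000000000111011100100010000000000000000000101000001010000"
)
vivado_alut = (
    "CLBLM_L_X48Y95.SLICEM_X0.ALUT.INIT[46:0] = "
    "47'b10100000101000000000000000000001110111001000100"
)

# True when the nextpnr value equals the Vivado value with halves exchanged.
halves_swapped = swap_halves(parse_init(nextpnr_alut)) == parse_init(vivado_alut)
```

Note that bit2fasm drops leading zeros, so the Vivado lines have narrower widths; comparing the parsed integers sidesteps that difference.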
hansemro commented 6 months ago

Issue: nextpnr does nothing with IS_*CLK_INVERTED property for LUTRAM cells

While working on CLKINV property test, I noticed that nextpnr does not set the CLKINV bit when IS_*CLK_INVERTED property for a LUTRAM is set. Instead, nextpnr fasm writer ignores the property and sets NOCLKINV for the SLICEM site.

Also note that, on XC7, LUTRAMs and FFs share the same CLKINV routing BEL. However, on Ultrascale(+), LUTRAMs have their own dedicated clock inverter provided by the LCLKINV routing BEL.
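
A minimal sketch of what the XC7 behaviour implies for the fasm writer, assuming the site-level feature names are `CLKINV` and `NOCLKINV` (only `NOCLKINV` appears in the dumps in this thread). `slice_clkinv_feature` is a hypothetical helper, not part of nextpnr.

```python
from __future__ import annotations


def slice_clkinv_feature(tile: str, site: str, inverted_flags: list[bool]) -> str:
    """Pick the clock inversion feature for one XC7 SLICEM site.

    inverted_flags holds one entry per clocked cell (FF or LUTRAM) placed in
    the site: True if its IS_*CLK_INVERTED style property is set. Because the
    cells share a single CLKINV routing BEL on XC7, they must all agree.
    """
    if inverted_flags and all(inverted_flags):
        return f"{tile}.{site}.CLKINV"
    if not any(inverted_flags):
        return f"{tile}.{site}.NOCLKINV"
    raise ValueError("clocked cells in one XC7 slice disagree on clock inversion")
```

The mixed case is an error here because it is a placement legality problem rather than something the fasm writer could resolve; on UltraScale(+) the dedicated LCLKINV would need separate handling.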

WIP CLKINV property test: https://github.com/hansemro/primitive-tests/tree/xc7-lutram-tests/lutram-tests/clkinv-test

hansfbaier commented 6 months ago

Great work!

hansemro commented 6 days ago

Something that bothered me about how RAM{S,D}64E maps to the LUT_OR_MEM6 BEL is that only the DI1 data input is used to write to both internal LUT_OR_MEM5 BELs. Somehow one of the LUT_OR_MEM5 BELs can select between the DI1 and DI2 data inputs, but this is not well documented.

Recently, I stumbled on the following physical design rules (in $VIVADO_2017.2_ROOT/ids_lite/ISE/msg/usenglish/PhysDesignRules.msg), which suggest that the RAM.SMALL configuration bit is what controls data input selection:

1383
Issue with pin connections and/or configuration on block:<%s>:<%s>.  For RAMMODE programming set with DPRAM64 or SPRAM64 or SRL32 the DI1 input pin must be connected.\n
Issue with pin connections and/or configuration on block:<!%1!>:<!%2!>.  For RAMMODE programming set with DPRAM64 or SPRAM64 or SRL32 the DI1 input pin must be connected.\n
;;
1384
Issue with pin connections and/or configuration on block:<%s>:<%s>.  For RAMMODE programming set with DPRAM64 or SPRAM64 or SRL32 the DI2 input pin cannot be connected.\n
Issue with pin connections and/or configuration on block:<!%1!>:<!%2!>.  For RAMMODE programming set with DPRAM64 or SPRAM64 or SRL32 the DI2 input pin cannot be connected.\n
;;
1385
Issue with pin connections and/or configuration on block:<%s>:<%s>.  For RAMMODE programming set with DPRAM32 or SPRAM32 or SRL16 the DI2 input pin must be connected.\n
Issue with pin connections and/or configuration on block:<!%1!>:<!%2!>.  For RAMMODE programming set with DPRAM32 or SPRAM32 or SRL16 the DI2 input pin must be connected.\n
;;
1386
Issue with pin connections and/or configuration on block:<%s>:<%s>.  For RAMMODE programming set with DPRAM32 or SPRAM32 or SRL16 the DI1 input pin cannot be connected.\n
Issue with pin connections and/or configuration on block:<!%1!>:<!%2!>.  For RAMMODE programming set with DPRAM32 or SPRAM32 or SRL16 the DI1 input pin cannot be connected.\n

Coincidentally, the 018-clb-ram prjxray fuzzer found that the RAM.SMALL configuration bit is set for RAM32M/RAM32X1{S,D} and SRL16E, but not set for RAM64M/RAM{64,128}X1{S,D}/RAM256X1S and SRLC32E. This all seems to indicate that the RAM.SMALL bit is used for data input selection for the upper LUT_OR_MEM5 BEL (the one with the DI2 input, initialized with INIT[63:32]).
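
The four quoted messages can be read as a small rule table. The sketch below encodes them literally, using the RAMMODE names from the message text; `di_pin_violations` is a hypothetical helper, and it says nothing about how Vivado actually implements these checks.

```python
from __future__ import annotations

# RAMMODE values named in messages 1383/1384 and 1385/1386 respectively.
DEEP_MODES = {"DPRAM64", "SPRAM64", "SRL32"}
SHALLOW_MODES = {"DPRAM32", "SPRAM32", "SRL16"}


def di_pin_violations(rammode: str, di1_connected: bool, di2_connected: bool) -> list[int]:
    """Return the message numbers violated by a DI1/DI2 connection pattern."""
    violations = []
    if rammode in DEEP_MODES:
        if not di1_connected:
            violations.append(1383)  # DI1 must be connected
        if di2_connected:
            violations.append(1384)  # DI2 cannot be connected
    elif rammode in SHALLOW_MODES:
        if not di2_connected:
            violations.append(1385)  # DI2 must be connected
        if di1_connected:
            violations.append(1386)  # DI1 cannot be connected
    else:
        raise ValueError(f"unknown RAMMODE: {rammode}")
    return violations
```

For instance, `di_pin_violations("SPRAM32", True, False)` reports both 1385 and 1386, since the 32-deep modes expect DI2 rather than DI1.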

Here is a block diagram to help visualize what I see of LUT_OR_MEM6 BEL:

[Image: LUT_OR_MEM6 block diagram (R1)]

hansfbaier commented 6 days ago

Good to see you making progress!

lehaifeng000 commented 4 days ago

@hansemro DI2 can't be used; it causes an infinite loop in the placer. There is a check in the placer: if the LUTRAM uses the DI2 port and the WA7/WA8 ports are not configured, placement cannot complete.

hansemro commented 4 days ago

@lehaifeng000 Yes, nextpnr does not currently have the capacity to pack/place RAM32X1S/RAMS32 which would occupy and utilize both LUT_OR_MEM5 BELs. As you say, placer will get stuck because LUT_OR_MEM5/RAM32* with DI2 is not yet accepted as legal: https://github.com/openXC7/nextpnr-xilinx/blob/9debb871624163d4043150e576793c78cba503f8/xilinx/arch_place.cc#L114-L117.

Additionally, how the {C,B}X pins are connected to the {C,B}LUTs and the WA7USED/WA8USED BELs should also be considered in legalization. I believe Vivado avoids this by making it illegal to place mismatched LUTRAM types in the same CLB SLICEM site. However, I will need to look more into this.

If we update the packer to utilize DI2, we will also need to update the placer and legalization accordingly.

lehaifeng000 commented 4 days ago

I intend to study the placer and try to modify it, inserting rules into the placement process.

lehaifeng000 commented 3 days ago

@hansfbaier @hansemro I'm wondering if you could provide an email address or other contact information. That way, materials that aren't suitable for public sharing could be sent to you privately.

hansfbaier commented 3 days ago

@lehaifeng000 my email address foss@hans-baier.de should be visible in every git commit. Same for @hansemro

lehaifeng000 commented 3 days ago

> @lehaifeng000 my email address foss@hans-baier.de should be visible in every git commit. Same for @hansemro

OK, I got it