litex-hub / linux-on-litex-vexriscv

Linux on LiteX-VexRiscv
BSD 2-Clause "Simplified" License

vexriscv-smp: suspect critical path in peripherals #149

Closed Dolu1990 closed 4 years ago

Dolu1990 commented 4 years ago

Hi,

I got a critical path on the peripheral side of the SoC in the vexriscv-smp branch.

It seems to go from the sdcard controller -> some distributed RAM -> some interconnect -> down to the litedram PHY.

Such a path shouldn't be there. Any idea what this distributed RAM just after the sdcard controller could be?

Max Delay Paths
--------------------------------------------------------------------------------------
Slack (MET) :             0.041ns  (required time - arrival time)
  Source:                 soclinux_sdblock2mem_wishbonedmawriter_base_storage_reg[2]/C
                            (rising edge-triggered cell FDRE clocked by crg_clkout0  {rise@0.000ns fall@5.000ns period=10.000ns})
  Destination:            OSERDESE2_24/D7
                            (rising edge-triggered cell OSERDESE2 clocked by crg_clkout0  {rise@0.000ns fall@5.000ns period=10.000ns})
  Path Group:             crg_clkout0
  Path Type:              Setup (Max at Slow Process Corner)
  Requirement:            10.000ns  (crg_clkout0 rise@10.000ns - crg_clkout0 rise@0.000ns)
  Data Path Delay:        9.333ns  (logic 2.897ns (31.040%)  route 6.436ns (68.960%))
  Logic Levels:           13  (CARRY4=5 LUT2=2 LUT4=3 LUT6=3)
  Clock Path Skew:        0.055ns (DCD - SCD + CPR)
    Destination Clock Delay (DCD):    5.888ns = ( 15.888 - 10.000 ) 
    Source Clock Delay      (SCD):    6.144ns
    Clock Pessimism Removal (CPR):    0.311ns
  Clock Uncertainty:      0.057ns  ((TSJ^2 + DJ^2)^1/2) / 2 + PE
    Total System Jitter     (TSJ):    0.071ns
    Discrete Jitter          (DJ):    0.089ns
    Phase Error              (PE):    0.000ns

    Location             Delay type                Incr(ns)  Path(ns)    Netlist Resource(s)
  -------------------------------------------------------------------    -------------------
                         (clock crg_clkout0 rise edge)
                                                      0.000     0.000 r  
    E3                                                0.000     0.000 r  clk100 (IN)
                         net (fo=0)                   0.000     0.000    clk100
    E3                   IBUF (Prop_ibuf_I_O)         1.489     1.489 r  clk100_IBUF_inst/O
                         net (fo=1, routed)           1.253     2.742    crg_clkin
    PLLE2_ADV_X1Y1       PLLE2_ADV (Prop_plle2_adv_CLKIN1_CLKOUT0)
                                                      0.088     2.830 r  PLLE2_ADV/CLKOUT0
                         net (fo=1, routed)           1.655     4.485    crg_clkout0
    BUFGCTRL_X0Y16       BUFG (Prop_bufg_I_O)         0.096     4.581 r  BUFG/O
                         net (fo=9005, routed)        1.562     6.144    sys_clk
    SLICE_X46Y15         FDRE                                         r  soclinux_sdblock2mem_wishbonedmawriter_base_storage_reg[2]/C
  -------------------------------------------------------------------    -------------------
    SLICE_X46Y15         FDRE (Prop_fdre_C_Q)         0.518     6.662 r  soclinux_sdblock2mem_wishbonedmawriter_base_storage_reg[2]/Q
                         net (fo=3, routed)           0.593     7.254    soclinux_sdblock2mem_wishbonedmawriter_base[0]
    SLICE_X45Y16         LUT2 (Prop_lut2_I0_O)        0.124     7.378 r  tag_mem_reg_i_78/O
                         net (fo=1, routed)           0.000     7.378    tag_mem_reg_i_78_n_0
    SLICE_X45Y16         CARRY4 (Prop_carry4_S[0]_CO[3])
                                                      0.532     7.910 r  tag_mem_reg_i_41/CO[3]
                         net (fo=1, routed)           0.000     7.910    tag_mem_reg_i_41_n_0
    SLICE_X45Y17         CARRY4 (Prop_carry4_CI_CO[3])
                                                      0.114     8.024 r  tag_mem_reg_i_38/CO[3]
                         net (fo=1, routed)           0.000     8.024    tag_mem_reg_i_38_n_0
    SLICE_X45Y18         CARRY4 (Prop_carry4_CI_CO[3])
                                                      0.114     8.138 r  tag_mem_reg_i_36/CO[3]
                         net (fo=1, routed)           0.000     8.138    tag_mem_reg_i_36_n_0
    SLICE_X45Y19         CARRY4 (Prop_carry4_CI_CO[3])
                                                      0.114     8.252 r  tag_mem_reg_i_50/CO[3]
                         net (fo=1, routed)           0.000     8.252    tag_mem_reg_i_50_n_0
    SLICE_X45Y20         CARRY4 (Prop_carry4_CI_O[1])
                                                      0.334     8.586 r  tag_mem_reg_i_49/O[1]
                         net (fo=2, routed)           0.689     9.276    VexRiscvLitexSmpCluster_Cc1_Iw64Is8192Iy2_Dw64Ds8192Dy2_Ldw128/clint_logic/_zz_385_reg_1[1]
    SLICE_X46Y21         LUT6 (Prop_lut6_I1_O)        0.303     9.579 f  VexRiscvLitexSmpCluster_Cc1_Iw64Is8192Iy2_Dw64Ds8192Dy2_Ldw128/clint_logic/builder_slave_sel_r[6]_i_4/O
                         net (fo=7, routed)           0.900    10.479    VexRiscvLitexSmpCluster_Cc1_Iw64Is8192Iy2_Dw64Ds8192Dy2_Ldw128/clint_logic_n_112
    SLICE_X49Y20         LUT4 (Prop_lut4_I0_O)        0.124    10.603 r  VexRiscvLitexSmpCluster_Cc1_Iw64Is8192Iy2_Dw64Ds8192Dy2_Ldw128/builder_slave_sel_r[4]_i_3/O
                         net (fo=6, routed)           0.712    11.315    VexRiscvLitexSmpCluster_Cc1_Iw64Is8192Iy2_Dw64Ds8192Dy2_Ldw128/builder_slave_sel_r[4]_i_3_n_0
    SLICE_X50Y20         LUT6 (Prop_lut6_I3_O)        0.124    11.439 f  VexRiscvLitexSmpCluster_Cc1_Iw64Is8192Iy2_Dw64Ds8192Dy2_Ldw128/builder_csr_bankarray_sel_r_i_3/O
                         net (fo=45, routed)          0.782    12.221    VexRiscvLitexSmpCluster_Cc1_Iw64Is8192Iy2_Dw64Ds8192Dy2_Ldw128/builder_csr_bankarray_sel_r_i_3_n_0
    SLICE_X56Y18         LUT6 (Prop_lut6_I5_O)        0.124    12.345 r  VexRiscvLitexSmpCluster_Cc1_Iw64Is8192Iy2_Dw64Ds8192Dy2_Ldw128/builder_csr_bankarray_interface15_bank_bus_dat_r[7]_i_1/O
                         net (fo=11, routed)          0.739    13.084    VexRiscvLitexSmpCluster_Cc1_Iw64Is8192Iy2_Dw64Ds8192Dy2_Ldw128/peripheralBridge_bmb_arbiter_io_output_cmd_halfPipe_regs_payload_fragment_address_reg[14]_1
    SLICE_X62Y16         LUT2 (Prop_lut2_I1_O)        0.124    13.208 f  VexRiscvLitexSmpCluster_Cc1_Iw64Is8192Iy2_Dw64Ds8192Dy2_Ldw128/sdram_storage[3]_i_2/O
                         net (fo=39, routed)          0.568    13.776    VexRiscvLitexSmpCluster_Cc1_Iw64Is8192Iy2_Dw64Ds8192Dy2_Ldw128/sdram_storage[3]_i_2_n_0
    SLICE_X63Y15         LUT4 (Prop_lut4_I0_O)        0.124    13.900 f  VexRiscvLitexSmpCluster_Cc1_Iw64Is8192Iy2_Dw64Ds8192Dy2_Ldw128/OSERDESE2_18_i_8/O
                         net (fo=5, routed)           0.885    14.785    VexRiscvLitexSmpCluster_Cc1_Iw64Is8192Iy2_Dw64Ds8192Dy2_Ldw128/OSERDESE2_18_i_8_n_0
    SLICE_X64Y5          LUT4 (Prop_lut4_I3_O)        0.124    14.909 r  VexRiscvLitexSmpCluster_Cc1_Iw64Is8192Iy2_Dw64Ds8192Dy2_Ldw128/OSERDESE2_24_i_4/O
                         net (fo=2, routed)           0.568    15.477    a7ddrphy_dfi_p3_cs_n
    OLOGIC_X1Y0          OSERDESE2                                    r  OSERDESE2_24/D7
  -------------------------------------------------------------------    -------------------

                         (clock crg_clkout0 rise edge)
                                                     10.000    10.000 r  
    E3                                                0.000    10.000 r  clk100 (IN)
                         net (fo=0)                   0.000    10.000    clk100
    E3                   IBUF (Prop_ibuf_I_O)         1.418    11.418 r  clk100_IBUF_inst/O
                         net (fo=1, routed)           1.181    12.599    crg_clkin
    PLLE2_ADV_X1Y1       PLLE2_ADV (Prop_plle2_adv_CLKIN1_CLKOUT0)
                                                      0.083    12.682 r  PLLE2_ADV/CLKOUT0
                         net (fo=1, routed)           1.576    14.258    crg_clkout0
    BUFGCTRL_X0Y16       BUFG (Prop_bufg_I_O)         0.091    14.349 r  BUFG/O
                         net (fo=9005, routed)        1.539    15.888    sys_clk
    OLOGIC_X1Y0          OSERDESE2                                    r  OSERDESE2_24/CLKDIV
                         clock pessimism              0.311    16.199    
                         clock uncertainty           -0.057    16.143    
    OLOGIC_X1Y0          OSERDESE2 (Setup_oserdese2_CLKDIV_D7)
                                                     -0.625    15.518    OSERDESE2_24
  -------------------------------------------------------------------
                         required time                         15.518    
                         arrival time                         -15.477    
  -------------------------------------------------------------------
                         slack                                  0.041    
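
The headline figures of the report can be recomputed from its own numbers, which also shows how little margin is left. This is a minimal sketch in Python; every value is copied from the report, the variable names are only illustrative, and the results agree with the report to within about 1 ps because the report prints rounded values.

```python
import math

# All values in ns, copied from the timing report; names are illustrative.
logic_delay = 2.897
route_delay = 6.436
source_clock_delay = 6.144       # SCD: launch edge at the source FDRE
capture_clock_arrival = 15.888   # 10.000 ns period + DCD (5.888 ns)
clock_pessimism = 0.311          # CPR
setup_time = 0.625               # Setup_oserdese2_CLKDIV_D7
tsj, dj, pe = 0.071, 0.089, 0.000

# Clock uncertainty as defined in the report: sqrt(TSJ^2 + DJ^2) / 2 + PE
clock_uncertainty = math.sqrt(tsj**2 + dj**2) / 2 + pe

data_path_delay = logic_delay + route_delay
arrival_time = source_clock_delay + data_path_delay
required_time = (capture_clock_arrival + clock_pessimism
                 - clock_uncertainty - setup_time)
slack = required_time - arrival_time
route_share = route_delay / data_path_delay

print(f"clock uncertainty: {clock_uncertainty:.3f} ns")
print(f"data path delay:   {data_path_delay:.3f} ns")
print(f"arrival time:      {arrival_time:.3f} ns")
print(f"required time:     {required_time:.3f} ns")
print(f"slack:             {slack:.3f} ns")
print(f"routing share:     {route_share:.1%}")
```

Roughly two thirds of the data path delay is routing rather than logic, which fits a path that crosses from the SD card DMA registers through the interconnect to the DDR PHY I/O.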

daveshah1 commented 4 years ago

Maybe I'm missing something, but where is the distributed RAM? I see LUTs and carries only.

Dolu1990 commented 4 years ago

@daveshah1 Oh right, no distributed RAM here, sorry. I only looked at the name (tag_mem_reg_i_41), which looked like a distributed RAM to me XD

Still surprised that the SDCARD finds its way to the OSERDESE2 of the litedram ^^

daveshah1 commented 4 years ago

Yeah, it feels like a register stage is missing somewhere...

Dolu1990 commented 4 years ago

Oh, I think I understand why I was confused about that path.

It seems that even if I disable the coherent DMA interface in the SMP cluster, litex still provides a non-coherent DMA interface for the SDCARD.

So the sdcard controller has a wishbone master interface that can access everything, including the peripherals. That means it can access litedram to send the initialisation commands, and those have direct connections to the OSERDESE2.

If all of the above is right, we might need to add some pipelining stages in the wishbone interconnect (or maybe just in litedram, to isolate the OSERDESE2 from direct connections to the wishbone bus).
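
To get a feel for what a pipelining stage would buy, the data path from the report can be split at a hypothetical register and the two halves compared. This is only a back-of-the-envelope sketch in Python: the per-element delays are copied from the report, while the cut point (just before the `sdram_storage` logic) is an arbitrary assumption and does not describe any actual change in LiteX or LiteDRAM. It also ignores the clock-to-Q and setup time of the new register, any placement changes, and the extra cycle of latency.

```python
# (name, cell_delay, net_delay) in ns for each element of the data path,
# copied in order from the timing report (source FDRE first).
STAGES = [
    ("FDRE base_storage_reg[2]", 0.518, 0.593),
    ("LUT2 tag_mem_reg_i_78", 0.124, 0.000),
    ("CARRY4 tag_mem_reg_i_41", 0.532, 0.000),
    ("CARRY4 tag_mem_reg_i_38", 0.114, 0.000),
    ("CARRY4 tag_mem_reg_i_36", 0.114, 0.000),
    ("CARRY4 tag_mem_reg_i_50", 0.114, 0.000),
    ("CARRY4 tag_mem_reg_i_49", 0.334, 0.689),
    ("LUT6 builder_slave_sel_r[6]_i_4", 0.303, 0.900),
    ("LUT4 builder_slave_sel_r[4]_i_3", 0.124, 0.712),
    ("LUT6 builder_csr_bankarray_sel_r_i_3", 0.124, 0.782),
    ("LUT6 builder_csr_bankarray_interface15_bank_bus_dat_r[7]_i_1", 0.124, 0.739),
    ("LUT2 sdram_storage[3]_i_2", 0.124, 0.568),
    ("LUT4 OSERDESE2_18_i_8", 0.124, 0.885),
    ("LUT4 OSERDESE2_24_i_4", 0.124, 0.568),
]


def segment_delays(stages, cut_after):
    """Split the path after stage index `cut_after`, where a hypothetical
    register would sit, and return the delay of each half in ns."""
    before = sum(cell + net for _, cell, net in stages[:cut_after + 1])
    after = sum(cell + net for _, cell, net in stages[cut_after + 1:])
    return before, after


total = sum(cell + net for _, cell, net in STAGES)

# Hypothetical cut: after the CSR bank LUT6, before the sdram_storage LUT2.
CUT_AFTER = 10
before, after = segment_delays(STAGES, CUT_AFTER)

print(f"unpipelined path:  {total:.3f} ns")
print(f"bus side of cut:   {before:.3f} ns")
print(f"PHY side of cut:   {after:.3f} ns")
print(f"new longest piece: {max(before, after):.3f} ns")
```

Moving `CUT_AFTER` shows how the balance shifts: a cut close to the OSERDESE2 mostly isolates the PHY, while a cut earlier in the interconnect shortens the bus side instead.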

enjoy-digital commented 4 years ago

@Dolu1990: yes, your reasoning is right. I could look at it in early September. Do you also have timing issues with the coherent DMA enabled? (LiteX will not bridge the main bus to LiteDRAM in this case: https://github.com/enjoy-digital/litex/blob/master/litex/soc/integration/soc.py#L1241-L1246).

Dolu1990 commented 4 years ago

@enjoy-digital Enabling the coherent DMA seems to fix the timing issue (the critical path moved somewhere else and is more relaxed).

enjoy-digital commented 4 years ago

@Dolu1990: can you try with https://github.com/enjoy-digital/litex/commit/e4f5dd987eb8d2a98c714d4e2130fe015e1df244? When building the Arty design with --cpu-count=1 and no coherent DMA, it seems fine.

enjoy-digital commented 4 years ago

As discussed with @Dolu1990, this is fixed with enjoy-digital/litex@e4f5dd9.