lneuhaus / pyrpl

pyrpl turns your RedPitaya into a powerful DSP device, especially suitable as a lockbox in quantum optics experiments.
http://lneuhaus.github.io/pyrpl/
MIT License

Massive fpga timing violations #488

Open abregnsbo opened 1 year ago

abregnsbo commented 1 year ago

When building an FPGA image using the RTL from either the 'master' or the 'python3-only' branch, there is a large timing violation in the iq2 demodulator block. The violating path runs through the low-pass filter and the gain block. The reason could be that iq2 has a 4-stage low-pass filter, whereas iq0/iq1 only have a 2-stage low-pass filter.

Using Vivado 2020.1.

Slack (VIOLATED) :        -4.645ns  (required time - arrival time)
  Source:                 i_dsp/genblk6[7].iq_2_outputs/quadrature_filter_reg[7]/C
                            (rising edge-triggered cell FDRE clocked by pll_adc_clk  {rise@0.000ns fall@4.000ns period=8.000ns})
  Destination:            i_dsp/genblk6[7].iq_2_outputs/modulator/secondproduct1/D[12]
                            (rising edge-triggered cell DSP48E1 clocked by pll_adc_clk  {rise@0.000ns fall@4.000ns period=8.000ns})
  Path Group:             pll_adc_clk
  Path Type:              Setup (Max at Slow Process Corner)
  Requirement:            8.000ns  (pll_adc_clk rise@8.000ns - pll_adc_clk rise@0.000ns)
  Data Path Delay:        10.988ns  (logic 5.041ns (45.876%)  route 5.947ns (54.124%))
  Logic Levels:           7  (DSP48E1=1 LUT5=4 LUT6=2)
  Clock Path Skew:        0.039ns (DCD - SCD + CPR)
    Destination Clock Delay (DCD):    5.468ns = ( 13.468 - 8.000 ) 
    Source Clock Delay      (SCD):    5.887ns
    Clock Pessimism Removal (CPR):    0.458ns
  Clock Uncertainty:      0.069ns  ((TSJ^2 + DJ^2)^1/2) / 2 + PE
    Total System Jitter     (TSJ):    0.071ns
    Discrete Jitter          (DJ):    0.118ns
    Phase Error              (PE):    0.000ns

    Location             Delay type                Incr(ns)  Path(ns)    Netlist Resource(s)
  -------------------------------------------------------------------    -------------------
                         (clock pll_adc_clk rise edge)
                                                      0.000     0.000 r  
    U18                                               0.000     0.000 r  adc_clk_p_i (IN)
                         net (fo=0)                   0.000     0.000    adc_clk_p_i
    U18                  IBUFDS (Prop_ibufds_I_O)     0.983     0.983 r  i_clk/O
                         net (fo=1, routed)           1.306     2.289    pll/clk
    PLLE2_ADV_X0Y0       PLLE2_ADV (Prop_plle2_adv_CLKIN1_CLKOUT0)
                                                      0.089     2.378 r  pll/pll/CLKOUT0
                         net (fo=1, routed)           1.754     4.132    pll_adc_clk
    BUFGCTRL_X0Y0        BUFG (Prop_bufg_I_O)         0.101     4.233 r  bufg_adc_clk/O
                         net (fo=12942, routed)       1.654     5.887    i_dsp/genblk6[7].iq_2_outputs/clk_i
    SLICE_X29Y24         FDRE                                         r  i_dsp/genblk6[7].iq_2_outputs/quadrature_filter_reg[7]/C
  -------------------------------------------------------------------    -------------------
    SLICE_X29Y24         FDRE (Prop_fdre_C_Q)         0.456     6.343 r  i_dsp/genblk6[7].iq_2_outputs/quadrature_filter_reg[7]/Q
                         net (fo=49, routed)          1.047     7.390    i_dsp/genblk6[7].iq_2_outputs/iqfilter[1]/genblk2[0].lpf/filter_on
    SLICE_X25Y23         LUT5 (Prop_lut5_I3_O)        0.124     7.514 r  i_dsp/genblk6[7].iq_2_outputs/iqfilter[1]/genblk2[0].lpf/signal_o[19]_INST_0/O
                         net (fo=3, routed)           0.696     8.210    i_dsp/genblk6[7].iq_2_outputs/iqfilter[1]/genblk2[1].lpf/signal_i[19]
    SLICE_X29Y23         LUT5 (Prop_lut5_I0_O)        0.124     8.334 r  i_dsp/genblk6[7].iq_2_outputs/iqfilter[1]/genblk2[1].lpf/signal_o[19]_INST_0/O
                         net (fo=3, routed)           0.491     8.825    i_dsp/genblk6[7].iq_2_outputs/iqfilter[1]/genblk2[2].lpf/signal_i[19]
    SLICE_X28Y27         LUT5 (Prop_lut5_I0_O)        0.124     8.949 r  i_dsp/genblk6[7].iq_2_outputs/iqfilter[1]/genblk2[2].lpf/signal_o[19]_INST_0/O
                         net (fo=3, routed)           0.525     9.474    i_dsp/genblk6[7].iq_2_outputs/iqfilter[1]/genblk2[3].lpf/signal_i[19]
    SLICE_X29Y27         LUT5 (Prop_lut5_I0_O)        0.124     9.598 r  i_dsp/genblk6[7].iq_2_outputs/iqfilter[1]/genblk2[3].lpf/signal_o[19]_INST_0/O
                         net (fo=4, routed)           0.590    10.188    i_dsp/genblk6[7].iq_2_outputs/modulator/firstproduct_saturation[0]/factor1_i[19]
    DSP48_X1Y10          DSP48E1 (Prop_dsp48e1_A[19]_P[33])
                                                      3.841    14.029 r  i_dsp/genblk6[7].iq_2_outputs/modulator/firstproduct_saturation[0]/product/P[33]
                         net (fo=2, routed)           1.009    15.038    i_dsp/genblk6[7].iq_2_outputs/modulator/firstproduct_saturation[0]/p_0_in[0]
    SLICE_X33Y27         LUT6 (Prop_lut6_I2_O)        0.124    15.162 r  i_dsp/genblk6[7].iq_2_outputs/modulator/firstproduct_saturation[0]/product_o[12]_INST_0_i_1/O
                         net (fo=13, routed)          0.865    16.026    i_dsp/genblk6[7].iq_2_outputs/modulator/firstproduct_saturation[0]/product_o[12]_INST_0_i_1_n_0
    SLICE_X32Y29         LUT6 (Prop_lut6_I0_O)        0.124    16.150 r  i_dsp/genblk6[7].iq_2_outputs/modulator/firstproduct_saturation[0]/product_o[12]_INST_0/O
                         net (fo=1, routed)           0.725    16.875    i_dsp/genblk6[7].iq_2_outputs/modulator/firstproduct1[12]
    DSP48_X1Y11          DSP48E1                                      r  i_dsp/genblk6[7].iq_2_outputs/modulator/secondproduct1/D[12]
  -------------------------------------------------------------------    -------------------

                         (clock pll_adc_clk rise edge)
                                                      8.000     8.000 r  
    U18                                               0.000     8.000 r  adc_clk_p_i (IN)
                         net (fo=0)                   0.000     8.000    adc_clk_p_i
    U18                  IBUFDS (Prop_ibufds_I_O)     0.940     8.940 r  i_clk/O
                         net (fo=1, routed)           1.181    10.121    pll/clk
    PLLE2_ADV_X0Y0       PLLE2_ADV (Prop_plle2_adv_CLKIN1_CLKOUT0)
                                                      0.084    10.205 r  pll/pll/CLKOUT0
                         net (fo=1, routed)           1.594    11.799    pll_adc_clk
    BUFGCTRL_X0Y0        BUFG (Prop_bufg_I_O)         0.091    11.890 r  bufg_adc_clk/O
                         net (fo=12942, routed)       1.578    13.468    i_dsp/genblk6[7].iq_2_outputs/modulator/clk_i
    DSP48_X1Y11          DSP48E1                                      r  i_dsp/genblk6[7].iq_2_outputs/modulator/secondproduct1/CLK
                         clock pessimism              0.458    13.926    
                         clock uncertainty           -0.069    13.857    
    DSP48_X1Y11          DSP48E1 (Setup_dsp48e1_CLK_D[12])
                                                     -1.626    12.231    i_dsp/genblk6[7].iq_2_outputs/modulator/secondproduct1
  -------------------------------------------------------------------
                         required time                         12.231    
                         arrival time                         -16.875    
  -------------------------------------------------------------------
                         slack                                 -4.645    
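
To make the report easier to read, here is a small sketch that redoes the slack arithmetic and splits the data path delay into three parts. All numbers are copied from the report above. The grouping and the group labels (for example reading the four LUT5s as one per low-pass filter stage) are my interpretation of the netlist names, not something Vivado prints.

```python
# All values are in ns and copied from the timing report above.
PERIOD = 8.000              # Requirement
SOURCE_CLOCK_DELAY = 5.887  # SCD: launch clock arrival at the source FDRE
DEST_CLOCK_DELAY = 5.468    # DCD: capture clock delay to the DSP48E1
CLOCK_PESSIMISM = 0.458     # CPR
CLOCK_UNCERTAINTY = 0.069
SETUP_TIME = 1.626          # Setup_dsp48e1_CLK_D[12]
DATA_PATH_DELAY = 10.988


def setup_slack():
    """Return (required, arrival, slack) for the reported path."""
    arrival = SOURCE_CLOCK_DELAY + DATA_PATH_DELAY
    required = (PERIOD + DEST_CLOCK_DELAY + CLOCK_PESSIMISM
                - CLOCK_UNCERTAINTY - SETUP_TIME)
    return required, arrival, required - arrival


# Incr(ns) entries of the data path, in report order, grouped by what they
# appear to implement. The labels are an assumption based on instance names.
SEGMENTS = {
    "filter register + 4 lpf-stage LUT5s (with routing)": [
        0.456, 1.047, 0.124, 0.696, 0.124, 0.491, 0.124, 0.525, 0.124, 0.590,
    ],
    "first DSP48E1 multiplier": [3.841],
    "saturation LUT6s + routing to second DSP48E1": [
        1.009, 0.124, 0.865, 0.124, 0.725,
    ],
}


def segment_totals():
    """Sum the increments of each group, rounded to the report's precision."""
    return {name: round(sum(delays), 3) for name, delays in SEGMENTS.items()}


if __name__ == "__main__":
    required, arrival, slack = setup_slack()
    print(f"required {required:.3f} ns, arrival {arrival:.3f} ns, "
          f"slack {slack:.3f} ns")
    for name, total in segment_totals().items():
        print(f"{total:6.3f} ns  {name}")
```

The recomputed slack is -4.644 ns. The report prints -4.645 ns, presumably because it works from unrounded values. The filter part adds up to 4.301 ns, the first multiplier to 3.841 ns and the saturation logic plus routing to 2.847 ns, so in this path the filter stages are a sizeable share of the delay but not the only large contributor.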

abregnsbo commented 1 year ago

The timing violations are not just in the iir and iq2 blocks, but in most blocks. Using Vivado 2020.1 on the python3-only branch, post_route_timing_summary.rpt says:

Clock                 WNS(ns)      TNS(ns)  TNS Failing Endpoints  TNS Total Endpoints      
-----                 -------      -------  ---------------------  -------------------      
adc_clk                                                                                     
  clk_fb                                                                                    
  pll_adc_clk          -4.887   -15468.449                  12225                34369      
  pll_dac_clk_1x        0.633        0.000                      0                   45      
  pll_dac_clk_2p                                                                            
  pll_dac_clk_2x                                                                            
  pll_pwm_clk          -0.376       -0.783                      4                  440      
clk_fpga_3             -0.344      -11.358                     73                 1713      

That is, about 12,000 failing endpoints! The violating paths are located in most blocks. I have seen >3 ns violations in pid0/1, iq0/1 and asg. Many violations have a large net delay (> 8 ns), which I guess is caused by the high LUT utilization of 91%. Vivado does not seem to be able to report delays for the typical corner, which could indicate how bad the situation is on a random RP board. The situation is similar on the master branch.
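
For anyone who wants to compare their own build, here is a rough sketch that parses the summary table above and derives the share of failing endpoints and the mean violation per failing endpoint (TNS divided by the number of failing endpoints). The parsing assumes the whitespace-separated layout shown above. `parse_summary` is a helper name I made up, not part of Vivado or pyrpl.

```python
REPORT = """\
Clock                 WNS(ns)      TNS(ns)  TNS Failing Endpoints  TNS Total Endpoints
-----                 -------      -------  ---------------------  -------------------
adc_clk
  clk_fb
  pll_adc_clk          -4.887   -15468.449                  12225                34369
  pll_dac_clk_1x        0.633        0.000                      0                   45
  pll_dac_clk_2p
  pll_dac_clk_2x
  pll_pwm_clk          -0.376       -0.783                      4                  440
clk_fpga_3             -0.344      -11.358                     73                 1713
"""


def parse_summary(text):
    """Return {clock: stats} for every row that carries timing numbers."""
    rows = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 5:
            continue  # header line or a clock without constrained endpoints
        name, wns, tns, failing, total = fields
        try:
            row = {
                "wns": float(wns),
                "tns": float(tns),
                "failing": int(failing),
                "total": int(total),
            }
        except ValueError:
            continue  # the dashed separator line also has five fields
        row["failing_fraction"] = row["failing"] / row["total"]
        # Mean violation in ns over the failing endpoints only.
        row["mean_violation"] = (
            -row["tns"] / row["failing"] if row["failing"] else 0.0
        )
        rows[name] = row
    return rows


if __name__ == "__main__":
    for clock, row in parse_summary(REPORT).items():
        print(f"{clock:16s} {row['failing']:6d}/{row['total']:<6d} "
              f"({row['failing_fraction']:.1%}), "
              f"mean violation {row['mean_violation']:.3f} ns")
```

For pll_adc_clk this gives about 35.6% of endpoints failing with a mean violation of roughly 1.27 ns, so the problem is widespread rather than limited to a few extreme paths.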

Running the FPGA implementation with a modified 0.9.3 RTL, which has 75% LUT utilization, gives only 350 failing endpoints, mainly in the iir block.

Has anybody seen a clean timing summary with the newest pyrpl RTL?