Open ranaya123 opened 5 years ago
Hi @ranaya123 , post-synthesis simulation is often tricky. It is a bit difficult to guess what could be happening here without knowing more details. A few possibilities that come to my mind (not a complete list):
A good way to check these things is to perform an identical simulation using the RTL code and directly compare what happens at the core interface level.
Hi, thanks for your input. The issue seems to be related to the debug_req_i port of riscv_core, which goes directly to the riscv_controller. In the RTL simulation (hello world), this signal is asserted only for a very short time, while in post-synthesis simulation it stays high throughout the simulation, as shown in the following figure. So this is most likely a hang in the FSM of the controller.
i.e. the debug_mode signal has been removed by the synthesizer, as it is not connected to the top-level (fc_subsystem) design and cs_register. The synthesizer had also removed the "debug_ebreaku" signal from cs_register, which is strange! It is a direct connection from the riscv_controller to the cs_register, so I would expect it to remain between the modules.
Btw, to verify: do you provide a sample ASIC synthesis script for the entire pulpissimo platform? At least, would it be possible to get appropriate and realistic constraints (with uncertainties) for each block of the design?
Thanks
I think you should focus on interface signals, the ones that "disappeared" seem to be internal ones. Whatever is going wrong, with >90% prob it's happening at the interface between netlist and RTL.
Btw, to verify: do you provide a sample ASIC synthesis script for the entire pulpissimo platform? At least, would it be possible to get appropriate and realistic constraints (with uncertainties) for each block of the design?
Is there any documentation on how the RTL should be made synthesizable (i.e. dc_shell directives)?
We don't have a full "clean" script that I can share (if so we would have put it in the repo!), but in general it's quite mundane, something like this (I put only the key commands):
source -echo -verbose ./scripts/analyze_auto/ips_add_files.tcl > ip_errors.rpt
source -echo -verbose ./scripts/analyze_auto/rtl_add_files.tcl > rtl_errors.rpt
elaborate pulpissimo -work work
write -format ddc -hier -o ./unmapped/pulpissimo_unmapped.ddc pulpissimo
link
after 10000
set uniquify_naming_style "soc_%s_%d"
uniquify -force
source -echo -verbose ./scripts/constraints.tcl
compile_ultra -no_autoungroup -no_boundary_optimization -timing -gate_clock
Proper constraints (especially I/O) depend a lot on your setup, however one thing that I can say is that some of the blocks (standard-cell memory based register files) will require exceptions, otherwise you will over-constrain them:
set_multicycle_path 2 -setup -through [get_pins soc_domain_i/pulp_soc_i/fc_subsystem_i/lFC_CORE/id_stage_i/registers_i/riscv_register_file_i/mem_reg*/Q]
set_multicycle_path 1 -hold -through [get_pins soc_domain_i/pulp_soc_i/fc_subsystem_i/lFC_CORE/id_stage_i/registers_i/riscv_register_file_i/mem_reg*/Q]
set_multicycle_path 2 -setup -through [get_pins soc_domain_i/pulp_soc_i/fc_subsystem_i/lFC_CORE/id_stage_i/registers_i/riscv_register_file_i/mem_fp_reg*/Q]
set_multicycle_path 1 -hold -through [get_pins soc_domain_i/pulp_soc_i/fc_subsystem_i/lFC_CORE/id_stage_i/registers_i/riscv_register_file_i/mem_fp_reg*/Q]
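The four exceptions above can also be generated in a loop, which makes it easier to notice when a pattern matches nothing (for example in a configuration without the floating-point register file). This is only a sketch for dc_shell: the hierarchy path is copied from the constraints above, and the empty-collection check is an addition that is not part of the original script.

```tcl
# Sketch only, meant to be sourced from dc_shell after the design is linked.
# rf_path is copied from the constraints in this thread; adjust it to your hierarchy.
set rf_path soc_domain_i/pulp_soc_i/fc_subsystem_i/lFC_CORE/id_stage_i/registers_i/riscv_register_file_i

foreach bank {mem_reg mem_fp_reg} {
    # -quiet suppresses the error when nothing matches, so we can report it ourselves
    set q_pins [get_pins -quiet ${rf_path}/${bank}*/Q]
    if {[sizeof_collection $q_pins] == 0} {
        echo "Warning: no pins matched ${rf_path}/${bank}*/Q, no exception applied"
        continue
    }
    # Two cycles for setup, with the hold check moved back by one cycle
    set_multicycle_path 2 -setup -through $q_pins
    set_multicycle_path 1 -hold  -through $q_pins
}
```

Checking the warning in the log is worthwhile: an exception that silently matches no pins leaves the register file over-constrained, which is exactly what the exceptions are meant to avoid.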
@FrancescoConti : Okay I was able to pinpoint the issue. The main reason for the "halt" seems to be a wrongly issued instruction address caused by the FSM in the prefetch_buffer_i. Take a look at the following figure for RTL simulation :
At the time instant marked by the white arrow, branch_i falls to zero exactly at the rising edge of the clock. So, according to the riscv_prefetch_buffer FSM, instr_addr_o = fetch_addr when CS = WAIT_RVALID. But in the synthesized design, with realistic delays, branch_i is not yet 0 at this instant, as shown in the following figure:
The combinational delay of the FSM slightly stretches branch_i and, as a consequence, instr_addr_o = addr_i instead of fetch_addr, as shown. This affects CS and NS as well. From this point onwards, instr_addr_o is stuck at 1a110804, which is not the intended behavior.
Since branch_i is produced in riscv_if_stage with respect to the rising edge of clk, checking its status again in the prefetch buffer within the same clock cycle does not reproduce the RTL simulation results.
So I solved the issue. The trick is to replace the clock gating cells (applied by the synthesizer) by their behavioral models to avoid hold violations. Otherwise, data is sampled at two different clock edges from peripheral and core sides.....
Hi @ranaya123, how did you replace the clock gating cells with behavioral models? Also, have you tried to synthesize a bigger design, for example the soc_domain? I have been trying to do this but am running into a lot of issues. One thing I saw was that the clock_en_i pin of riscv_core remains unconnected, which might be a problem.
@vikramjain236 I haven't had a chance to synthesize the bigger system. I will be looking into that in the coming weeks. Regarding the clock gating cells, you can first synthesize the design with clock gating enabled and then replace those cells in the synthesized netlist with their behavioral model (with latching).
To annotate the SDF properly, you have to skip those cells in the .sdf file as well, so that data sampling becomes synchronized.
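One way to skip the clock gating cells in the SDF is to filter the file before annotation. The following is a minimal sketch in plain Tcl (runnable with tclsh), not the procedure actually used in this thread: the proc name strip_sdf_cells, the file names and the CELLTYPE pattern "CKLN*" are all hypothetical, and it assumes the parentheses in the file are balanced and never appear inside quoted strings.

```tcl
# Hypothetical helper: remove every (CELL ...) entry whose CELLTYPE matches a glob pattern.
# Assumption: parentheses are balanced and do not occur inside quoted strings.
proc strip_sdf_cells {sdf_text celltype_glob} {
    set out ""
    set depth 0          ;# current parenthesis nesting level
    set cell_start -1    ;# index of the "(" opening the CELL entry being scanned
    set copy_from 0      ;# start of the text not yet copied to the output
    set len [string length $sdf_text]
    for {set i 0} {$i < $len} {incr i} {
        set ch [string index $sdf_text $i]
        if {$ch eq "("} {
            # CELL entries sit directly inside the top-level (DELAYFILE ...) block
            set head [string range $sdf_text $i [expr {$i + 5}]]
            if {$depth == 1 && [string match -nocase "(CELL\[ \t\r\n\]*" $head]} {
                set cell_start $i
            }
            incr depth
        } elseif {$ch eq ")"} {
            incr depth -1
            if {$depth == 1 && $cell_start >= 0} {
                set cell [string range $sdf_text $cell_start $i]
                if {[regexp {\(CELLTYPE\s+"([^"]+)"} $cell -> ctype]
                        && [string match $celltype_glob $ctype]} {
                    # Copy everything before this entry, then skip the entry itself
                    append out [string range $sdf_text $copy_from [expr {$cell_start - 1}]]
                    set copy_from [expr {$i + 1}]
                }
                set cell_start -1
            }
        }
    }
    append out [string range $sdf_text $copy_from end]
    return $out
}

# Usage sketch (file names and cell type pattern are placeholders):
#   set fh [open riscv_core.sdf r];          set sdf [read $fh]; close $fh
#   set fh [open riscv_core_no_icg.sdf w];   puts -nonewline $fh [strip_sdf_cells $sdf "CKLN*"]; close $fh
```

The pattern has to match the integrated clock gating cells of your own library, and the same instances must be the ones replaced by the behavioral model in the netlist; otherwise the simulator will report annotation errors for cells that no longer exist.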
Anuradha
Hi Vikram If you are looking for the behavioural model of the cluster clock gate, you can find it here: https://github.com/pulp-platform/tech_cells_generic/blob/master/src/cluster_clock_gating.sv Regards, Renzo
I get this "Access to register" error when I try to run a post-synthesis simulation. Does anyone know what the problem might be? (I synthesize only soc_domain.sv and all its sub-modules.)
# [TB] 177701ns - Halting the Core
# [TB] 236501ns - Writing the boot address into dpc
# ** Error: Access to register 07b1 failed with error X
# Time: 280601 ns Scope: jtag_pkg.debug_mode_if_t.wait_command File: /volume1/users/vjain/pulpissimo/sim/../rtl/tb/jtag_pkg.sv Line: 770
# [TB] 280601ns - Loading L2
# [JTAG] Loading L2 with pulp tap jtag interface
# [pulp_tap_if] WRITE32 burst @1c000000 for 1024 bytes.
# [pulp_tap_if] WRITE32 burst @1c000400 for 1024 bytes.
@vikramjain236 Hi, a few things to check before gate-level simulation:
Are the interfaces between sub-modules preserved during synthesis? For example, you may disable boundary optimization. Take a look at the following synthesis script for riscv_core (only): https://pastebin.com/Lm4TfGkD
It wasn't only the global cluster clock gating: the clock gating local to the sub-modules had to be replaced too. So the behavioural model of the clock gate cells has to be adapted to your preferred clock gating style (the one used during synthesis). If it's the default style, then the cluster clock gating RTL model should work.
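The first point (keeping sub-module interfaces intact) could look roughly like this in dc_shell. It is only a sketch and not the content of the linked script: the instance path is taken from the constraints quoted earlier in the thread and is an assumption about your hierarchy.

```tcl
# Sketch only, to be run in dc_shell before compile.
# Assumption: the core instance sits at the path used in the multicycle constraints above.
set core [get_cells soc_domain_i/pulp_soc_i/fc_subsystem_i/lFC_CORE]

# Do not optimize across the core boundary: no constant propagation through
# its ports and no removal of ports that look unconnected from one side.
set_boundary_optimization $core false

# Keep the core as its own level of hierarchy instead of dissolving it.
set_ungroup $core false

# Same compile options as in the script posted earlier in the thread.
compile_ultra -no_autoungroup -no_boundary_optimization -timing -gate_clock
```

With the boundary preserved, the port list of the synthesized core should match the RTL module, which makes it possible to swap the netlist into the RTL testbench and compare the two at the interface level.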
Anuradha
@FrancescoConti @renzoandri @ranaya123
I have been trying to run a post synthesis simulation on the pulpissimo environment.
You can also find the vcd dump in my google drive: https://drive.google.com/open?id=102YbQ7HJ0prg3M4Es373vznWqamDxc03
Hope to get some help from you! Thanks!
I haven't done post-synth simulation on the entire pulpissimo and have also never used the behavioural cells. The badacce5 means bad access (I don't know when this error appears). One thing you could check is whether the simulation is timed or untimed. It might be that you have hold violations because some parts of your design are timed and other parts are untimed (e.g. the behavioural clock gating cells are for sure untimed). The memory models are also a common source of problems and should be checked.
Hi @vikramjain236 , I am a bit perplexed by swapping in & out behavioral models (maybe I misunderstood what you wrote). Just to be 100% explicit, the "correct" flow should be
[...] cluster_clock_gating). The SRAM behavioral model generated by your memory compiler will have to replace all generic_mem instances. If your memory is different to what we assume in the generic one, you will have to use a different SRAM, combine multiple memory cuts or otherwise add logic so that the interface with the rest of the system is maintained.
[...] soc_domain. Make sure that the correct modules are linked, especially for what concerns the SRAMs.
On your specific error: badacce5 is generated by the SoC interconnect when you try to access an address that is not mapped. I think however this is more a symptom than the origin of your problem. I will try to have a look at the VCD, in the mean time let us know whether you followed the procedure above.
Hi All,
This question is about gate-level simulation on the Pulpissimo platform. Right now I only want to perform this simulation at the core level. So I have the synthesized netlist of the RISC-V core, added it to the build tree together with the model file of the standard cells, and successfully built the system.
When I run the simulation in Modelsim, I can see that the netlist of the RISC-V core appears in the design hierarchy and that the SDF file was properly annotated to the right scope:
I've also put the riscv_tracer riscv_tracer_i() in the synthesized netlist. What's possibly going wrong here ?
Thanks in advance