lowRISC / opentitan

OpenTitan: Open source silicon root of trust
https://www.opentitan.org
Apache License 2.0

[dv,dvsim,tooling] Questa Support #24341

Open hcallahan-lowrisc opened 1 month ago

hcallahan-lowrisc commented 1 month ago

Description

This is a bucket-issue for improving Questa support in OpenTitan.

I want to use this issue to drive support forwards by gathering user feedback into a working branch of fixes, which can later be merged all at once. If you want to help, please see the steps you can take below!

Motive

I would like to get Questa support into a better place across the project. Questa is a simulator that many people have access to, and part of maximizing the value of the OpenTitan project is keeping the barrier to entry as low as possible for anyone who is interested. As lowRISC currently does not have access to Questa licenses, and nobody is regularly developing on OpenTitan with it or regressing the DV suite using it, support has always been limited and liable to break without notice. I hope one day we can test our codebase against more simulators and catch breaking changes before they are merged, but up to now it has been a best-effort approach.

Context

Questa support was initially added to our DV test tool 'dvsim' in #10574 - [hw/dv] Feature/questa dv, though I am not sure to what extent it was functional at the time (i.e. which testbenches could be compiled/simulated, and which were incompatible). Our signoff simulator for blocks/tops is primarily VCS (for example, see the earlgrey top-level testbench config file chip_sim_cfg.hjson#L15), although some blocks now use Xcelium for signoff as well.

'dvsim' is extensible, and the support added above allows for invoking simulations using the questa CLI when you use --tool=questa. Fundamentally, dvsim should be able to assemble a valid CLI invocation for any tool it supports, with any number of arguments that are specific to the quirks of any program. Some parts of this process are quite generic between tools. For example, we use fusesoc .core files to package blocks and components, and dvsim will invoke fusesoc to generate output collateral such as a specific filelist for the simulation/testbench of interest. The tool-specific file /hw/dv/tools/dvsim/questa.hjson is then responsible for tying-up the generic parts of the flow with the implementation details of a tool.
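
To make the split between generic and tool-specific parts concrete, here is a minimal Python sketch of the idea. It is not dvsim's actual implementation: the key names (`build_cmd`, `build_opts`), the `{placeholder}` substitution scheme, the `assemble` helper and the install path are all assumptions made for illustration. The option strings are borrowed from the qrun invocation quoted later in this thread.

```python
import shlex

# Hypothetical tool-specific settings, loosely modelled on what a file such
# as questa.hjson might hold. Key names here are illustrative only.
QUESTA_CFG = {
    "build_cmd": "{questa_home}/linux_x86_64/qrun",
    "build_opts": [
        "-optimize",
        "-timescale 1ns/1ps",
        "-uvm",
        "-f {filelist}",
        "-top {tb}",
    ],
}


def assemble(cfg, **generic):
    """Fill generic, tool-independent values into the tool-specific template."""
    argv = []
    for part in [cfg["build_cmd"]] + cfg["build_opts"]:
        # Expand placeholders first, then split the option into argv tokens.
        argv.extend(shlex.split(part.format(**generic)))
    return argv


if __name__ == "__main__":
    argv = assemble(
        QUESTA_CFG,
        questa_home="/opt/questa",               # assumed install location
        filelist="lowrisc_dv_chip_sim_0.1.scr",  # e.g. a fusesoc-generated filelist
        tb="tb",
    )
    print(" ".join(shlex.quote(a) for a in argv))
```

The point of the sketch is only that the generic values (filelist, testbench top) come from the common flow, while the shape of the command line lives in one tool-specific place.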

Existing Questa Tickets

I did a little search of previous issues and pull requests related to questa support. I've pulled out the ones I think might be useful as a reference here...

Issues

https://github.com/lowRISC/opentitan/issues?q=is%3Aissue+questa+

Ticket Desc.
OPEN
#24174 - [questa] Pls update questa.hjson file Questa support request
#23293 - [siemens] updated questa.hjson General Questa questions
#23268 - [documentation] generating scripts used to run Jasper Gold Porting Jasper formal scripts to QuestaFormal
#22755 - gpio_tl_intg_err test failing on Questa Test passes on Xcelium, but not in Questa
#22243 - [dvsim] Various questions Questa/CentOS7 support questions, also dvsim/bazel questions
#22217 - [dvgen] generating testbench w/in Ubuntu container but simulating in CentOS Questa/CentOS7 support questions
#18039 - [dvsim] Riviera/Questa support is broken - around otbn_memutil Requesting Questa/Riviera support
CLOSED
#22162 - [ci] No module named 'google_verible_verilog_syntax_py (2024) General questa/bazel support questions
#21518 - [dvsim] Questa uart_smoke failing at tl_agent_pkg load Questa SV / LRM incompatibilities, not sure a fix was merged but it may have been elsewhere.
#16994 - [aes] aes_core simulation using ModelSim Request for questa/modelsim support
#9514 - Edalize and dvsim Discusses some initial SV incompatibilities before questa.hjson was merged in #10574
#4541 - [DV] missing has_edn check in cip_base_env.sv? SV issue found while adding questa support. Fixed.
#4529 - Sim: Illegal %s format specififer for class object Questa incompatibilities -> fixed by adding '-suppress vsim-8323' in #10574
#4528 - Sim: Dangling virtual interface - Questa support Questa SV incompatibilities -> fixed by adding '-permit_unmatched_virtual_intf' in #10574
#4427 - Sim: tlul_assert: SVA errors - QSTA UART Smoke test Questa SV LRM incompatibilities in tl_host_driver.sv - closed due to lack of feedback
#4377 - SIM: Questa support: FATAL error from RAL OpenTitan does not support pre-compiled UVM -> problem found when trying to add Questa build support
#4355 - Sim: Data type mismatch issue Questa SV Support issue - closed as wontfix, pending vendor feedback on questa LRM interpretation
#4340 - dvsim: QSTA support - sim_tops needs to be moved to run_opts Some initial design questions about adding dvsim questa support
#4334 - Renaming dir sim-vcs inside fusesoc.hjson More dvsim+fusesoc oddities than questa
#4332 - dvsim: Questa support - prim_clock_gating.v - is this missing in file-list? Adding initial questa.hjson
#4153 - EalrGrey Questa support? Questa support request

PRs

https://github.com/lowRISC/opentitan/pulls?q=is%3Apr+questa+

Ticket Desc.
OPEN
CLOSED
#10574 - [hw/dv] Feature/questa dv Initial questa support in dvsim (questa.hjson) + some docs.
#10365 - [hw/dv] Changed set_response_queue to be umbounded due to queue overflow issues in Questa Fix for #9514 ( a questa issue )
#4435 - [dv] Fix Questa warning and remove unused var Fixes for #4377/#4398
#4366 - [dv] Move sim_tops to {tool}.hjson Preparing for initial questa support
#240 - [alert_handler] Add alert_handler RTL implementation

Fixing outstanding issues

To get our DV flows working with Questa more broadly, the bulk of the work will be in SV / LRM differences in how tools interpret our codebase. A familiar hurdle for sure!

This process will probably raise a number of issues in dv base classes to begin with, and then follow onto testbench code specific to each block.

Probably the best approach will be to work block-by-block through the individual testbenches, first fixing any build issues, followed by any runtime differences. The following table is a good starting list of some block level testbenches for non-parameterized blocks, and suggested invocations to test them. (I've just worked down the list of IP in hw/ip to make a start. This list can become a more comprehensive checklist of tested-blocks in the future.)

block simulation command
adc_ctrl ./util/dvsim/dvsim.py hw/ip/adc_ctrl/dv/adc_ctrl_sim_cfg.hjson -i adc_ctrl_smoke --tool questa
aes ./util/dvsim/dvsim.py hw/ip/aes/dv/aes_masked_sim_cfg.hjson -i aes_smoke --tool questa
aon_timer ./util/dvsim/dvsim.py hw/ip/aon_timer/dv/aon_timer_sim_cfg.hjson -i aon_timer_smoke --tool questa
csrng ./util/dvsim/dvsim.py hw/ip/csrng/dv/csrng_sim_cfg.hjson -i csrng_smoke --tool questa
edn ./util/dvsim/dvsim.py hw/ip/edn/dv/edn_sim_cfg.hjson -i edn_smoke --tool questa
entropy_src ./util/dvsim/dvsim.py hw/ip/entropy_src/dv/entropy_src_sim_cfg.hjson -i entropy_src_smoke --tool questa
gpio ./util/dvsim/dvsim.py hw/ip/gpio/dv/gpio_sim_cfg.hjson -i gpio_smoke --tool questa
hmac ./util/dvsim/dvsim.py hw/ip/hmac/dv/hmac_sim_cfg.hjson -i hmac_smoke --tool questa
i2c ./util/dvsim/dvsim.py hw/ip/i2c/dv/i2c_sim_cfg.hjson -i i2c_smoke --tool questa
keymgr ./util/dvsim/dvsim.py hw/ip/keymgr/dv/keymgr_sim_cfg.hjson -i keymgr_smoke --tool questa
kmac ./util/dvsim/dvsim.py hw/ip/kmac/dv/kmac_sim_cfg.hjson -i kmac_smoke --tool questa
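
For anyone who wants to work down this list without retyping each line, the commands can be generated from the block names. The following is a rough Python sketch, assuming the `<block>_smoke` naming the table follows and treating the aes config filename as the one exception; the `dvsim_command` helper is hypothetical and not part of dvsim.

```python
# Blocks from the table above. Most use <block>_sim_cfg.hjson; aes is the
# exception and uses aes_masked_sim_cfg.hjson.
BLOCKS = [
    "adc_ctrl", "aes", "aon_timer", "csrng", "edn", "entropy_src",
    "gpio", "hmac", "i2c", "keymgr", "kmac",
]
CFG_OVERRIDES = {"aes": "aes_masked_sim_cfg.hjson"}


def dvsim_command(block, item=None, tool="questa"):
    """Return the dvsim argv for one block (hypothetical helper)."""
    cfg = CFG_OVERRIDES.get(block, f"{block}_sim_cfg.hjson")
    item = item or f"{block}_smoke"  # default to the block's smoke test
    return [
        "./util/dvsim/dvsim.py",
        f"hw/ip/{block}/dv/{cfg}",
        "-i", item,
        "--tool", tool,
    ]


if __name__ == "__main__":
    for block in BLOCKS:
        print(" ".join(dvsim_command(block)))
```

To actually launch the runs, the `print` could be swapped for `subprocess.run(dvsim_command(block))` executed from the repository root. Passing `item="all_once"` produces the regression form of the same command.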

The top-level testbench would eventually be desirable to fix, though I suspect it will be the most work to get there.

top simulation command
earlgrey ./util/dvsim/dvsim.py hw/top_earlgrey/dv/chip_sim_cfg.hjson -i chip_sw_rv_timer_irq --tool questa

After running smoke tests for each block, the next step would probably be to run the 'all_once' regression in each block, e.g.:

./util/dvsim/dvsim.py hw/ip/<ip>/dv/<ip>_sim_cfg.hjson -i all_once --tool questa

This should exercise all of the stimulus vseqs listed in the sim_cfg.hjson configuration file. Stimulus sequences are located in the following directory for each block-level testbench: hw/ip/<ip>/dv/env/seq_lib/.

Our existing convention has been to avoid #ifdefs unless absolutely necessary, so work in this area will need to prioritize finding common language constructs the tools agree on. There are a very small number of #ifdef XCELIUM or #ifdef VCS switches in the codebase, and adding more of these should be considered a last-resort. However, I think absorbing changes that use #ifdefs in the short-term, tracked in issues with pending vendor support feedback, may be workable. For the working branch described below, short-term solutions using #ifdefs will be acceptable.
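
To keep an eye on how many tool-specific switches exist while the working branch evolves, a small scan can help. The sketch below is an assumption-laden illustration rather than an existing project script: it looks for the SystemVerilog directive spelling (a backtick followed by ifdef, ifndef or elsif) next to the macro names VCS and XCELIUM mentioned above, and the file suffixes and the `hw` root are guesses.

```python
import re
from pathlib import Path

# Tool macros to look for. This list is an assumption; extend as needed.
TOOL_MACROS = ("VCS", "XCELIUM")
PATTERN = re.compile(
    r"`(?:ifdef|ifndef|elsif)\s+(" + "|".join(TOOL_MACROS) + r")\b"
)


def find_tool_switches(root, suffixes=(".sv", ".svh", ".v")):
    """Return (path, line number, macro) for each tool-specific directive."""
    hits = []
    for path in sorted(Path(root).rglob("*")):
        if path.suffix not in suffixes or not path.is_file():
            continue
        text = path.read_text(errors="replace")
        for lineno, line in enumerate(text.splitlines(), start=1):
            match = PATTERN.search(line)
            if match:
                hits.append((str(path), lineno, match.group(1)))
    return hits


if __name__ == "__main__":
    for path, lineno, macro in find_tool_switches("hw"):
        print(f"{path}:{lineno}: {macro}")
```

Running this before and after a batch of fixes gives a quick count of how many short-term switches would need to be revisited later.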

How you can help

To start with, I've created a draft PR to accumulate a working set of changes to fix bugs as they are found. As bugs and suggested fixes come in, we can discuss them in this issue, and then propagate them over to that PR.

Steps: 1) Set up the opentitan repository & dependencies by following the instructions in steps 1-4 here: https://opentitan.org/book/doc/getting_started/index.html

This will probably take a bit of time to get sorted out, as the feedback loop is inherently manual, and I don't have too many free cycles to put into this. However, I hope that centralizing the discussions here will keep the process moving forwards, and eventually we can get up to a good parity with the other tools.

Thanks!

hcallahan-lowrisc commented 1 month ago

reserved for updates

hcallahan-lowrisc commented 3 weeks ago

@lmg260a (Moving my reply to your comment from 23293 over here...)

Hi, That's all great news! What's the best approach for submitting code changes? Create a branch and then propose a merge, email files, something else?

<...>

So how should I raise those? As separate tickets? I've found that lots of small tickets are easier to track.

Also: should I use "[siemens]" as the ticket-group in the subject line, to make things easier?

I'd like to try and keep things centralized to this issue to start with. If there are common design patterns in the codebase causing problems, we can always spin discussions about them out into individual issues/tickets if it becomes too noisy within this thread. If we do open new issues, probably tagging with [siemens] or [questa] and linking back to this one will be best.

I think the best way to submit/suggest code changes would be to post a link to a branch on your github, or to paste the diff into this issue as a comment. I want to integrate all the changes into the working PR #24331 so I can regress it against all the tools I do have access to in one go before merging, so raising individual pull requests for changes is probably going to be quite a lot of noise at this point.

I've also found things which might be exactly what's wanted, but if not clearly documented it would be really easy for people to break things. For example, a parent class and a child class both have constraints with the same name. Per the LRM, the child constraint overrides the constraint in the parent. My problems are (1) I can't tell if that's deliberate or a bug. It should be clearly documented if it IS deliberate. (2) it may be that whatever simulator you're using doesn't interpret the LRM the same way I do, and it could be that (2.a) I'm wrong or (2.b) the LRM is subtly vague - and so we're both wrong (or at least, neither of us can claim we're right).

Thanks for bringing this up. In the OpenTitan codebase, it's a common pattern to intentionally redefine a constraint in a child class to override the parent class constraint. As you say, that is what the LRM specifies the behaviour should be. Based on your comment (1), is your previous experience that this should always be documented at every instance? Or is a comment about this something that should be incorporated into a project-level style guide? I'm interested in any comments about style generally, and whether there are any documentation holes we can fill to clarify things. I don't think I have heard the suggestion of always commenting overridden class constraints before, but everyone has different industry experiences of course. Any docs we can provide to get everyone on the same page as quickly as possible are valuable!

So one thing I'm debugging is that one of the clock periods is being set to zero, then being used in a divide, which of course results in a divide-by-zero. So there's a race condition going on someplace.

Can you checkout the working PR #24331, and share the dvsim command you are using to generate the test? Also could you share the variable/error, plus which assignment / randomization you suspect is incorrectly generating 0? I can run it locally and share some logs / values, or suggest where things might be going wrong.

Thanks!

hcallahan-lowrisc commented 2 weeks ago

@lmg260a (Moving my reply to your comment here https://github.com/lowRISC/opentitan/pull/24331#issuecomment-2303289278 over to the discussion issue...)

The example dvsim command from your comment (util/dvsim/dvsim.py hw/top_earlgrey/dv/chip_sim_cfg.hjson -i chip_sw_uart_tx_rx -n --tool questa) won't run anything because -n is a dry-run switch, which just logs the build commands and exits. Could you try running without that switch, and let me know what is broken? Running via dvsim is the only supported flow, as it populates all of the defines etc. automatically, and that way we can know everyone is on the same page. For example, if I checkout #24331 and run dvsim I get the following:

// I don't have questa, so set a dummy variable...
$ export QUESTA_HOME=/home/harry/questa/dummy
$ util/dvsim/dvsim.py hw/top_earlgrey/dv/chip_sim_cfg.hjson -i chip_sw_uart_tx_rx --tool questa
<EVERYTHING FAILS>

// The relevant logfile is at `<opentitan>/scratch/<branch>/chip_earlgrey_asic-sim-questa/default/build.log`, which shows all the commands which were run. Mine looks normal until...
[make]: build
cd /home/harry/projects/opentitan/scratch/dvsim_questa_fixes/chip_earlgrey_asic-sim-questa/default/sim-vcs && /home/harry/dummy/questasim/linux_x86_64/qrun -optimize +define+TOP_LEVEL_DV +define+UVM +define+UVM_NO_DEPRECATED +define+UVM_REGEX_NO_DPI +define+UVM_REG_ADDR_WIDTH=32 +define+UVM_REG_DATA_WIDTH=64 +define+UVM_REG_BYTENABLE_WIDTH=8 +define+SIMULATION +define+DUT_HIER=tb.dut -timescale 1ns/1ps -outdir /home/harry/projects/opentitan/scratch/dvsim_questa_fixes/chip_earlgrey_asic-sim-questa/default/qrun.out -uvm -uvmhome /home/harry/dummy/questasim/verilog_src/uvm-1.2 -mfcu -f /home/harry/projects/opentitan/scratch/dvsim_questa_fixes/chip_earlgrey_asic-sim-questa/default/sim-vcs/lowrisc_dv_chip_sim_0.1.scr -top clkmgr_bind -top pwrmgr_bind -top rstmgr_bind -top sec_cm_prim_onehot_check_bind -top sec_cm_prim_sparse_fsm_flop_bind -top spi_host_bind -top top_earlgrey_error_injection_ifs_bind -top top_earlgrey_bind -top xbar_main_bind -top xbar_peri_bind -top tb -voptargs="+acc=nr"
bash: line 1: /home/harry/dummy/questasim/linux_x86_64/qrun: No such file or directory

Printed with newlines, the invocation was:

cd /home/harry/projects/opentitan/scratch/dvsim_questa_fixes/chip_earlgrey_asic-sim-questa/default/sim-vcs &&
/home/harry/dummy/questasim/linux_x86_64/qrun
-optimize
+define+TOP_LEVEL_DV
+define+UVM
+define+UVM_NO_DEPRECATED
+define+UVM_REGEX_NO_DPI
+define+UVM_REG_ADDR_WIDTH=32
+define+UVM_REG_DATA_WIDTH=64
+define+UVM_REG_BYTENABLE_WIDTH=8
+define+SIMULATION
+define+DUT_HIER=tb.dut
-timescale 1ns/1ps
-outdir /home/harry/projects/opentitan/scratch/dvsim_questa_fixes/chip_earlgrey_asic-sim-questa/default/qrun.out
-uvm
-uvmhome /home/harry/dummy/questasim/verilog_src/uvm-1.2
-mfcu
-f /home/harry/projects/opentitan/scratch/dvsim_questa_fixes/chip_earlgrey_asic-sim-questa/default/sim-vcs/lowrisc_dv_chip_sim_0.1.scr
-top clkmgr_bind
-top pwrmgr_bind
-top rstmgr_bind
-top sec_cm_prim_onehot_check_bind
-top sec_cm_prim_sparse_fsm_flop_bind
-top spi_host_bind
-top top_earlgrey_error_injection_ifs_bind
-top top_earlgrey_bind
-top xbar_main_bind
-top xbar_peri_bind
-top tb
-voptargs="+acc=nr"
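
Splitting a long logged command like this by hand is tedious, so here is a small Python sketch that does roughly the same thing. It is a heuristic written for this thread, not something dvsim provides: it assumes a single command (no `cd ... &&` prefix), and it attaches any bare token to the `-flag` before it, which would be wrong for a positional argument that happens to follow a valueless flag.

```python
import shlex


def one_option_per_line(command):
    """Split a shell command into lines, keeping each -flag with its value."""
    lines = []
    for token in shlex.split(command):
        is_option = token.startswith(("-", "+"))
        # A "-flag" that has not yet been given a value can take this token.
        prev_is_bare_flag = (
            bool(lines)
            and lines[-1].startswith("-")
            and " " not in lines[-1]
        )
        if not is_option and prev_is_bare_flag:
            lines[-1] += " " + token
        else:
            lines.append(token)
    return lines


if __name__ == "__main__":
    cmd = (
        "qrun -optimize +define+UVM -timescale 1ns/1ps -uvm "
        '-uvmhome /q/uvm-1.2 -mfcu -top tb -voptargs="+acc=nr"'
    )
    print("\n".join(one_option_per_line(cmd)))
```

Note that `shlex.split` removes shell quoting, so `-voptargs="+acc=nr"` comes out as `-voptargs=+acc=nr`. The listing is for reading and diffing, not for pasting back into a shell unmodified.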

This CLI invocation looks okay to me at a first pass. Do you see the same failure mode as you described in the other issue when invoking this way, or something else?

lmg260a commented 2 weeks ago

Some best practices that would help:

1) 4-state logic was shown to generate both optimistic (false-pass) and pessimistic (false-fail) results in simulation. This was back in the 90s. So it's much better to use 2-state, plus static tools (CDC/RDC) and property checking. Most simulators have the ability to initialize all flops to random 0/1 values at time0 - so you can use that if you are unsure about Xs.

Some defensive programming I've learned helps over the years:

A) use explicit logic to convert from 4-state to an enum with 4 states, or two 2-state bits; bugs are common when using casting b/c often you want to know that a signal is 'Z' or 'X', and the implicit cast just returns 0.

B) use unit-testing (svunit) to verify code before it gets committed. One company reported that when they switched to unit-testing in 12 months they saw a 97% drop in bugs reported in the field.

C) Ideally, use test-driven development: the main thing here is that it'll help you spot architectural or interface problems that are just going to be very expensive to verify, and really expensive to debug/maintain. Before you do any coding.

D) Best practice in Agile is each developer spends 20% of their time just refactoring code: a couple of hours spent rewriting an interface can save days in verifying a really awkward interface design.

E) Put all generated files outside the repo clone - that way, "git status" should only show the files that should be committed.

F) I've seen a couple of schools of thought on big-picture repos: one is that you have one huge repo (but then any change means you have to rerun the full regression), or one for RTL, another for sim testbench, another for static/formal scripts. There's arguments for both - I just bring it up in case you find the idea useful. The other is that you have one really small repo per IP (so a 'opentitan_uart' repo, etc.). This way you have all the files involved in an IP in one location, and it greatly simplifies reuse and enhancement.

G) For debugging purposes: I've found it really useful to have the verification environment emit a script (including all environment variable settings) that runs the work. This way, if anything goes wrong, you just take that script and attach it to the ticket so it can 100% be reproduced. Note: it really helps if the paths in the script are set using environment variable-names instead of being replaced with their values. I.e. the path to a file should be "$FOO/...", not "/a/b/c/...". With $FOO being set earlier in the script.

H) The saying is: 10% documentation, 10% coding, and 80% verification. So really project success is all about rewriting the code to reduce verification cost.
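
As an illustration of point G, here is a minimal Python sketch of what emitting such a reproduction script could look like. Everything in it is hypothetical (the `make_repro_script` name, the bash header, the example paths); it simply exports each variable first and then rewrites any argument that starts with a variable's value so that it refers to the variable by name.

```python
import shlex


def make_repro_script(argv, env):
    """Return a bash script that exports `env`, then runs `argv`.

    Arguments beginning with the value of a variable in `env` are written
    as "${NAME}/rest" instead of the expanded path.
    """
    lines = ["#!/usr/bin/env bash", "set -euo pipefail"]
    for name, value in env.items():
        lines.append(f"export {name}={shlex.quote(value)}")

    # Try longer values first so the most specific variable wins.
    by_length = sorted(env.items(), key=lambda kv: len(kv[1]), reverse=True)
    words = []
    for arg in argv:
        for name, value in by_length:
            rest = arg[len(value):]
            # Only substitute when the remainder is safe inside double quotes.
            if value and arg.startswith(value) and not set(rest) & set('"$`\\'):
                words.append(f'"${{{name}}}{rest}"')
                break
        else:
            words.append(shlex.quote(arg))
    lines.append(" ".join(words))
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    script = make_repro_script(
        ["/opt/questa/linux_x86_64/qrun", "-top", "tb"],  # assumed paths
        {"QUESTA_HOME": "/opt/questa"},
    )
    print(script, end="")
```

Someone reproducing the failure on another machine then only needs to edit the `export` lines at the top rather than every path in the command.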

I'd be happy to do any sort of online class or talk that might help your team on this - I think you've got an amazing thing here, and I'm just trying to do what I can to help. The problem with greatness is you can't ever stop improving, or else you just eventually wind up back at "good".