lowRISC / opentitan

OpenTitan: Open source silicon root of trust
https://www.opentitan.org
Apache License 2.0
2.58k stars 777 forks source link

[ci/test] sw_strap_value_sim_verilator fails periodically in CI #16831

Closed milesdai closed 1 year ago

milesdai commented 1 year ago

16829 removes this test. It works fine on a local workstation, but fails every couple PRs in CI. We can move this to a nightly CI run.

Logs: ``` 2022-12-14T17:59:45.8975419Z ==================== Test output for //sw/device/silicon_creator/rom/e2e/weak_straps:sw_strap_value_sim_verilator: 2022-12-14T17:59:45.8978443Z Invoking test: sw/host/tests/rom/sw_strap_value/sw_strap_value --rcfile= --logging=info --interface=verilator --verilator-bin=hw/build.verilator_real/sim-verilator/Vchip_sim_tb --verilator-rom=sw/device/lib/testing/test_rom/test_rom_sim_verilator.39.scr.vmem --verilator-flash=sw/device/silicon_creator/rom/e2e/weak_straps/sw_strap_value_prog_sim_verilator.fake_prod_key_0.signed.64.scr.vmem --verilator-otp=hw/ip/otp_ctrl/data/img_rma.24.vmem 2022-12-14T17:59:45.8982244Z [2022-12-14T17:56:44Z INFO opentitanlib::transport::verilator::subprocess] CWD: Ok("/root/.cache/bazel/_bazel_root/5e4f8c833cf3a422fdf6ca333dd151d7/execroot/lowrisc_opentitan/bazel-out/k8-fastbuild/bin/sw/device/silicon_creator/rom/e2e/weak_straps/sw_strap_value_sim_verilator.runfiles/lowrisc_opentitan") 2022-12-14T17:59:45.8985683Z [2022-12-14T17:56:44Z INFO opentitanlib::transport::verilator::subprocess] Spawning verilator: "hw/build.verilator_real/sim-verilator/Vchip_sim_tb" "--meminit=rom,sw/device/lib/testing/test_rom/test_rom_sim_verilator.39.scr.vmem --meminit=flash,sw/device/silicon_creator/rom/e2e/weak_straps/sw_strap_value_prog_sim_verilator.fake_prod_key_0.signed.64.scr.vmem --meminit=otp,hw/ip/otp_ctrl/data/img_rma.24.vmem" 2022-12-14T17:59:45.8988282Z [2022-12-14T17:56:44Z INFO opentitanlib::transport::verilator::stdout] Simulation of OpenTitan Earl Grey 2022-12-14T17:59:45.8990103Z [2022-12-14T17:56:45Z INFO opentitanlib::transport::verilator::stdout] ================================= 2022-12-14T17:59:45.8991933Z [2022-12-14T17:56:45Z INFO opentitanlib::transport::verilator::stdout] 2022-12-14T17:59:45.8993416Z [2022-12-14T17:56:45Z INFO opentitanlib::transport::verilator::stdout] 2022-12-14T17:59:45.8995052Z [2022-12-14T17:56:45Z INFO opentitanlib::transport::verilator::stdout] Tracing can be toggled by sending SIGUSR1 to this process: 2022-12-14T17:59:45.8996745Z [2022-12-14T17:56:45Z INFO opentitanlib::transport::verilator::stdout] 2022-12-14T17:59:45.8998224Z [2022-12-14T17:56:45Z INFO opentitanlib::transport::verilator::stdout] $ kill -USR1 12807 2022-12-14T17:59:45.8999813Z [2022-12-14T17:56:45Z INFO opentitanlib::transport::verilator::stdout] 2022-12-14T17:59:45.9001067Z [2022-12-14T17:56:45Z INFO opentitanlib::transport::verilator::stdout] 2022-12-14T17:59:45.9003959Z [2022-12-14T17:56:45Z INFO opentitanlib::transport::verilator::stdout] GPIO: FIFO pipes created at /root/.cache/bazel/_bazel_root/5e4f8c833cf3a422fdf6ca333dd151d7/execroot/lowrisc_opentitan/bazel-out/k8-fastbuild/bin/sw/device/silicon_creator/rom/e2e/weak_straps/sw_strap_value_sim_verilator.runfiles/lowrisc_opentitan/gpio0-read (read) and /root/.cache/bazel/_bazel_root/5e4f8c833cf3a422fdf6ca333dd151d7/execroot/lowrisc_opentitan/bazel-out/k8-fastbuild/bin/sw/device/silicon_creator/rom/e2e/weak_straps/sw_strap_value_sim_verilator.runfiles/lowrisc_opentitan/gpio0-write (write) for 32-bit wide GPIO. 2022-12-14T17:59:45.9007271Z [2022-12-14T17:56:45Z INFO opentitanlib::transport::verilator::stdout] GPIO: To measure the values of the pins as driven by the device, run 2022-12-14T17:59:45.9009997Z [2022-12-14T17:56:45Z INFO opentitanlib::transport::verilator::stdout] $ cat /root/.cache/bazel/_bazel_root/5e4f8c833cf3a422fdf6ca333dd151d7/execroot/lowrisc_opentitan/bazel-out/k8-fastbuild/bin/sw/device/silicon_creator/rom/e2e/weak_straps/sw_strap_value_sim_verilator.runfiles/lowrisc_opentitan/gpio0-read # '0' low, '1' high, 'X' floating 2022-12-14T17:59:45.9012282Z [2022-12-14T17:56:45Z INFO opentitanlib::transport::verilator::stdout] GPIO: To drive the pins, run a command like 2022-12-14T17:59:45.9014826Z [2022-12-14T17:56:45Z INFO opentitanlib::transport::verilator::stdout] $ echo 'h09 l31' > /root/.cache/bazel/_bazel_root/5e4f8c833cf3a422fdf6ca333dd151d7/execroot/lowrisc_opentitan/bazel-out/k8-fastbuild/bin/sw/device/silicon_creator/rom/e2e/weak_straps/sw_strap_value_sim_verilator.runfiles/lowrisc_opentitan/gpio0-write # Pull the pin 9 high, and pin 31 low. 2022-12-14T17:59:45.9017882Z [2022-12-14T17:56:45Z INFO opentitanlib::transport::verilator::stdout] $ echo 'wh10' > /root/.cache/bazel/_bazel_root/5e4f8c833cf3a422fdf6ca333dd151d7/execroot/lowrisc_opentitan/bazel-out/k8-fastbuild/bin/sw/device/silicon_creator/rom/e2e/weak_straps/sw_strap_value_sim_verilator.runfiles/lowrisc_opentitan/gpio0-write # Pull pin 10 high through a weak pull-up. 2022-12-14T17:59:45.9020430Z [2022-12-14T17:56:45Z INFO opentitanlib::transport::verilator::stdout] 2022-12-14T17:59:45.9131176Z [2022-12-14T17:56:45Z INFO opentitanlib::transport::verilator::stdout] SPI: Created /dev/pts/4 for spi0. Connect to it with any terminal program, e.g. 2022-12-14T17:59:45.9133202Z [2022-12-14T17:56:45Z INFO opentitanlib::transport::verilator::stdout] $ screen /dev/pts/4 2022-12-14T17:59:45.9135075Z [2022-12-14T17:56:45Z INFO opentitanlib::transport::verilator::stdout] NOTE: a SPI transaction is run for every 4 characters entered. 2022-12-14T17:59:45.9137720Z [2022-12-14T17:56:45Z INFO opentitanlib::transport::verilator::stdout] SPI: Monitor output file created at /root/.cache/bazel/_bazel_root/5e4f8c833cf3a422fdf6ca333dd151d7/execroot/lowrisc_opentitan/bazel-out/k8-fastbuild/bin/sw/device/silicon_creator/rom/e2e/weak_straps/sw_strap_value_sim_verilator.runfiles/lowrisc_opentitan/spi0.log. Works well with tail: 2022-12-14T17:59:45.9141002Z [2022-12-14T17:56:45Z INFO opentitanlib::transport::verilator::stdout] $ tail -f /root/.cache/bazel/_bazel_root/5e4f8c833cf3a422fdf6ca333dd151d7/execroot/lowrisc_opentitan/bazel-out/k8-fastbuild/bin/sw/device/silicon_creator/rom/e2e/weak_straps/sw_strap_value_sim_verilator.runfiles/lowrisc_opentitan/spi0.log 2022-12-14T17:59:45.9143122Z [2022-12-14T17:56:45Z INFO opentitanlib::transport::verilator::stdout] 2022-12-14T17:59:45.9144933Z [2022-12-14T17:56:45Z INFO opentitanlib::transport::verilator::stdout] UART: Created /dev/pts/5 for uart0. Connect to it with any terminal program, e.g. 2022-12-14T17:59:45.9146677Z [2022-12-14T17:56:45Z INFO opentitanlib::transport::verilator::stdout] $ screen /dev/pts/5 2022-12-14T17:59:45.9148412Z [2022-12-14T17:56:45Z INFO opentitanlib::transport::verilator::stdout] UART: Additionally writing all UART output to 'uart0.log'. 2022-12-14T17:59:45.9150189Z [2022-12-14T17:56:45Z INFO opentitanlib::transport::verilator::stdout] 2022-12-14T17:59:45.9152727Z [2022-12-14T17:56:45Z INFO opentitanlib::transport::verilator::stdout] USB: Monitor output file created at /root/.cache/bazel/_bazel_root/5e4f8c833cf3a422fdf6ca333dd151d7/execroot/lowrisc_opentitan/bazel-out/k8-fastbuild/bin/sw/device/silicon_creator/rom/e2e/weak_straps/sw_strap_value_sim_verilator.runfiles/lowrisc_opentitan/usb0.log. Works well with tail: 2022-12-14T17:59:45.9155948Z [2022-12-14T17:56:45Z INFO opentitanlib::transport::verilator::stdout] $ tail -f /root/.cache/bazel/_bazel_root/5e4f8c833cf3a422fdf6ca333dd151d7/execroot/lowrisc_opentitan/bazel-out/k8-fastbuild/bin/sw/device/silicon_creator/rom/e2e/weak_straps/sw_strap_value_sim_verilator.runfiles/lowrisc_opentitan/usb0.log 2022-12-14T17:59:45.9158077Z [2022-12-14T17:56:45Z INFO opentitanlib::transport::verilator::stdout] 2022-12-14T17:59:45.9159975Z [2022-12-14T17:56:45Z INFO opentitanlib::transport::verilator::stdout] JTAG: Virtual JTAG interface dmi0 is listening on port 44853. Use 2022-12-14T17:59:45.9162139Z [2022-12-14T17:56:45Z INFO opentitanlib::transport::verilator::stdout] OpenOCD and the following configuration to connect: 2022-12-14T17:59:45.9164539Z [2022-12-14T17:56:45Z INFO opentitanlib::transport::verilator::stdout] interface remote_bitbang 2022-12-14T17:59:45.9166983Z [2022-12-14T17:56:45Z INFO opentitanlib::transport::verilator::stdout] remote_bitbang_host localhost 2022-12-14T17:59:45.9168675Z [2022-12-14T17:56:45Z INFO opentitanlib::transport::verilator::stdout] remote_bitbang_port 44853 2022-12-14T17:59:45.9171101Z [2022-12-14T17:56:45Z INFO opentitanlib::transport::verilator::stdout] TOP.chip_sim_tb.u_dut.top_earlgrey.u_flash_ctrl.u_eflash.u_flash.gen_generic.u_impl_generic.gen_prim_flash_banks[0].u_prim_flash_bank.unnamedblk1: ReadLatency:1 ProgLatency:50 EraseLatency:200 2022-12-14T17:59:45.9173825Z [2022-12-14T17:56:45Z INFO opentitanlib::transport::verilator::stdout] TOP.chip_sim_tb.u_dut.top_earlgrey.u_flash_ctrl.u_eflash.u_flash.gen_generic.u_impl_generic.gen_prim_flash_banks[1].u_prim_flash_bank.unnamedblk1: ReadLatency:1 ProgLatency:50 EraseLatency:200 2022-12-14T17:59:45.9176405Z [2022-12-14T17:56:45Z INFO opentitanlib::transport::verilator::stdout] 2022-12-14T17:59:45.9177930Z [2022-12-14T17:56:45Z INFO opentitanlib::transport::verilator::stdout] 2022-12-14T17:59:45.9179623Z [2022-12-14T17:56:45Z INFO opentitanlib::transport::verilator::stdout] Simulation running, end by pressing CTRL-c. 2022-12-14T17:59:45.9180927Z [2022-12-14T17:56:45Z INFO opentitanlib::transport::verilator::stdout] 2022-12-14T17:59:45.9181700Z dmi0: Failed to bind socket: Address already in use (98) 2022-12-14T17:59:45.9182298Z dmi0: Unable to create TCP server on port 44853 2022-12-14T17:59:45.9183414Z [2022-12-14T17:56:45Z INFO opentitanlib::transport::verilator::transport] Verilator started with the following interfaces: 2022-12-14T17:59:45.9195420Z [2022-12-14T17:56:45Z INFO opentitanlib::transport::verilator::transport] gpio_read = /root/.cache/bazel/_bazel_root/5e4f8c833cf3a422fdf6ca333dd151d7/execroot/lowrisc_opentitan/bazel-out/k8-fastbuild/bin/sw/device/silicon_creator/rom/e2e/weak_straps/sw_strap_value_sim_verilator.runfiles/lowrisc_opentitan/gpio0-read 2022-12-14T17:59:45.9197933Z [2022-12-14T17:56:45Z INFO opentitanlib::transport::verilator::transport] gpio_write = /root/.cache/bazel/_bazel_root/5e4f8c833cf3a422fdf6ca333dd151d7/execroot/lowrisc_opentitan/bazel-out/k8-fastbuild/bin/sw/device/silicon_creator/rom/e2e/weak_straps/sw_strap_value_sim_verilator.runfiles/lowrisc_opentitan/gpio0-write 2022-12-14T17:59:45.9199910Z [2022-12-14T17:56:45Z INFO opentitanlib::transport::verilator::transport] uart = /dev/pts/5 2022-12-14T17:59:45.9201071Z [2022-12-14T17:56:45Z INFO opentitanlib::transport::verilator::transport] spi = /dev/pts/4 2022-12-14T17:59:45.9202114Z [2022-12-14T17:56:45Z INFO opentitanlib::io::uart] set_flow_control to false 2022-12-14T17:59:45.9203168Z [2022-12-14T17:56:45Z INFO opentitanlib::test_utils::init] load_bitstream: success. 2022-12-14T17:59:45.9203825Z "load_bitstream": true 2022-12-14T17:59:45.9204680Z [2022-12-14T17:56:45Z INFO opentitanlib::test_utils::init] bootstrap: success. 2022-12-14T17:59:45.9205293Z "bootstrap": true 2022-12-14T17:59:45.9205796Z Starting test test_sw_strap_values... 2022-12-14T17:59:45.9206771Z I00000 test_rom.c:133] Version: earlgrey_silver_release_v5-9314-g398d74861, Build Date: 2022-12-14 17:06:49 2022-12-14T17:59:46.0526531Z 2022-12-14T17:59:46.0527163Z I00001 testFinished test test_sw_strap_values: Err(Timed Out 2022-12-14T17:59:46.0527604Z 2022-12-14T17:59:46.0528069Z Stack backtrace: 2022-12-14T17:59:46.0528696Z 0: anyhow::error:: for anyhow::Error>::from 2022-12-14T17:59:46.0529604Z 1: >::into 2022-12-14T17:59:46.0530351Z 2: opentitanlib::uart::console::UartConsole::wait_for 2022-12-14T17:59:46.0531025Z 3: sw_strap_value::test_sw_strap_values 2022-12-14T17:59:46.0531615Z 4: sw_strap_value::main 2022-12-14T17:59:46.0532212Z 5: core::ops::function::FnOnce::call_once 2022-12-14T17:59:46.0533273Z 6: std::sys_common::backtrace::__rust_begin_short_backtrace 2022-12-14T17:59:46.0534091Z 7: std::rt::lang_start::{{closure}} 2022-12-14T17:59:46.0534930Z 8: core::ops::function::impls:: for &F>::call_once 2022-12-14T17:59:46.0535880Z at /rustc/432abd86f231c908f6df3cdd779e83f35084be90/library/core/src/ops/function.rs:283:13 2022-12-14T17:59:46.0536623Z 9: std::panicking::try::do_call 2022-12-14T17:59:46.0537384Z at /rustc/432abd86f231c908f6df3cdd779e83f35084be90/library/std/src/panicking.rs:492:40 2022-12-14T17:59:46.0538059Z 10: std::panicking::try 2022-12-14T17:59:46.0538733Z at /rustc/432abd86f231c908f6df3cdd779e83f35084be90/library/std/src/panicking.rs:456:19 2022-12-14T17:59:46.0539650Z 11: std::panic::catch_unwind 2022-12-14T17:59:46.0540348Z at /rustc/432abd86f231c908f6df3cdd779e83f35084be90/library/std/src/panic.rs:137:14 2022-12-14T17:59:46.0541245Z 12: std::rt::lang_start_internal::{{closure}} 2022-12-14T17:59:46.0541950Z at /rustc/432abd86f231c908f6df3cdd779e83f35084be90/library/std/src/rt.rs:148:48 2022-12-14T17:59:46.0542656Z 13: std::panicking::try::do_call 2022-12-14T17:59:46.0543343Z at /rustc/432abd86f231c908f6df3cdd779e83f35084be90/library/std/src/panicking.rs:492:40 2022-12-14T17:59:46.0544041Z 14: std::panicking::try 2022-12-14T17:59:46.0544685Z at /rustc/432abd86f231c908f6df3cdd779e83f35084be90/library/std/src/panicking.rs:456:19 2022-12-14T17:59:46.0545461Z 15: std::panic::catch_unwind 2022-12-14T17:59:46.0546113Z at /rustc/432abd86f231c908f6df3cdd779e83f35084be90/library/std/src/panic.rs:137:14 2022-12-14T17:59:46.0546827Z 16: std::rt::lang_start_internal 2022-12-14T17:59:46.0547484Z at /rustc/432abd86f231c908f6df3cdd779e83f35084be90/library/std/src/rt.rs:148:20 2022-12-14T17:59:46.0548139Z 17: std::rt::lang_start 2022-12-14T17:59:46.0548616Z 18: main 2022-12-14T17:59:46.0549085Z 19: __libc_start_main 2022-12-14T17:59:46.0549714Z 20: _start) 2022-12-14T17:59:46.0550226Z Error: Timed Out 2022-12-14T17:59:46.0550503Z 2022-12-14T17:59:46.0550882Z Stack backtrace: 2022-12-14T17:59:46.0551485Z 0: anyhow::error:: for anyhow::Error>::from 2022-12-14T17:59:46.0552163Z 1: >::into 2022-12-14T17:59:46.0552813Z 2: opentitanlib::uart::console::UartConsole::wait_for 2022-12-14T17:59:46.0553455Z 3: sw_strap_value::test_sw_strap_values 2022-12-14T17:59:46.0554032Z 4: sw_strap_value::main 2022-12-14T17:59:46.0554588Z 5: core::ops::function::FnOnce::call_once 2022-12-14T17:59:46.0555320Z 6: std::sys_common::backtrace::__rust_begin_short_backtrace 2022-12-14T17:59:46.0555949Z 7: std::rt::lang_start::{{closure}} 2022-12-14T17:59:46.0556654Z 8: core::ops::function::impls:: for &F>::call_once 2022-12-14T17:59:46.0557509Z at /rustc/432abd86f231c908f6df3cdd779e83f35084be90/library/core/src/ops/function.rs:283:13 2022-12-14T17:59:46.0558255Z 9: std::panicking::try::do_call 2022-12-14T17:59:46.0558933Z at /rustc/432abd86f231c908f6df3cdd779e83f35084be90/library/std/src/panicking.rs:492:40 2022-12-14T17:59:46.0559807Z 10: std::panicking::try 2022-12-14T17:59:46.0560480Z at /rustc/432abd86f231c908f6df3cdd779e83f35084be90/library/std/src/panicking.rs:456:19 2022-12-14T17:59:46.0561306Z 11: std::panic::catch_unwind 2022-12-14T17:59:46.0561944Z at /rustc/432abd86f231c908f6df3cdd779e83f35084be90/library/std/src/panic.rs:137:14 2022-12-14T17:59:46.0562656Z 12: std::rt::lang_start_internal::{{closure}} 2022-12-14T17:59:46.0563352Z at /rustc/432abd86f231c908f6df3cdd779e83f35084be90/library/std/src/rt.rs:148:48 2022-12-14T17:59:46.0564044Z 13: std::panicking::try::do_call 2022-12-14T17:59:46.0564736Z at /rustc/432abd86f231c908f6df3cdd779e83f35084be90/library/std/src/panicking.rs:492:40 2022-12-14T17:59:46.0565533Z 14: std::panicking::try 2022-12-14T17:59:46.0566194Z at /rustc/432abd86f231c908f6df3cdd779e83f35084be90/library/std/src/panicking.rs:456:19 2022-12-14T17:59:46.0566877Z 15: std::panic::catch_unwind 2022-12-14T17:59:46.0567530Z at /rustc/432abd86f231c908f6df3cdd779e83f35084be90/library/std/src/panic.rs:137:14 2022-12-14T17:59:46.0568209Z 16: std::rt::lang_start_internal 2022-12-14T17:59:46.0568873Z at /rustc/432abd86f231c908f6df3cdd779e83f35084be90/library/std/src/rt.rs:148:20 2022-12-14T17:59:46.0569671Z 17: std::rt::lang_start 2022-12-14T17:59:46.0570199Z 18: main 2022-12-14T17:59:46.0570646Z 19: __libc_start_main 2022-12-14T17:59:46.0571132Z 20: _start 2022-12-14T17:59:46.0571693Z ================================================================================ ```
milesdai commented 1 year ago

Tagging @cfrantz and @alphan for awareness.

drewmacrae commented 1 year ago

How long ago did we start to observe this?

drewmacrae commented 1 year ago

I only see it happening once in the last 20 runs on master.

milesdai commented 1 year ago

It was somewhat recent (i.e. in the last day or so). Here are some examples:

https://dev.azure.com/lowrisc/opentitan/_build/results?buildId=105382&view=logs&j=924ff16e-cbb4-544c-ec55-782cc11682e2&t=73634ebd-f809-5cf4-5112-b0aff4a889e0 https://dev.azure.com/lowrisc/opentitan/_build/results?buildId=105370&view=logs&j=924ff16e-cbb4-544c-ec55-782cc11682e2&t=73634ebd-f809-5cf4-5112-b0aff4a889e0 https://dev.azure.com/lowrisc/opentitan/_build/results?buildId=105464&view=logs&j=924ff16e-cbb4-544c-ec55-782cc11682e2&t=73634ebd-f809-5cf4-5112-b0aff4a889e0&l=478

alphan commented 1 year ago

We can move this to a nightly CI run.

SGTM.

drewmacrae commented 1 year ago

Hmm, #16913 and #16915 both deal with issues like this, where fully assigning cores to verilator sims starves the system of resources leading to problems at the sim-harness interface. Here it looks like the harness timed out waiting for the sim so #16913 wouldn't help, and there's a lot of overhead for something like #16915 to protect these systems in a 4 threaded environment.

drewmacrae commented 1 year ago

The nicer sim only helps if the sim times out waiting for the harness, If we wanted to run more verilator tests in CI and we see issues like this, we should consider setting --local_test_jobs=3 to leave headroom for harnesses and overhead.