ucb-bar / chipyard

An Agile RISC-V SoC Design Framework with in-order cores, out-of-order cores, accelerators, and more
https://chipyard.readthedocs.io/en/stable/
BSD 3-Clause "New" or "Revised" License

Segfault running SimNetwork under verilator #1885

Open sethp opened 2 months ago

sethp commented 2 months ago

Background Work

Chipyard Version and Hash

Release: N/A Hash: ef71dfd40a5c12ca489760472209c02ac59b96ca

OS Setup

```
+ uname -a
Linux cerf 6.6.28-1-MANJARO #1 SMP PREEMPT_DYNAMIC Wed Apr 17 13:19:22 UTC 2024 x86_64 GNU/Linux
+ lsb_release -a
LSB Version:    n/a
Distributor ID: ManjaroLinux
Description:    Manjaro Linux
Release:    23.1.4
Codename:   Vulcan
```
(partial) `printenv`

```
CONDA_EXE=/home/seth/miniforge3/bin/conda
_CE_M=
_CE_CONDA=
CONDA_PYTHON_EXE=/home/seth/miniforge3/bin/python
CONDA_SHLVL=2
CONDA_BACKUP_PATH=/home/seth/Code/src/github.com/ucb-bar/chipyard/software/firemarshal:/home/seth/Code/src/github.com/ucb-bar/chipyard/.conda-env/bin:/home/seth/Code/src/github.com/ucb-bar/chipyard/software/firemarshal:/home/seth/Code/src/github.com/ucb-bar/chipyard/.conda-env/riscv-tools/bin:/home/seth/Code/src/github.com/ucb-bar/chipyard/.conda-env/riscv-tools/bin:/home/seth/Code/src/github.com/ucb-bar/chipyard/software/firemarshal:/home/seth/Code/src/github.com/ucb-bar/chipyard/.conda-env/riscv-tools/bin:/home/seth/Code/src/github.com/ucb-bar/chipyard/software/firemarshal:/home/seth/Code/src/github.com/ucb-bar/chipyard/.conda-env/bin:/home/seth/miniforge3/condabin:/usr/bin:/home/seth/.bun/bin:/home/seth/perl5/bin:/home/seth/Code/bin:/home/seth/.cargo/bin:/home/seth/.krew/bin:/home/linuxbrew/.linuxbrew/bin:/home/linuxbrew/.linuxbrew/sbin:/home/seth/.nix-profile/bin:/nix/var/nix/profiles/default/bin:/home/seth/.local/bin:/usr/local/sbin:/usr/local/bin:/usr/bin:/opt/android-sdk/cmdline-tools/latest/bin:/opt/android-sdk/platform-tools:/opt/android-sdk/tools:/opt/android-sdk/tools/bin:/usr/lib/emscripten:/usr/lib/jvm/default/bin:/usr/bin/site_perl:/usr/bin/vendor_perl:/usr/bin/core_perl:/usr/lib/rustup/bin:/var/lib/snapd/snap/bin:/home/seth/Code/bin:/usr/local/kubebuilder/bin
JAVA_HOME=/home/seth/Code/src/github.com/ucb-bar/chipyard/.conda-env/lib/jvm
JAVA_LD_LIBRARY_PATH=/home/seth/Code/src/github.com/ucb-bar/chipyard/.conda-env/lib/jvm/lib/server
CONDA_PREFIX=/home/seth/Code/src/github.com/ucb-bar/chipyard/.conda-env
CONDA_DEFAULT_ENV=/home/seth/Code/src/github.com/ucb-bar/chipyard/.conda-env
CONDA_PROMPT_MODIFIER=(/home/seth/Code/src/github.com/ucb-bar/chipyard/.conda-env)
CONDA_PREFIX_1=/home/seth/miniforge3
RISCV=/home/seth/Code/src/github.com/ucb-bar/chipyard/.conda-env/riscv-tools
LD_LIBRARY_PATH=/home/seth/Code/src/github.com/ucb-bar/chipyard/.conda-env/riscv-tools/lib
GSETTINGS_SCHEMA_DIR_CONDA_BACKUP=
GSETTINGS_SCHEMA_DIR=/home/seth/Code/src/github.com/ucb-bar/chipyard/.conda-env/share/glib-2.0/schemas
XML_CATALOG_FILES=file:///home/seth/Code/src/github.com/ucb-bar/chipyard/.conda-env/etc/xml/catalog file:///etc/xml/catalog
JAVA_HOME_CONDA_BACKUP=/home/seth/Code/src/github.com/ucb-bar/chipyard/.conda-env/lib/jvm
JAVA_LD_LIBRARY_PATH_BACKUP=/home/seth/Code/src/github.com/ucb-bar/chipyard/.conda-env/lib/jvm/lib/server
_=/home/seth/Code/src/github.com/ucb-bar/chipyard/.conda-env/bin/printenv
```
`conda list`

```
# packages in environment at /home/seth/Code/src/github.com/ucb-bar/chipyard/.conda-env:
#
# Name Version Build Channel
_libgcc_mutex 0.1 conda_forge conda-forge
_openmp_mutex 4.5 2_gnu conda-forge
_sysroot_linux-64_curr_repodata_hack 3 h69a702a_14 conda-forge
aiohttp 3.9.3 py310h2372a71_0 conda-forge
aiosignal 1.3.1 pyhd8ed1ab_0 conda-forge
alabaster 0.7.16 pyhd8ed1ab_0 conda-forge
alsa-lib 1.2.11 hd590300_1 conda-forge
annotated-types 0.6.0 pyhd8ed1ab_0 conda-forge
appdirs 1.4.4 pyh9f0ad1d_0 conda-forge
archspec 0.2.3 pyhd8ed1ab_0 conda-forge
argcomplete 3.2.3 pyhd8ed1ab_0 conda-forge
asttokens 2.4.1 pypi_0 pypi
async-timeout 4.0.3 pyhd8ed1ab_0 conda-forge
atk-1.0 2.38.0 hd4edc92_1 conda-forge
attrs 23.2.0 pyh71513ae_0 conda-forge
autoconf 2.71 pl5321h2b4cb7a_1 conda-forge
aws-c-auth 0.7.8 h538f98c_2 conda-forge
aws-c-cal 0.6.9 h5d48c4d_2 conda-forge
aws-c-common 0.9.10 hd590300_0 conda-forge
aws-c-compression 0.2.17 h7f92143_7 conda-forge
aws-c-event-stream 0.3.2 h0bcb0bb_8 conda-forge
aws-c-http 0.7.14 hd268abd_3 conda-forge
aws-c-io 0.13.36 he0cd244_2 conda-forge
aws-c-mqtt 0.9.10 h35285c7_2 conda-forge
aws-c-s3 0.4.4 h0448019_0 conda-forge
aws-c-sdkutils 0.1.13 h7f92143_0 conda-forge
aws-checksums 0.1.17 h7f92143_6 conda-forge
aws-sam-translator 1.86.0 pyhd8ed1ab_0 conda-forge
aws-xray-sdk 2.13.0 pyhd8ed1ab_0 conda-forge
awscli 2.15.28 py310hff52083_0 conda-forge
awscrt 0.19.19 py310h43b4219_2 conda-forge
azure-core 1.30.1 pyhd8ed1ab_0 conda-forge
azure-identity 1.15.0 pyhd8ed1ab_0 conda-forge
babel 2.14.0 pyhd8ed1ab_0 conda-forge
bash 5.2.21 h7f99829_0 conda-forge
bash-completion 2.11 ha770c72_1 conda-forge
bc 1.07.1 h7f98852_0 conda-forge
bcrypt 4.1.2 py310hcb5633a_0 conda-forge
binutils 2.40 hdd6e379_0 conda-forge
binutils_impl_linux-64 2.40 hf600244_0 conda-forge
bison 3.8.2 h59595ed_0 conda-forge
blinker 1.7.0 pyhd8ed1ab_0 conda-forge
boltons 23.1.1 pyhd8ed1ab_0 conda-forge
boto3 1.34.61 pyhd8ed1ab_1 conda-forge
boto3-stubs 1.34.61 pyhd8ed1ab_0 conda-forge
botocore 1.34.61 pyge310_1234567_0 conda-forge
botocore-stubs 1.34.61 pyhd8ed1ab_0 conda-forge
brotli 1.1.0 hd590300_1 conda-forge
brotli-bin 1.1.0 hd590300_1 conda-forge
brotli-python 1.1.0 py310hc6cd4ac_1 conda-forge
bzip2 1.0.8 hd590300_5 conda-forge
c-ares 1.27.0 hd590300_0 conda-forge
ca-certificates 2024.2.2 hbcca054_0 conda-forge
cachecontrol 0.14.0 pyhd8ed1ab_0 conda-forge
cachecontrol-with-filecache 0.14.0 pyhd8ed1ab_0 conda-forge
cachy 0.3.0 pyhd8ed1ab_1 conda-forge
cairo 1.18.0 h3faef2a_0 conda-forge
certifi 2024.2.2 pyhd8ed1ab_0 conda-forge
cffi 1.16.0 py310h2fee648_0 conda-forge
cfgv 3.3.1 pyhd8ed1ab_0 conda-forge
cfn-lint 0.86.0 pyhd8ed1ab_0 conda-forge
charset-normalizer 3.3.2 pyhd8ed1ab_0 conda-forge
clang-format 17.0.6 default_hb11cfb5_3 conda-forge
clang-format-17 17.0.6 default_hb11cfb5_3 conda-forge
clang-tools 17.0.6 default_hb11cfb5_3 conda-forge
click 8.1.7 unix_pyh707e725_0 conda-forge
click-default-group 1.2.4 pyhd8ed1ab_0 conda-forge
clikit 0.6.2 pyhd8ed1ab_2 conda-forge
cloudpickle 3.0.0 pyhd8ed1ab_0 conda-forge
cmake 3.26.3 h077f3f9_0 conda-forge
colorama 0.4.6 pyhd8ed1ab_0 conda-forge
conda 23.9.0 py310hff52083_2 conda-forge
conda-gcc-specs 13.2.0 h6a59387_5 conda-forge
conda-lock 1.4.0 pyhd8ed1ab_2 conda-forge
conda-package-handling 2.2.0 pyh38be061_0 conda-forge
conda-package-streaming 0.9.0 pyhd8ed1ab_0 conda-forge
conda-standalone 24.1.2 ha770c72_0 conda-forge
conda-tree 1.1.0 pyhd8ed1ab_2 conda-forge
constructor 3.7.0 pyh55f8243_0 conda-forge
contourpy 1.2.0 py310hd41b1e2_0 conda-forge
coreutils 9.4 hd590300_0 conda-forge
crashtest 0.4.1 pyhd8ed1ab_0 conda-forge
cryptography 40.0.2 py310h34c0648_0 conda-forge
ctags 5.8 h14c3975_1000 conda-forge
curl 7.88.1 hdc1c0ab_1 conda-forge
cycler 0.12.1 pyhd8ed1ab_0 conda-forge
dbus 1.13.6 h5008d03_3 conda-forge
diffutils 3.10 hf18258e_0 conda-forge
distlib 0.3.8 pyhd8ed1ab_0 conda-forge
distro 1.8.0 pyhd8ed1ab_0 conda-forge
docker-py 7.0.0 pyhd8ed1ab_0 conda-forge
docutils 0.19 py310hff52083_1 conda-forge
doit 0.36.0 pyhd8ed1ab_0 conda-forge
dtc 1.6.1 h166bdaf_2 conda-forge
ecdsa 0.18.0 pyhd8ed1ab_1 conda-forge
elfutils 0.187 h989201e_0 conda-forge
ensureconda 1.4.4 pyhd8ed1ab_0 conda-forge
exceptiongroup 1.2.0 pyhd8ed1ab_2 conda-forge
expat 2.6.1 h59595ed_0 conda-forge
expect 5.45.4 h555a92e_0 conda-forge
fab-classic 1.19.2 pypi_0 pypi
file 5.39 h753d276_1 conda-forge
filelock 3.13.1 pyhd8ed1ab_0 conda-forge
findutils 4.6.0 h166bdaf_1001 conda-forge
flask 3.0.2 pyhd8ed1ab_0 conda-forge
flask_cors 3.0.10 pyhd3deb0d_0 conda-forge
flex 2.6.4 h58526e2_1004 conda-forge
font-ttf-dejavu-sans-mono 2.37 hab24e00_0 conda-forge
font-ttf-inconsolata 3.000 h77eed37_0 conda-forge
font-ttf-source-code-pro 2.038 h77eed37_0 conda-forge
font-ttf-ubuntu 0.83 h77eed37_1 conda-forge
fontconfig 2.14.2 h14ed4e7_0 conda-forge
fonts-conda-ecosystem 1 0 conda-forge
fonts-conda-forge 1 0 conda-forge
fonttools 4.49.0 py310h2372a71_0 conda-forge
freetype 2.12.1 h267a509_2 conda-forge
fribidi 1.0.10 h36c2ea0_0 conda-forge
frozenlist 1.4.1 py310h2372a71_0 conda-forge
fsspec 2024.2.0 pyhca7485f_0 conda-forge
gcc 13.2.0 hd6cf55c_3 conda-forge
gcc_impl_linux-64 13.2.0 h338b0a0_5 conda-forge
gdk-pixbuf 2.42.10 h829c605_5 conda-forge
gdspy 1.4 pypi_0 pypi
gengetopt 2.23 h9c3ff4c_0 conda-forge
gettext 0.21.1 h27087fc_0 conda-forge
giflib 5.2.1 h0b41bf4_3 conda-forge
git 2.44.0 pl5321h709897a_0 conda-forge
gitdb 4.0.11 pyhd8ed1ab_0 conda-forge
gitpython 3.1.42 pyhd8ed1ab_0 conda-forge
gmp 6.3.0 h59595ed_1 conda-forge
gmpy2 2.1.2 py310h3ec546c_1 conda-forge
gnutls 3.7.9 hb077bed_0 conda-forge
graphite2 1.3.13 h58526e2_1001 conda-forge
graphql-core 3.2.3 pyhd8ed1ab_0 conda-forge
graphviz 9.0.0 h78e8752_1 conda-forge
gtk2 2.24.33 h280cfa0_4 conda-forge
gts 0.7.6 h977cf35_4 conda-forge
gxx 13.2.0 hd6cf55c_3 conda-forge
gxx_impl_linux-64 13.2.0 h338b0a0_5 conda-forge
gzip 1.13 hd590300_0 conda-forge
hammer-vlsi 1.2.0 pypi_0 pypi
harfbuzz 8.3.0 h3d44ed6_0 conda-forge
html5lib 1.1 pyh9f0ad1d_0 conda-forge
humanfriendly 10.0 pyhd8ed1ab_6 conda-forge
icontract 2.6.6 pypi_0 pypi
icu 73.2 h59595ed_0 conda-forge
identify 2.5.35 pyhd8ed1ab_0 conda-forge
idna 3.6 pyhd8ed1ab_0 conda-forge
imagesize 1.4.1 pyhd8ed1ab_0 conda-forge
importlib-metadata 7.0.2 pyha770c72_0 conda-forge
importlib_metadata 7.0.2 hd8ed1ab_0 conda-forge
importlib_resources 6.3.0 pyhd8ed1ab_0 conda-forge
iniconfig 2.0.0 pyhd8ed1ab_0 conda-forge
itsdangerous 2.1.2 pyhd8ed1ab_0 conda-forge
jaraco.classes 3.3.1 pyhd8ed1ab_0 conda-forge
jeepney 0.8.0 pyhd8ed1ab_0 conda-forge
jinja2 3.1.3 pyhd8ed1ab_0 conda-forge
jmespath 1.0.1 pyhd8ed1ab_0 conda-forge
joserfc 0.9.0 pyhd8ed1ab_0 conda-forge
jq 1.7.1 hd590300_0 conda-forge
jschema-to-python 1.2.3 pyhd8ed1ab_0 conda-forge
jsondiff 2.0.0 pyhd8ed1ab_0 conda-forge
jsonpatch 1.33 pyhd8ed1ab_0 conda-forge
jsonpickle 3.0.2 pyhd8ed1ab_1 conda-forge
jsonpointer 2.4 py310hff52083_3 conda-forge
jsonschema 4.21.1 pyhd8ed1ab_0 conda-forge
jsonschema-path 0.3.2 pyhd8ed1ab_0 conda-forge
jsonschema-specifications 2023.7.1 pyhd8ed1ab_0 conda-forge
junit-xml 1.9 pyh9f0ad1d_0 conda-forge
kernel-headers_linux-64 3.10.0 h4a8ded7_14 conda-forge
keyring 24.3.1 py310hff52083_0 conda-forge
keyutils 1.6.1 h166bdaf_0 conda-forge
kiwisolver 1.4.5 py310hd41b1e2_1 conda-forge
krb5 1.20.1 h81ceb04_0 conda-forge
lazy-object-proxy 1.10.0 py310h2372a71_0 conda-forge
lcms2 2.16 hb7c19ff_0 conda-forge
ld_impl_linux-64 2.40 h41732ed_0 conda-forge
lerc 4.0.0 h27087fc_0 conda-forge
libabseil 20240116.1 cxx17_h59595ed_2 conda-forge
libarchive 3.5.2 hada088e_3 conda-forge
libblas 3.9.0 21_linux64_openblas conda-forge
libbrotlicommon 1.1.0 hd590300_1 conda-forge
libbrotlidec 1.1.0 hd590300_1 conda-forge
libbrotlienc 1.1.0 hd590300_1 conda-forge
libcblas 3.9.0 21_linux64_openblas conda-forge
libclang-cpp17 17.0.6 default_hb11cfb5_3 conda-forge
libclang13 17.0.6 default_ha2b6cf4_3 conda-forge
libcups 2.3.3 h36d4200_3 conda-forge
libcurl 7.88.1 hdc1c0ab_1 conda-forge
libdeflate 1.19 hd590300_0 conda-forge
libdwarf 0.0.0.20190110_28_ga81397fc4 h753d276_0 ucb-bar
libdwarf-dev 0.0.0.20190110_28_ga81397fc4 h753d276_0 ucb-bar
libedit 3.1.20191231 he28a2e2_2 conda-forge
libev 4.33 hd590300_2 conda-forge
libexpat 2.6.1 h59595ed_0 conda-forge
libfdt 1.6.1 h166bdaf_2 conda-forge
libffi 3.4.2 h7f98852_5 conda-forge
libgcc-devel_linux-64 13.2.0 ha9c7c90_105 conda-forge
libgcc-ng 13.2.0 h807b86a_5 conda-forge
libgcrypt 1.10.3 hd590300_0 conda-forge
libgd 2.3.3 h119a65a_9 conda-forge
libgfortran-ng 13.2.0 h69a702a_5 conda-forge
libgfortran5 13.2.0 ha4646dd_5 conda-forge
libgirepository 1.78.1 h003a4f0_1 conda-forge
libglib 2.80.0 hf2295e7_0 conda-forge
libgomp 13.2.0 h807b86a_5 conda-forge
libgpg-error 1.48 h71f35ed_0 conda-forge
libiconv 1.17 hd590300_2 conda-forge
libidn2 2.3.7 hd590300_0 conda-forge
libjpeg-turbo 3.0.0 hd590300_1 conda-forge
liblapack 3.9.0 21_linux64_openblas conda-forge
libllvm17 17.0.6 hb3ce162_1 conda-forge
libmagic 5.39 h753d276_1 conda-forge
libmicrohttpd 0.9.77 h97afed2_0 conda-forge
libnghttp2 1.58.0 h47da74e_1 conda-forge
libnsl 2.0.1 hd590300_0 conda-forge
libopenblas 0.3.26 pthreads_h413a1c8_0 conda-forge
libpng 1.6.43 h2797004_0 conda-forge
libprotobuf 4.25.3 h08a7969_0 conda-forge
librsvg 2.56.3 he3f83f7_1 conda-forge
libsanitizer 13.2.0 h7e041cc_5 conda-forge
libsecret 0.18.8 h329b89f_2 conda-forge
libsodium 1.0.18 h36c2ea0_1 conda-forge
libsqlite 3.45.2 h2797004_0 conda-forge
libssh2 1.11.0 h0841786_0 conda-forge
libstdcxx-devel_linux-64 13.2.0 ha9c7c90_105 conda-forge
libstdcxx-ng 13.2.0 h7e041cc_5 conda-forge
libtasn1 4.19.0 h166bdaf_0 conda-forge
libtiff 4.6.0 ha9c0a0a_2 conda-forge
libunistring 0.9.10 h7f98852_0 conda-forge
libusb1 2.0.1 pyhd8ed1ab_0 conda-forge
libuuid 2.38.1 h0b41bf4_0 conda-forge
libuv 1.48.0 hd590300_0 conda-forge
libwebp 1.3.2 h658648e_1 conda-forge
libwebp-base 1.3.2 hd590300_0 conda-forge
libxcb 1.15 h0b41bf4_0 conda-forge
libxcrypt 4.4.36 hd590300_1 conda-forge
libxml2 2.12.5 h232c23b_0 conda-forge
libzlib 1.2.13 hd590300_5 conda-forge
livereload 2.6.3 pyh9f0ad1d_0 conda-forge
lz4-c 1.9.4 hcb278e6_0 conda-forge
lzo 2.10 h516909a_1000 conda-forge
lzop 1.04 h3753786_2 conda-forge
m4 1.4.18 h516909a_1001 conda-forge
make 4.3 hd18ef5c_1 conda-forge
markupsafe 2.1.5 py310h2372a71_0 conda-forge
matplotlib-base 3.8.3 py310h62c0568_0 conda-forge
mock 5.1.0 pypi_0 pypi
more-itertools 10.2.0 pyhd8ed1ab_0 conda-forge
mosh 1.4.0 pl5321h7cc048c_8 conda-forge
moto 5.0.3 pyhd8ed1ab_0 conda-forge
mpc 1.3.1 hfe3b2da_0 conda-forge
mpfr 4.2.1 h9458935_0 conda-forge
mpmath 1.3.0 pyhd8ed1ab_0 conda-forge
msal 1.27.0 pyhd8ed1ab_0 conda-forge
msal_extensions 1.1.0 py310hff52083_1 conda-forge
msgpack-python 1.0.7 py310hd41b1e2_0 conda-forge
multidict 6.0.5 py310h2372a71_0 conda-forge
munkres 1.1.4 pyh9f0ad1d_0 conda-forge
mypy 1.9.0 py310h2372a71_0 conda-forge
mypy-boto3-s3 1.34.14 pyhd8ed1ab_0 conda-forge
mypy_boto3_ec2 1.34.61 pyhd8ed1ab_0 conda-forge
mypy_extensions 1.0.0 pyha770c72_0 conda-forge
ncurses 6.4 h59595ed_2 conda-forge
nettle 3.9.1 h7ab15ed_0 conda-forge
networkx 3.2.1 pyhd8ed1ab_0 conda-forge
nodeenv 1.8.0 pyhd8ed1ab_0 conda-forge
numpy 1.26.4 py310hb13e2d6_0 conda-forge
oniguruma 6.9.9 hd590300_0 conda-forge
open_pdks.sky130a 1.0.471_0_g97d0844 20240223_100318 litex-hub
openapi-schema-validator 0.6.2 pyhd8ed1ab_0 conda-forge
openapi-spec-validator 0.7.1 pyhd8ed1ab_0 conda-forge
openjdk 20.0.2 haa376d0_2 conda-forge
openjpeg 2.5.2 h488ebb8_0 conda-forge
openssl 3.2.1 hd590300_0 conda-forge
p11-kit 0.24.1 hc5aa10d_0 conda-forge
packaging 24.0 pyhd8ed1ab_0 conda-forge
pandas 2.2.1 py310hcc13569_0 conda-forge
pango 1.52.1 ha41ecd1_0 conda-forge
paramiko 3.4.0 pyhd8ed1ab_0 conda-forge
paramiko-ng 2.8.10 pypi_0 pypi
pastel 0.2.1 pyhd8ed1ab_0 conda-forge
patch 2.7.6 h7f98852_1002 conda-forge
pathable 0.4.3 pyhd8ed1ab_0 conda-forge
pbr 6.0.0 pyhd8ed1ab_0 conda-forge
pcre2 10.43 hcad00b1_0 conda-forge
perl 5.32.1 7_hd590300_perl5 conda-forge
pillow 10.2.0 py310h01dd4db_0 conda-forge
pip 24.0 pyhd8ed1ab_0 conda-forge
pixman 0.43.2 h59595ed_0 conda-forge
pkginfo 1.10.0 pyhd8ed1ab_0 conda-forge
pkgutil-resolve-name 1.3.10 pyhd8ed1ab_1 conda-forge
platformdirs 4.2.0 pyhd8ed1ab_0 conda-forge
pluggy 1.4.0 pyhd8ed1ab_0 conda-forge
popt 1.16 h0b475e3_2002 conda-forge
portalocker 2.8.2 py310hff52083_1 conda-forge
pre-commit 3.6.2 pyha770c72_0 conda-forge
prompt-toolkit 3.0.38 pyha770c72_0 conda-forge
prompt_toolkit 3.0.38 hd8ed1ab_0 conda-forge
psutil 5.9.8 py310h2372a71_0 conda-forge
pthread-stubs 0.4 h36c2ea0_1001 conda-forge
pyasn1 0.5.1 pyhd8ed1ab_0 conda-forge
pycairo 1.26.0 py310hda9f760_0 conda-forge
pycosat 0.6.6 py310h2372a71_0 conda-forge
pycparser 2.21 pyhd8ed1ab_0 conda-forge
pydantic 1.10.14 pypi_0 pypi
pydantic-core 2.16.3 py310hcb5633a_0 conda-forge
pygments 2.17.2 pyhd8ed1ab_0 conda-forge
pygobject 3.48.1 py310h30b043a_0 conda-forge
pyjwt 2.8.0 pyhd8ed1ab_1 conda-forge
pylddwrap 1.2.2 pypi_0 pypi
pylev 1.4.0 pyhd8ed1ab_0 conda-forge
pynacl 1.5.0 py310h2372a71_3 conda-forge
pyopenssl 23.1.1 pyhd8ed1ab_0 conda-forge
pyparsing 3.1.2 pyhd8ed1ab_0 conda-forge
pysocks 1.7.1 pyha2e5f31_6 conda-forge
pytest 8.1.1 pyhd8ed1ab_0 conda-forge
pytest-dependency 0.5.1 pyh9f0ad1d_0 conda-forge
pytest-mock 3.12.0 pyhd8ed1ab_0 conda-forge
python 3.10.13 hd12c33a_1_cpython conda-forge
python-dateutil 2.8.2 pyhd8ed1ab_0 conda-forge
python-graphviz 0.20.1 pyh22cad53_0 conda-forge
python-jose 3.3.0 pyh6c4a22f_1 conda-forge
python-tzdata 2024.1 pyhd8ed1ab_0 conda-forge
python_abi 3.10 4_cp310 conda-forge
pytz 2024.1 pyhd8ed1ab_0 conda-forge
pywin32-on-windows 0.1.0 pyh1179c8e_3 conda-forge
pyyaml 6.0.1 py310h2372a71_1 conda-forge
qemu 5.0.0 hb15d774_0 ucb-bar
readline 8.2 h8228510_1 conda-forge
referencing 0.30.2 pyhd8ed1ab_0 conda-forge
regex 2023.12.25 py310h2372a71_0 conda-forge
requests 2.31.0 pyhd8ed1ab_0 conda-forge
responses 0.25.0 pyhd8ed1ab_0 conda-forge
rfc3339-validator 0.1.4 pyhd8ed1ab_0 conda-forge
rhash 1.4.3 hd590300_2 conda-forge
riscv-tools 1.0.6 0_h1234567_g56c29e0 ucb-bar
rpds-py 0.18.0 py310hcb5633a_0 conda-forge
rsa 4.9 pyhd8ed1ab_0 conda-forge
rsync 3.2.7 h70740c4_0 conda-forge
ruamel-yaml 0.17.40 pypi_0 pypi
ruamel.yaml.clib 0.2.7 py310h2372a71_2 conda-forge
s2n 1.4.0 h06160fa_0 conda-forge
s3fs 0.4.2 py_0 conda-forge
s3transfer 0.10.0 pyhd8ed1ab_0 conda-forge
sarif-om 1.0.4 pyhd8ed1ab_0 conda-forge
sbt 1.9.7 hd8ed1ab_0 conda-forge
screen 4.8.0 he28a2e2_0 conda-forge
secretstorage 3.3.3 py310hff52083_2 conda-forge
sed 4.8 he412f7d_0 conda-forge
setuptools 69.2.0 pyhd8ed1ab_0 conda-forge
six 1.16.0 pyh6c4a22f_0 conda-forge
smmap 5.0.0 pyhd8ed1ab_0 conda-forge
snowballstemmer 2.2.0 pyhd8ed1ab_0 conda-forge
sphinx 7.2.6 pyhd8ed1ab_0 conda-forge
sphinx-autobuild 2024.2.4 pyhd8ed1ab_0 conda-forge
sphinx_rtd_theme 2.0.0 pyha770c72_0 conda-forge
sphinxcontrib-applehelp 1.0.8 pyhd8ed1ab_0 conda-forge
sphinxcontrib-devhelp 1.0.6 pyhd8ed1ab_0 conda-forge
sphinxcontrib-htmlhelp 2.0.5 pyhd8ed1ab_0 conda-forge
sphinxcontrib-jquery 4.1 pyhd8ed1ab_0 conda-forge
sphinxcontrib-jsmath 1.0.1 pyhd8ed1ab_0 conda-forge
sphinxcontrib-qthelp 1.0.7 pyhd8ed1ab_0 conda-forge
sphinxcontrib-serializinghtml 1.1.10 pyhd8ed1ab_0 conda-forge
sqlite 3.45.2 h2c6b66d_0 conda-forge
sshpubkeys 3.3.1 pyhd8ed1ab_0 conda-forge
sty 1.0.0 pyhd8ed1ab_0 conda-forge
sure 2.0.1 pypi_0 pypi
sympy 1.12 pypyh9d50eac_103 conda-forge
sysroot_linux-64 2.17 h4a8ded7_14 conda-forge
tar 1.34 hb2e2bae_1 conda-forge
tk 8.6.13 noxft_h4845f30_101 conda-forge
tomli 2.0.1 pyhd8ed1ab_0 conda-forge
tomlkit 0.12.4 pyha770c72_0 conda-forge
toolz 0.12.1 pyhd8ed1ab_0 conda-forge
tornado 6.4 py310h2372a71_0 conda-forge
tqdm 4.66.2 pyhd8ed1ab_0 conda-forge
truststore 0.8.0 pyhd8ed1ab_0 conda-forge
types-awscrt 0.20.5 pyhd8ed1ab_0 conda-forge
types-pytz 2024.1.0.20240203 pyhd8ed1ab_0 conda-forge
types-pyyaml 6.0.12.20240311 pyhd8ed1ab_0 conda-forge
types-requests 2.31.0.6 pyhd8ed1ab_0 conda-forge
types-s3transfer 0.10.0 pypi_0 pypi
types-urllib3 1.26.25.14 pyhd8ed1ab_0 conda-forge
typing-extensions 4.10.0 hd8ed1ab_0 conda-forge
typing_extensions 4.10.0 pyha770c72_0 conda-forge
tzdata 2024a h0c530f3_0 conda-forge
ukkonen 1.0.1 py310hd41b1e2_4 conda-forge
unicodedata2 15.1.0 py310h2372a71_0 conda-forge
unzip 6.0 h7f98852_3 conda-forge
urllib3 1.26.18 pyhd8ed1ab_0 conda-forge
verilator 5.022 h7cd9344_0 conda-forge
vim 9.1.0041 py310pl5321he660f0e_0 conda-forge
virtualenv 20.25.1 pyhd8ed1ab_0 conda-forge
wcwidth 0.2.13 pyhd8ed1ab_0 conda-forge
webencodings 0.5.1 pyhd8ed1ab_2 conda-forge
websocket-client 1.7.0 pyhd8ed1ab_0 conda-forge
werkzeug 3.0.1 pyhd8ed1ab_0 conda-forge
wget 1.20.3 ha35d2d1_1 conda-forge
wheel 0.42.0 pyhd8ed1ab_0 conda-forge
which 2.21 h0b41bf4_1 conda-forge
wrapt 1.16.0 py310h2372a71_0 conda-forge
xmltodict 0.13.0 pyhd8ed1ab_0 conda-forge
xorg-fixesproto 5.0 h7f98852_1002 conda-forge
xorg-inputproto 2.3.2 h7f98852_1002 conda-forge
xorg-kbproto 1.0.7 h7f98852_1002 conda-forge
xorg-libice 1.1.1 hd590300_0 conda-forge
xorg-libsm 1.2.4 h7391055_0 conda-forge
xorg-libx11 1.8.7 h8ee46fc_0 conda-forge
xorg-libxau 1.0.11 hd590300_0 conda-forge
xorg-libxdmcp 1.1.3 h7f98852_0 conda-forge
xorg-libxext 1.3.4 h0b41bf4_2 conda-forge
xorg-libxfixes 5.0.3 h7f98852_1004 conda-forge
xorg-libxi 1.7.10 h7f98852_0 conda-forge
xorg-libxrender 0.9.11 hd590300_0 conda-forge
xorg-libxt 1.3.0 hd590300_1 conda-forge
xorg-libxtst 1.2.3 h7f98852_1002 conda-forge
xorg-recordproto 1.14.2 h7f98852_1002 conda-forge
xorg-renderproto 0.11.1 h7f98852_1002 conda-forge
xorg-xextproto 7.3.0 h0b41bf4_1003 conda-forge
xorg-xproto 7.0.31 h7f98852_1007 conda-forge
xxhash 0.8.0 h7f98852_3 conda-forge
xz 5.2.6 h166bdaf_0 conda-forge
yaml 0.2.5 h7f98852_2 conda-forge
yarl 1.9.4 py310h2372a71_0 conda-forge
zipp 3.17.0 pyhd8ed1ab_0 conda-forge
zlib 1.2.13 hd590300_5 conda-forge
zstandard 0.22.0 py310h1275a96_0 conda-forge
zstd 1.5.5 hfc55251_0 conda-forge
```

Other Setup

Followed the "setting up the repository" guide, and added this to PeripheralDeviceConfigs.scala:

```scala
class TapNICRocketConfig extends Config(
  new chipyard.harness.WithSimNetwork ++
  new icenet.WithIceNIC ++
  new freechips.rocketchip.subsystem.WithNBigCores(1) ++
  new chipyard.config.AbstractConfig)
```

Current Behavior

When I run `make -C sims/verilator CONFIG=TapNICRocketConfig VERILATOR_THREADS=N` with any N larger than 1, I end up with a simulator that dies with SIGSEGV almost as soon as I launch it.

As a consequence I'm stuck with the single-threaded simulator, and (to get at what I'm really after) running the pingd.c test there results in a ~2 s ping on my system. That's longer than ping's default interval of one second, so running ping with no arguments against the single-threaded RTL simulation is an effective DoS strategy: it sends ICMP echo requests at about twice the rate the simulator can sustain.
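As a rough illustration of that rate mismatch, here is a back-of-the-envelope model. It assumes (my simplification) that the simulated node answers echo requests one at a time in FIFO order at a fixed ~2 s each, while ping sends one request per interval; `backlog_at` is a hypothetical helper, not part of any tool.

```cpp
#include <cassert>

// Requests sent but not yet answered at time t (seconds), assuming ping sends
// one request every `interval_s` seconds starting at t = 0 and the simulator
// completes one request every `service_s` seconds. Only meaningful while the
// simulator never goes idle, i.e. interval_s <= service_s.
long backlog_at(long t, long interval_s, long service_s) {
  assert(t >= 0 && interval_s > 0 && interval_s <= service_s);
  long sent = t / interval_s + 1;  // requests issued at 0, interval, 2*interval, ...
  long served = t / service_s;     // requests completed by time t
  return sent - served;
}
```

Under these assumptions the backlog grows by about one request every two seconds without bound, whereas matching the interval to the service time (e.g. `ping -i 2`) keeps it at a single outstanding request.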

Expected Behavior

Simply put, I expected to be able to "throw more threads at it": most of the time seems to go to front-end stalls from icache misses, which multiple threads address nicely by expanding the effective icache space available.

More broadly, I suppose I expected there to be a way to get workable performance out of the RTL model for functional simulation of simple network nodes without custom hardware, proprietary software, or an FPGA-equipped cloud instance. Perhaps my expectation that such a path exists through Verilator is worth discussing here, too?

Other Information

coredumpctl debug suggests this is because the thread-local context_t pointer isn't initialized in every thread:

```
Core was generated by `/home/seth/Code/src/github.com/ucb-bar/chipyard/sims/verilator/simulator-chipya'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x000056067f92ac60 in context_t::switch_to (this=0x56068194f370) at ../fesvr/context.cc:86
86        cur = this;
[Current thread is 1 (Thread 0x7fece98bb6c0 (LWP 4017918))]
(gdb) bt
#0  0x000056067f92ac60 in context_t::switch_to (this=0x56068194f370) at ../fesvr/context.cc:86
#1  0x000056067f3dda89 in network_tick ()
#2  0x000056067f555b96 in VTestDriver___024unit____Vdpiimwrap_network_tick_TOP____024unit(unsigned char, unsigned long, unsigned char, unsigned char&, unsigned long&, unsigned char&, unsigned long&) ()
#3  0x000056067f739e90 in VTestDriver___024root___nba_sequent__TOP__1899(VTestDriver___024root*) ()
#4  0x000056067f44d8b6 in VTestDriver___024root____Vthread__nba__2(void*, bool) ()
#5  0x000056067f415cb1 in VlWorkerThread::workerLoop() ()
#6  0x00007feceb8f0e95 in std::execute_native_thread_routine (__p=<optimized out>)
    at ../../../../../libstdc++-v3/src/c++11/thread.cc:104
#7  0x00007feceb5da55a in ?? () from /usr/lib/libc.so.6
#8  0x00007feceb657a5c in ?? () from /usr/lib/libc.so.6
(gdb) p this
$1 = (context_t * const) 0x56068194f370
(gdb) p cur
$2 = (context_t *) 0x0
```
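The failure pattern in that backtrace can be mimicked in plain C++ without Verilator or fesvr. In this hypothetical sketch (not fesvr's code), `cur` stands in for fesvr's per-thread `cur` pointer, `fake_init` plays `network_init` setting it (as `context_t::init` appears to do), and two `std::thread`s play the Verilator threads that happen to run the init and tick DPI calls.

```cpp
#include <cassert>
#include <thread>

struct fake_context {};

// Per-thread "currently running context", analogous to fesvr's thread-local cur.
static thread_local fake_context* cur = nullptr;
static fake_context host_ctx;

// Stand-in for network_init: only the calling thread's copy of `cur` is set.
void fake_init() { cur = &host_ctx; }

// Stand-in for network_tick: reports whether this thread's `cur` was ever set.
bool fake_tick_sees_context() { return cur != nullptr; }

// Runs init and tick either on one thread or on two different threads and
// returns what the tick observed.
bool init_then_tick(bool same_thread) {
  bool seen = false;
  if (same_thread) {
    std::thread t([&] { fake_init(); seen = fake_tick_sees_context(); });
    t.join();
  } else {
    std::thread a([] { fake_init(); });
    a.join();
    std::thread b([&] { seen = fake_tick_sees_context(); });
    b.join();
  }
  return seen;
}
```

Only the thread that ran `fake_init` ever sees a non-null `cur`, which matches `p cur` printing `0x0` in the worker thread above.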

I'm not familiar with the ucontext patterns/types fesvr uses to implement what appear to be "green threads" (really, coroutines), but I do see in the Verilator docs that in a multithreaded model, the verilated model itself creates and manages N-1 of the N threads; the remaining one is whichever thread calls eval():

With --threads {N}, where N is at least 2, the generated model will be designed to run in parallel on N threads. The thread calling eval() provides one of those threads, and the generated model will create and manage the other N-1 threads. ... When making frequent use of DPI imported functions in a multithreaded model, it may be beneficial to performance to adjust the --instr-count-dpi option based on some experimentation. This influences the partitioning of the model by adjusting the assumed execution time of DPI imports.

The latter part suggests to me that, although DPI functions may under some conditions be called from the same thread, there's no guarantee that DPI calls from a module's always blocks run on the same thread as those from its initial blocks, which the current implementation implicitly assumes. I suspect it would help to pin down more precisely how the model's threads need to interact with the fesvr coroutine implementation backing the DPI SimNetwork code, not least to determine whether this crash is something chipyard itself has any leverage over.
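One way to turn that implicit assumption into a loud failure while experimenting is to record which thread ran the init DPI call and check it on every tick. This is a hypothetical debugging aid with made-up names (`affinity_guard`, `record_init`, `check_tick`), not part of chipyard's SimNetwork.cc.

```cpp
#include <cassert>
#include <cstdio>
#include <thread>

// Hypothetical debugging aid: remembers which thread ran the init DPI call so
// each tick can check that it is running on that same thread.
class affinity_guard {
 public:
  // Call at the end of the init function. Assumes init runs once and finishes
  // before any tick starts; no extra synchronization is done here.
  void record_init() {
    assert(owner_ == std::thread::id{} && "init recorded twice");
    owner_ = std::this_thread::get_id();
  }

  // Returns true on the init thread; otherwise reports the mismatch and
  // returns false so the caller can bail out instead of switching contexts.
  bool check_tick(const char* who) const {
    if (owner_ == std::this_thread::get_id()) return true;
    std::fprintf(stderr, "%s: tick on a different thread than init\n", who);
    return false;
  }

 private:
  std::thread::id owner_{};  // default-constructed id means "init never ran"
};
```

In a real DPI wrapper one might call `record_init()` at the end of `network_init` and have `network_tick` report an error when `check_tick` fails, rather than crashing inside `context_t::switch_to`.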

I'm opening this here because I believe chipyard is the right home for it: even if Verilator or fesvr exposed a callback or config option that affected the outcome, chipyard would still need a change to take advantage of it. That said, I'm very new to this space and would welcome your guidance.

sethp commented 2 months ago

Oh, I also meant to mention: I found someone on the mailing list with a similar-looking symptom (https://groups.google.com/g/chipyard/c/i0pNR4t8HFA/m/NBMP4fcsAQAJ). However, they reported that 1) the crash occurred in what appears to be an initial block rather than in the nba block driving network_tick, and 2) they solved the problem by changing the name of the connected bus, so I suspect it's a different issue, though perhaps still somewhere in SimNetwork.cc & friends.

jerryz123 commented 2 months ago

I wonder if adding a lock to the network_tick and network_init functions would be sufficient.

sethp commented 2 months ago

I think it depends on what you mean. The `cur` that shows up as 0x0 in gdb is declared as `static __thread context_t* cur;`. Since it's thread-local, I read that as saying the faulting thread's copy was never initialized. No other thread can (or at least should) access the faulting thread's local storage, so a lock that waits for it to be set would probably just hang.

If you mean "instead of using a ucontext/coroutine, set up a condition variable with a lock in network_init that parks a pthread until network_tick signals it", then I think that's a potentially fruitful path: fesvr's context_t even contains a bit of an example implementation of this as an alternative to ucontext, albeit one that doesn't look directly usable.
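For reference, the general shape of such a condition-variable handoff can be sketched with standard C++ primitives. This is my own minimal illustration with hypothetical names, not fesvr's `context_t` and not a drop-in for SimNetwork: a dedicated host thread runs the device model and parks between ticks, and each `tick()` wakes it and waits for it to hand control back. No thread-local state is involved, so it shouldn't matter which Verilator thread calls `tick()`.

```cpp
#include <cassert>
#include <condition_variable>
#include <functional>
#include <mutex>
#include <thread>
#include <utility>

// Hypothetical coroutine-style device host: control passes back and forth
// between whoever calls tick() and one dedicated host thread.
class handoff_host {
 public:
  // `step` runs on the host thread, once per tick().
  explicit handoff_host(std::function<void()> step)
      : step_(std::move(step)), worker_([this] { run(); }) {}

  ~handoff_host() {
    {
      std::lock_guard<std::mutex> lk(m_);
      stop_ = true;
      host_turn_ = true;  // wake the host so it can observe stop_
    }
    cv_.notify_all();
    worker_.join();
  }

  // May be called from any thread, but only one tick() may be in flight.
  // Blocks until the host has finished one step and handed control back.
  void tick() {
    std::unique_lock<std::mutex> lk(m_);
    assert(!host_turn_ && "overlapping tick() calls are not supported");
    host_turn_ = true;
    cv_.notify_all();
    cv_.wait(lk, [this] { return !host_turn_; });
  }

 private:
  void run() {
    std::unique_lock<std::mutex> lk(m_);
    for (;;) {
      cv_.wait(lk, [this] { return host_turn_; });
      if (stop_) return;
      lk.unlock();
      step_();             // device work runs without holding the lock
      lk.lock();
      host_turn_ = false;  // hand control back to the ticking thread
      cv_.notify_all();
    }
  }

  std::function<void()> step_;
  std::mutex m_;
  std::condition_variable cv_;
  bool host_turn_ = false;
  bool stop_ = false;
  std::thread worker_;  // declared last: started after the members above exist
};
```

The trade-off is two condition-variable wakeups per tick, which is likely more expensive than a user-space context switch, so this buys thread-independence at some cost in per-tick overhead.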

Between repair and replace, I'm personally leaning towards figuring out what fesvr's context_t expects, since chipyard already has a handful of other uses of it (e.g. the spike tile) that would either suffer from a similar issue or point to a solution. I'm hoping to keep posting my notes here as I learn my way through the fesvr and Verilator threading models, unless you'd rather I didn't, of course!

jerryz123 commented 1 month ago

> unless you'd rather I didn't, of course!

I suspect the solution does not require messing around in context_t. I believe several other devices use FESVR's context_t and behave correctly in multithreaded sims.

sethp commented 1 month ago

Perhaps! I did see that other devices use context_t, which is why I want to understand the problem a little better. I've found that certain --threads counts do produce a working simulation for a given model using SimNetwork, and not just 1 (which always works): a model may run indefinitely with --threads 2 and crash immediately with --threads 3.

Two especially relevant details I've noticed so far:

  1. Verilator's multithreading model appears to schedule micro-tasks statically, i.e. the same thread always executes a given DPI-C call for a given model
  2. The crash occurs inside context_t when the thread-local cur variable is NULL (0x0). cur usually looks like it gets initialized as a side-effect of calling init in that thread (for any context_t instance, I think?)

I think it's the combination of these two that causes the behavior I'm seeing: when the scheduler happens to place the network_init and network_tick DPI calls into the same thread's work queue (roughly 1/N for N threads: 100% with one thread, ~50% with two, ~33% with three, etc.), there's no crash, because network_init populates the thread-local and network_tick uses it.

If I'm further right in saying that any call to the static context_t::current() in a thread "pre-warms" it, then as we add more independently scheduled instances (like, say, 2x IceNICs + a block device + 8 spike tiles), we might rapidly (but asymptotically) approach a 100% chance that some initial block populates each thread's cur storage for a given number of threads. Could that account, directly or indirectly, for your experience that multithreaded sims with context_t work fine?
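To put numbers on that speculation, here's an idealized model. The assumption is mine, not how Verilator works: each of k independent context_t users has its init placed uniformly at random on one of N threads. (Verilator's partitioning is static and deterministic, so this only describes odds over "arbitrary" models.) A tick then lands on its own init thread with probability 1/N, and the chance that every thread has been pre-warmed is the coupon-collector probability, by inclusion-exclusion: P(N, k) = sum over j = 0..N of (-1)^j * C(N, j) * (1 - j/N)^k.

```cpp
#include <cassert>
#include <cmath>

// Chance that a tick placed uniformly on one of n threads shares a thread with
// its init (idealized uniform model).
double same_thread_probability(int n) {
  assert(n >= 1);
  return 1.0 / n;
}

// Chance that k uniform, independent placements cover all n threads,
// by inclusion-exclusion over the set of threads left uncovered.
double all_threads_warmed(int n, int k) {
  assert(n >= 1 && k >= 0);
  double p = 0.0;
  double binom = 1.0;  // C(n, j), updated incrementally
  for (int j = 0; j <= n; ++j) {
    double term = binom * std::pow(1.0 - static_cast<double>(j) / n, k);
    p += (j % 2 == 0) ? term : -term;
    binom = binom * (n - j) / (j + 1);
  }
  return p;
}
```

Under this model two context_t users can never cover four threads (`all_threads_warmed(4, 2)` is 0), while a few dozen make full coverage very likely, which is the "rapid but asymptotic" shape described above. Since the real scheduler is deterministic, this is intuition only.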

sethp commented 1 month ago

Hmm, well, yes and no to that last question. Adding a spike tile[^1] did perturb the scheduler enough that the simulation worked at least once[^2] with VERILATOR_THREADS=3:

```
[UART] UART0 is here (stdin/stdout).
network init (tid=198488)
No tap interface provided
Constructing spike processor_t (tid=198490)
Done constructing spike processor
network tick (tid=198488)
- /home/seth/Code/src/github.com/ucb-bar/chipyard/sims/verilator/generated-src/chipyard.harness.TestHarness.TapNICRocketConfig/gen-collateral/TestDriver.v:158: Verilog $finish
```

and failed with VERILATOR_THREADS=4:

```
network init (tid=205548)
No tap interface provided
Constructing spike processor_t (tid=205550)
Done constructing spike processor
network tick (tid=205551)
zsh: segmentation fault (core dumped)  ./sims/verilator/simulator-chipyard.harness-TapNICRocketConfig
```

But adding or removing cores doesn't change which thread does the initialization:

```
Constructing spike processor_t (tid=219082)
Done constructing spike processor
Constructing spike processor_t (tid=219082)
Done constructing spike processor
Constructing spike processor_t (tid=219082)
Done constructing spike processor
Constructing spike processor_t (tid=219082)
Done constructing spike processor
```

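Experimenting also turned up a counterexample to my speculation that any context_t::init caller in a thread would suffice (here the spike processor is constructed on the same thread that later runs network_tick, and the sim still crashes):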

```
[UART] UART0 is here (stdin/stdout).
network init (tid=192456)
No tap interface provided
Constructing spike processor_t (tid=192457)
Done constructing spike processor
network tick (tid=192457)
zsh: segmentation fault (core dumped)  ./sims/verilator/simulator-chipyard.harness-TapNICRocketConfig
```

The block device (bdev) also seems vulnerable to the same crash, as long as the sim is run with +blkdev=somefile (otherwise the block device never inits or ticks):

bdev init (tid=312407)
[UART] UART0 is here (stdin/stdout).
...
bdev tick (tid=312410)
zsh: segmentation fault (core dumped)  ./sims/verilator/simulator-chipyard.harness-TapNICRocketConfig +permissive   
$ coredumpctl debug
...
Core was generated by `./sims/verilator/simulator-chipyard.harness-TapNICRocketConfig +permissive +blk'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x0000556201ef4530 in context_t::switch_to (this=0x55620306f9e0) at ../fesvr/context.cc:86
86        cur = this;
[Current thread is 1 (Thread 0x7fcebc80c6c0 (LWP 312410))]
(gdb) bt
#0  0x0000556201ef4530 in context_t::switch_to (this=0x55620306f9e0) at ../fesvr/context.cc:86
#1  0x00005562015cecbe in block_device_tick ()
#2  0x00005562017a7206 in VTestDriver___024unit____Vdpiimwrap_block_device_tick_TOP____024unit(unsigned char, unsigned char&, unsigned char, unsigned int, unsigned int, unsigned int, unsigned char, unsigned char&, unsigned long, unsigned int, unsigned char&, unsigned char, unsigned long&, unsigned int&) ()
#3  0x00005562019a38b0 in VTestDriver___024root___nba_sequent__TOP__1888(VTestDriver___024root*) ()
#4  0x0000556201646d0f in VTestDriver___024root____Vthread__nba__2(void*, bool) ()
#5  0x000055620160ce51 in VlWorkerThread::workerLoop() ()
#6  0x00007fcebe86fe95 in std::execute_native_thread_routine (__p=<optimized out>)
    at ../../../../../libstdc++-v3/src/c++11/thread.cc:104
#7  0x00007fcebe53e55a in ?? () from /usr/lib/libc.so.6
#8  0x00007fcebe5bba5c in ?? () from /usr/lib/libc.so.6
(gdb) p cur
$1 = (context_t *) 0x0
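The `p cur` result is the key detail: the thread-local slot is null on the Verilator worker thread that runs `block_device_tick`. Below is a minimal sketch of why, modeling only the thread-local bookkeeping. `ctx_t`, `cur`, `current()`, and `fake_device` are hypothetical stand-ins, not fesvr's actual code. A pointer captured by a constructor on one thread is not the ticking thread's context, and a worker that never called `current()` still sees an empty slot.

```cpp
#include <cassert>
#include <thread>

// Hypothetical stand-in for context_t: models only the per-thread
// "current context" slot, not the stack switching itself.
struct ctx_t {};

// One slot per OS thread; a thread that never calls current() sees nullptr.
thread_local ctx_t* cur = nullptr;

// Modeled on context_t::current(): populate this thread's slot on first use.
ctx_t* current() {
  if (cur == nullptr) cur = new ctx_t;  // intentionally leaked for the sketch
  return cur;
}

// Hypothetical device that stashes current() at construction time,
// like the stored field in SimNetwork/SimBlockDevice.
struct fake_device {
  ctx_t* target = current();
};

struct observation {
  ctx_t* slot_before_current;   // worker's slot before it calls current()
  bool stashed_matches_worker;  // is the stashed pointer the worker's context?
};

// Construct on the calling thread ("init"), then "tick" on a new thread.
observation init_here_tick_elsewhere() {
  fake_device dev;  // init: captures *this* thread's context
  observation obs{};
  std::thread worker([&] {
    obs.slot_before_current = cur;
    obs.stashed_matches_worker = (dev.target == current());
  });
  worker.join();
  assert(dev.target == current());  // still valid on the constructing thread
  return obs;
}
```

On the worker, the slot starts out null (the same state the core dump shows), and even after `current()` fills it, the device's stashed pointer refers to the constructing thread's context instead.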

It seems like the spike tile is in fact the odd one out: since it's driven from a single DPI-C entrypoint that initializes itself on demand, it's not possible(?) for it to be scheduled onto two different threads by the Verilator microtask scheduler.

[^1]: so the new config is

    class TapNICRocketConfig extends Config(
      new chipyard.WithNSpikeCores(4) ++

      new chipyard.harness.WithSimNetwork ++
      new icenet.WithIceNIC ++
      new freechips.rocketchip.subsystem.WithNBigCores(1) ++
      new chipyard.config.AbstractConfig)

[^2]: I'm testing with:

    touch sims/verilator/generated-src/chipyard.harness.TestHarness.TapNICRocketConfig/gen-collateral/filelist.f && MAKEFLAGS=-j`nproc` make -C sims/verilator VERILATOR_THREADS=N CONFIG=TapNICRocketConfig && ./sims/verilator/simulator-chipyard.harness-TapNICRocketConfig toolchains/riscv-tools/riscv-tests/build/isa/rv64ui-p-simple
for various values of N. I'm also using `rv64ui-p-simple` because the choice of test program doesn't seem to matter: whether or not it segfaults is decided on the first `network_tick` call, apparently independent of whatever the simulated binary does.
sethp commented 1 month ago

Ah, one speculative code change later and, oh, hello:

[UART] UART0 is here (stdin/stdout).
network init (tid=536361)
No tap interface provided
network tick (tid=536364)
- /home/seth/Code/src/github.com/ucb-bar/chipyard/sims/verilator/generated-src/chipyard.harness.TestHarness.TapNICRocketConfig/gen-collateral/TestDriver.v:158: Verilog $finish

init and tick ran in different OS threads, and yet no crash! I arrived at that result by noticing that the crash looked very different in cases where something had previously populated the thread-local.

While reading the code to identify how control flow ought to return, I noticed that both the device and the switch held a pointer to another thread's local storage (they were both storing cur in a field). So, after overriding some access modifiers so I could update those fields from network_tick:

  if (!netdev || !netsw) {
    fprintf(stderr, "You forgot to call network_init!");
    exit(1);
  }

+  netdev->target = context_t::current();
+  netsw->main = context_t::current();

  netdev->tick(out_valid, out_data, out_last);
  netdev->switch_to_host();

  netsw->distribute();
  netsw->switch_to_worker();

and, since context_t::current() populates the thread-local when it's empty, it seems neither crash occurred.
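A minimal, self-contained sketch of why refreshing the pointer on every tick is enough, assuming Linux/glibc's `<ucontext.h>`. `mini_dev`, `host_loop`, and `g_dev` are hypothetical and much simpler than netdev (no switch, no context_t): the host coroutine is created by `init()` on one thread, but each `tick()` records its own resume point in `target` immediately before switching, so the host always yields back to whichever thread ticked it.

```cpp
#include <ucontext.h>
#include <cassert>
#include <thread>
#include <vector>

// Hypothetical miniature of the netdev pattern: a "host" coroutine that
// does one unit of work per tick and then yields back to its ticker.
struct mini_dev {
  ucontext_t host{};             // the device's own coroutine
  ucontext_t* target = nullptr;  // where the host resumes when it yields
  std::vector<char> stack = std::vector<char>(64 * 1024);
  int work_done = 0;

  void init();  // may run on one OS thread...
  void tick();  // ...while this runs on another
};

// makecontext() only portably passes int arguments, so this sketch
// uses a single global device pointer instead.
static mini_dev* g_dev = nullptr;

static void host_loop() {
  for (;;) {
    g_dev->work_done++;
    // Yield: save the host's state and resume the most recent ticker.
    swapcontext(&g_dev->host, g_dev->target);
  }
}

void mini_dev::init() {
  getcontext(&host);
  host.uc_stack.ss_sp = stack.data();
  host.uc_stack.ss_size = stack.size();
  host.uc_link = nullptr;  // host_loop never returns
  g_dev = this;
  makecontext(&host, host_loop, 0);
}

void mini_dev::tick() {
  ucontext_t here{};
  target = &here;  // the repair: capture *this* caller, on *this* thread
  swapcontext(&here, &host);
  // The host yielded back to us; `here` goes away when tick() returns.
  assert(work_done > 0);
}
```

Capturing `target` once in `init()` instead would leave it pointing at a stale resume point owned by the initializing thread, which is roughly the situation the two added lines in network_tick avoid.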

I'm still not quite sure what all this means yet, nor what course of action it suggests, but I did find that result very interesting.

sethp commented 1 month ago

Ok, so I've experimented a bit more, and I've come up with three potentially useful perspectives here:

I'd be lying if I said I didn't see the last as the most pragmatic option. I also feel like it's a bit of a loss: I've really enjoyed learning about the ucontext_t stuff, and I appreciate the utility of being able to multiplex lightweight user tasks over the same thread. That said, I spend a lot of time learning about weird corners of computing, and it was still very strange to me on first encounter. That, to me, is an important signal about code accessibility, and it's the same reason non-local control flow is... well, best used sparingly.

I believe I can flesh out any of those three directions into a full-fledged PR to the upstream(s) in question here (IceNet & testchipip, or riscv-isa-sim). Do any of them especially call to you, @jerryz123? I see that you've done a lot of this work, so I suspect you'd have a better sense of the overall context (pun intended).

jerryz123 commented 1 month ago

> SimNetwork (& SimBlockDevice) are misusing context_t by stashing the pointer from context_t::current in a field during construction. That's always a pointer to a thread-local, and if the constructor is called in a different thread (as here), that gets very difficult to reason about.

While I agree with your reasoning here, I don't think it's reasonable to expect/require SimDevice implementations to be thread-safe, where the constructor/tick functions can be called from distinct threads. As far as I can tell, this quirk only appears with Verilator multi-threading. The other simulators don't do this, even with multithreading enabled.

> context_t is challenging to use correctly, especially in a threaded program. An idea for how to improve the surface a bit would be to promote prev to a thread-local itself and add something like:

Does this generalize to systems with multiple contexts? IMO it's better to require the programmer to explicitly specify the next context to execute. There are use-cases of context_t which have multiple contexts (not just target/host).

> The direction suggested here would be to refactor the tick functions to return after a single step

The htif/tsi mechanism uses context_t, but I believe the implementation is buried within the static FESVR library, which is compiled as part of Spike (Spike uses htif and context_t in its own simulation loop as well). The FireSim FPGA emulation driver also heavily uses context_t.

Another example is the tick function for SpikeTile, which allows the Spike C++ core model to interact with the Chipyard RTL simulation. https://github.com/ucb-bar/chipyard/blob/eb6910aae00bcbad9b8f09fe40d0bc419fe42cbf/generators/chipyard/src/main/resources/csrc/spiketile.cc#L180 I couldn't think of a way to make that system work without context_t.

My belief here is that the init/tick functions being called from separate threads is a Verilator-specific quirk that we should work around with minimal disruption to existing other code/interfaces. Perhaps the simplest thing is to merge network_tick and network_init, and make the tick function initialize the devices on-demand?
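A minimal sketch of that suggestion, assuming the tick entrypoint can be handed everything init needs. `sim_device`, `device_tick`, and the `mac_addr` parameter are hypothetical placeholders, not the real `network_tick` signature: the device is constructed inside the first tick call, so construction happens on the same thread as that tick.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>

// Hypothetical device; in the real code this would be the NIC + switch.
struct sim_device {
  uint64_t mac_addr;
  int ticks = 0;
  explicit sim_device(uint64_t mac) : mac_addr(mac) {}
  void tick() { ticks++; }
};

static std::unique_ptr<sim_device> dev;

// Hypothetical merged entrypoint: every tick carries the init parameters,
// but they are only consumed on the first call.
int device_tick(uint64_t mac_addr) {
  if (!dev) {
    // First tick: construct here, so anything a real constructor captured
    // from context_t::current() would belong to the ticking thread.
    dev = std::make_unique<sim_device>(mac_addr);
  }
  dev->tick();
  return dev->ticks;
}
```

The cost is visible in the signature: every tick has to carry `mac_addr` even though only the first call uses it, and the approach still assumes later ticks stay on the thread that ran the first one.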

Thank you for digging into this, I've learned quite a bit about the subtleties of the context_t behavior in a multi-threaded system from your analysis.

sethp commented 1 month ago

> While I agree with your reasoning here, I don't think it's reasonable to expect/require SimDevice implementations to be thread-safe, where the constructor/tick functions can be called from distinct threads.

Yeah, I hear you about not wanting the {runtime,complexity} overhead of generalized thread-safety in the simulated devices. I want to note that the situation here calls for a much narrower "kind" of thread safety: it's the difference between what Rust calls Sync (you might be called from multiple threads at the same time) and the much simpler Send (it's safe to move the resource between threads)[^1][^2]. There is a strict happens-before relationship between the Verilog initial blocks that call init and the always blocks that call tick, which Verilator appears to implement correctly. So correctness here doesn't require the sim to handle multiple concurrent callers; it more or less just needs to avoid stashing a reference to another thread's local storage.

[^1]: It's fairly Rust-jargon-rich, but the Rust user forums have a good discussion about what being !Send + Sync means.
[^2]: The only other non-experiential citation I have for this is the Rustonomicon, which unhelpfully states "A type is Send if it is safe to send it to another thread," and seems to mistake its own premise further down (filed as https://github.com/rust-lang/nomicon/issues/453 , for anyone reading that desires homework from a footnote).

> As far as I can tell, this quirk only appears with Verilator multi-threading. The other simulators don't do this, even with multithreading enabled.

To be fully transparent, I have only a few dozen hours' worth of experience with any of the commercial Verilog implementations, and none to the depth I've gotten here with Verilator. I do wonder whether it's by accident or by design that the other simulators don't encounter this behavior: do you know if there's some Verilog standard (implicit or explicit) that Verilator is violating here by evaluating the initial block in a different thread than the always block? If it should be treating the module, say, as the "unit" of work (perhaps iff the module contains DPI?), that's something it might be worth raising with them upstream, too.

> Does this generalize to systems with multiple contexts? IMO it's better to require the programmer to explicitly specify the next context to execute. There are use-cases of context_t which have multiple contexts (not just target/host).

The yield semantics I implemented do generalize in the sense that every context_t has a most-recently-swapped antecedent, but as written they would probably produce surprising behavior when trying to "nest" contexts (M switch_to A switch_to B followed by yield would "return" to A, but a second yield from A would pass control flow back to B). We could imagine a context "stack", with switch_to pushing a task and a definition of yield that acts as a "pop." I believe that would work to implement arbitrarily nested contexts just fine, although there are other simple solutions too when the number of tasks is small and statically fixed, as I think is the case here[^3].
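A minimal sketch of that context-stack idea, assuming Linux/glibc `<ucontext.h>` and a single OS thread. `task`, `resume_task()`, `yield_task()`, `start_task()`, and `g_stack` are hypothetical names, not taken from the linked repository or from fesvr: resuming pushes the caller's resume point, and yielding pops it, so nested tasks return to whoever resumed them.

```cpp
#include <ucontext.h>
#include <cassert>
#include <string>
#include <vector>

// Hypothetical coroutine: its saved registers plus a private stack.
struct task {
  ucontext_t ctx{};
  std::vector<char> stack = std::vector<char>(64 * 1024);
};

// Resume points of the currently suspended resumers, innermost last.
static std::vector<ucontext_t*> g_stack;

// Switch into `t`, remembering where to come back to ("push").
void resume_task(task& t) {
  ucontext_t caller{};
  g_stack.push_back(&caller);
  swapcontext(&caller, &t.ctx);
}

// Hand control back to whoever most recently resumed us ("pop").
void yield_task(task& self) {
  assert(!g_stack.empty() && "yield with no resumer");
  ucontext_t* back = g_stack.back();
  g_stack.pop_back();
  swapcontext(&self.ctx, back);
}

// Prepare `t` to run `fn` on its own stack the first time it is resumed.
void start_task(task& t, void (*fn)()) {
  getcontext(&t.ctx);
  t.ctx.uc_stack.ss_sp = t.stack.data();
  t.ctx.uc_stack.ss_size = t.stack.size();
  t.ctx.uc_link = nullptr;  // the bodies below never return
  makecontext(&t.ctx, fn, 0);
}

static task A, B;
static std::string trace;

static void body_b() {
  for (;;) {
    trace += "b";
    yield_task(B);  // pops A's resume point
  }
}

static void body_a() {
  for (;;) {
    trace += "a";
    resume_task(B);  // nested switch: B's yield comes back here
    trace += "A";
    yield_task(A);   // pops main's resume point, not B's
  }
}
```

With only a per-context "most recent antecedent", A's yield would land back in B, since B was the last context to switch into A; the explicit stack sends it to the outermost resumer instead. This sketch is single-threaded only; `g_stack` would need to become per-thread or per-task before it could help in the Verilator case.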

[^3]: If you're curious, I have an example that I'm playing with: https://github.com/sethp/ucontext-coroutine . I haven't gotten to threading just yet, and I suspect my implementation is broken even for tail-recursive / single-recursion cases, but it's been enlightening.

> My belief here is that the init/tick functions being called from separate threads is a Verilator-specific quirk that we should work around with minimal disruption to existing other code/interfaces.

I appreciate the examples! I'm glad to have the benefit of your experience here. I'd agree at this point that "avoid use of context_t entirely" is a path not worth further exploration.

Unfortunately, I'm not sure there's a general resolution that doesn't involve at least looking at the other use sites: neither threading nor non-local control flow is famous for composing well. I haven't identified any answers that reside entirely within context_t or the verilated main or some other high-leverage point that would span all devices (at least, not yet).

> Perhaps the simplest thing is to merge network_tick and network_init, and make the tick function initialize the devices on-demand?

It's a good idea; it seems that's how the spike tile (and, perhaps, htif?) gets away with using context_t when verilated as a multi-threaded model. I considered it, but I didn't bring it up because it has some immediate consequences that I presumed would be disqualifying (all the _tick interfaces would have to take all the _init parameters, for example).

I'm also not entirely sure how durable it would be as a solution: the Verilator documentation on task scheduling suggests that they tried both static and dynamic scheduling and chose static scheduling for performance (rather than correctness) reasons. I suspect a dynamically scheduled runtime (e.g. one based on work-stealing) could cause even spiketile.cc to fail spontaneously, as it got moved around between threads.

My guess is that's a decision that's unlikely to be reversed any time soon ("efficient dynamic scheduling" notwithstanding), and you're in a much better position than I am to know whether "all DPI-C that uses context_t has a single Verilog-facing entrypoint" is an invariant you feel is more maintainable.

I think my plan at this point is to continue experimenting with ucontexts to get a better understanding of what it means to nest them (& how else they're used by htif & friends), and whether there's somewhere besides a thread-local to pass task-local sideband data to try and break the coupling there.

Pursuing a definition of a task that didn't care what thread it was scheduled on (as long as it wasn't scheduled more than once) seems like it offers a resolution that's relatively low-impact and high-durability to me.

> Thank you for digging into this, I've learned quite a bit about the subtleties of the context_t behavior in a multi-threaded system from your analysis.

Thank you for reading, and for the feedback! I'm glad you've found it helpful, I've very much enjoyed learning about all these fine details as well :smile:

sethp commented 1 month ago

I think I finally understand the problem well enough to feel confident about what's happening. Much of the clarity came when I got curious about the question "why is assigning target = context_t::current() not sending us back to the initial block when we target->switch_to()?". I wrote a small sample program to investigate[^sample], but the short version is that the referent of current() is not stable: it's (sometimes) internally mutated as a side effect of calling ::switch_to().

I found that even in single-threaded mode, target->switch_to() took me back to a surprising point: it only "resumes" the target simulation when the context that swapcontext implicitly saves into (via the cur thread-local) is the same referent we captured during initialization.

I understand the desire to move responsibility for that to the Verilog implementation, but I don't see how to do so effectively. In this case there are three non-lexical scopes that all need to line up (the thread-local, the init, and the tick), but since context_t::current() could point to any user-allocated structure (not just the anonymous thread-local one), and be captured behind any DPI-C call, any boundary we draw here feels somewhat arbitrary to me. And "do all of the simulators agree about what the atomic unit of thread-binding is, and does that cover every deferred reference to context_t::current()?" is a much harder property to pattern-match on than "does this switch_to call immediately update something with context_t::current() just prior?"

So, I'd like to pitch a three step plan:

  1. Repair the network device by updating target = context_t::current() just before calling host.switch_to() in netdev (& similar for the netsw), as mentioned above. This ensures the invariant that target always points to the task we're about to park, and therefore works across time & threads both.
  2. Identify some other usages of context_t and evaluate a similar repair. A quick grep suggests there are on the order of half a dozen usages of context_t in chipyard, so repair should only take a few days of effort and can be incremental. IMO it's OK for this step to be best-effort, because one way identification works is "someone reports an issue about a segfault".
  3. Potentially, revisit the idea to invent a new semantic to be more explicit about the referent we want to update as a side effect of the switch_to—how often this pattern appears suggests to me that it is indeed something worth looking into, and context_t::current() may even be worth deprecating, since capturing its result is misuse-prone.

What do you think? Would you be willing to accept a change like 1 and piecemeal updates for 2?