nickg / nvc

VHDL compiler and simulator
https://www.nickg.me.uk/nvc/
GNU General Public License v3.0

crash when dumping waves #1030

Closed: nathanaelhuffman closed this issue 3 weeks ago

nathanaelhuffman commented 1 month ago

This simulation seems to run fine if I don't dump the FST, but when I do I get this crash. I'm using --dump-arrays, if it matters.

nvc: ../src/rt/wave.c:493: fst_get_ptr: Assertion `l->nparts == type_fields(rtype)' failed.
*** Caught signal 6 (SIGABRT) ***

[0x605cd2eb8f24] ../src/util.c:872 signal_handler
[0x796ed064251f] (/usr/lib/x86_64-linux-gnu/libc.so.6)
[0x796ed06969fc] (/usr/lib/x86_64-linux-gnu/libc.so.6) ./nptl/pthread_kill.c:44 __pthread_kill_implementation
[0x796ed06969fc] (/usr/lib/x86_64-linux-gnu/libc.so.6) ./nptl/pthread_kill.c:78 __pthread_kill_internal
[0x796ed06969fc] (/usr/lib/x86_64-linux-gnu/libc.so.6) ./nptl/pthread_kill.c:89 pthread_kill@@GLIBC_2.34
[0x796ed0642475] (/usr/lib/x86_64-linux-gnu/libc.so.6) ../sysdeps/posix/raise.c:26 raise
[0x796ed06287f2] (/usr/lib/x86_64-linux-gnu/libc.so.6) ./stdlib/abort.c:79 abort
[0x796ed062871a] (/usr/lib/x86_64-linux-gnu/libc.so.6) ./assert/assert.c:92 __assert_fail_base.cold
[0x796ed0639e95] (/usr/lib/x86_64-linux-gnu/libc.so.6) ./assert/assert.c:101 __assert_fail
[0x605cd2fc54ef] ../src/rt/wave.c:493 fst_get_ptr
[0x605cd2fc571e] ../src/rt/wave.c:528 fst_get_array_range
[0x605cd2fc6921] ../src/rt/wave.c:801 fst_alias_var
[0x605cd2fc6af3] ../src/rt/wave.c:830 fst_process_signal
[0x605cd2fc71ca] ../src/rt/wave.c:934 fst_walk_design
[0x605cd2fc7313] ../src/rt/wave.c:952 fst_walk_design
[0x605cd2fc7313] ../src/rt/wave.c:952 fst_walk_design
[0x605cd2fc75c6] ../src/rt/wave.c:1003 wave_dumper_restart
[0x605cd2eb0899] ../src/nvc.c:868 run_cmd
[0x605cd2eb359f] ../src/nvc.c:2135 process_command
[0x605cd2eafa84] ../src/nvc.c:542 elaborate
[0x605cd2eb3585] ../src/nvc.c:2133 process_command
[0x605cd2eb3ac5] ../src/nvc.c:2293 main

And the backtrace from gdb:

[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
[New Thread 0x7fffe6e00640 (LWP 32620)]
[New Thread 0x7fffe6400640 (LWP 32621)]
[New Thread 0x7fffe5a00640 (LWP 32622)]
[New Thread 0x7fffe5000640 (LWP 32623)]
[New Thread 0x7fffdfe00640 (LWP 32624)]
[New Thread 0x7fffdf400640 (LWP 32625)]
[New Thread 0x7fffdea00640 (LWP 32626)]
nvc: ../src/rt/wave.c:493: fst_get_ptr: Assertion `l->nparts == type_fields(rtype)' failed.

Thread 1 "nvc" received signal SIGABRT, Aborted.
__pthread_kill_implementation (no_tid=0, signo=6, threadid=140737239955648) at ./nptl/pthread_kill.c:44
44      ./nptl/pthread_kill.c: No such file or directory.
(gdb) bt
#0  __pthread_kill_implementation (no_tid=0, signo=6, threadid=140737239955648) at ./nptl/pthread_kill.c:44
#1  __pthread_kill_internal (signo=6, threadid=140737239955648) at ./nptl/pthread_kill.c:78
#2  __GI___pthread_kill (threadid=140737239955648, signo=signo@entry=6) at ./nptl/pthread_kill.c:89
#3  0x00007ffff1042476 in __GI_raise (sig=sig@entry=6) at ../sysdeps/posix/raise.c:26
#4  0x00007ffff10287f3 in __GI_abort () at ./stdlib/abort.c:79
#5  0x00007ffff102871b in __assert_fail_base (fmt=0x7ffff11dd130 "%s%s%s:%u: %s%sAssertion `%s' failed.\n%n",
    assertion=0x555555764b18 "l->nparts == type_fields(rtype)", file=0x5555557649a2 "../src/rt/wave.c", line=493, function=<optimized out>)
    at ./assert/assert.c:92
#6  0x00007ffff1039e96 in __GI___assert_fail (assertion=0x555555764b18 "l->nparts == type_fields(rtype)",
    file=0x5555557649a2 "../src/rt/wave.c", line=493, function=0x555555764e90 <__PRETTY_FUNCTION__.5> "fst_get_ptr") at ./assert/assert.c:101
#7  0x00005555556814f0 in fst_get_ptr (wd=0x5555591ce630, scope=0x555559f9f450, where=0x7fffe47821b0) at ../src/rt/wave.c:493
#8  0x000055555568171f in fst_get_array_range (wd=0x5555591ce630, type=0x7fffe7005280, scope=0x555559f9f450, where=0x7fffe47821b0, dim=0,
    left=0x7fffffffcd58, right=0x7fffffffcd60, dir=0x7fffffffcd54, length=0x7fffffffcd68) at ../src/rt/wave.c:528
#9  0x0000555555682922 in fst_alias_var (wd=0x5555591ce630, d=0x7fffe70052c0, s=0x7fffe7601b80, tb=0x55555965a270) at ../src/rt/wave.c:801
#10 0x0000555555682af4 in fst_process_signal (wd=0x5555591ce630, scope=0x555559656520, d=0x7fffe70052c0, type=0x7fffe7005280, tb=0x55555965a270)
    at ../src/rt/wave.c:830
#11 0x00005555556831cb in fst_walk_design (wd=0x5555591ce630, block=0x7fffe7004e60) at ../src/rt/wave.c:934
#12 0x0000555555683314 in fst_walk_design (wd=0x5555591ce630, block=0x7fffe7004670) at ../src/rt/wave.c:952
#13 0x0000555555683314 in fst_walk_design (wd=0x5555591ce630, block=0x7fffe70042b0) at ../src/rt/wave.c:952
#14 0x00005555556835c7 in wave_dumper_restart (wd=0x5555591ce630, m=0x555557a6bee0, jit=0x555555870f30) at ../src/rt/wave.c:1003
#15 0x000055555556c89a in run_cmd (argc=6, argv=0x7fffffffd488, state=0x7fffffffd2d0) at ../src/nvc.c:868
#16 0x000055555556f5a0 in process_command (argc=6, argv=0x7fffffffd488, state=0x7fffffffd2d0) at ../src/nvc.c:2135
#17 0x000055555556ba85 in elaborate (argc=6, argv=0x7fffffffd488, state=0x7fffffffd2d0) at ../src/nvc.c:542
#18 0x000055555556f586 in process_command (argc=11, argv=0x7fffffffd460, state=0x7fffffffd2d0) at ../src/nvc.c:2133
#19 0x000055555556fac6 in main (argc=11, argv=0x7fffffffd460) at ../src/nvc.c:2293
nathanaelhuffman commented 1 month ago

Note that this is a regression somewhere between HEAD and af2dd6b2289e6f3114b405f441763fb34742b652: I rolled back to that build, which I knew was working, and the waves dump fine, as I remembered.
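With one known-good commit (af2dd6b2289e6f3114b405f441763fb34742b652) and one bad commit (HEAD), `git bisect start HEAD af2dd6b2289e6f3114b405f441763fb34742b652` can narrow the regression down to a single commit. As an illustration only, the C sketch below shows the binary search that bisection performs. The names first_bad_commit and bad_from_threshold are hypothetical, and the is_bad predicate stands in for "build nvc at this commit and check whether dumping waves crashes".

```c
#include <stddef.h>

/* Hypothetical predicate: nonzero if the build at commit `index`
 * shows the bug. */
typedef int (*is_bad_fn)(size_t index, void *ctx);

/* Binary search for the first bad commit in commits[0..n-1].
 * Assumes n >= 2, commits[0] is known good, commits[n - 1] is known
 * bad, and the bug persists in every commit after it first appears.
 * Each probe halves the remaining range, so roughly log2(n) builds
 * are needed. */
static size_t first_bad_commit(size_t n, is_bad_fn is_bad, void *ctx)
{
    size_t good = 0;      /* invariant: commits[good] is good */
    size_t bad = n - 1;   /* invariant: commits[bad] is bad */

    while (bad - good > 1) {
        size_t mid = good + (bad - good) / 2;
        if (is_bad(mid, ctx))
            bad = mid;
        else
            good = mid;
    }
    return bad;
}

/* Toy predicate for testing: simulates a regression introduced at the
 * commit index stored in *ctx. */
static int bad_from_threshold(size_t index, void *ctx)
{
    return index >= *(const size_t *)ctx;
}
```

The search only ever moves the good or bad boundary, so the result is the earliest commit for which the predicate reports the bug, provided the "stays broken once broken" assumption holds.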

nickg commented 1 month ago

Can you try frame 7 and then call fmt_loc(stdout, tree_loc(where)) while you're at the gdb prompt? That should print the location of the signal that causes the problem.

nathanaelhuffman commented 1 month ago
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
[New Thread 0x7fffe6e00640 (LWP 254882)]
[New Thread 0x7fffe6400640 (LWP 254883)]
[New Thread 0x7fffe5a00640 (LWP 254884)]
[New Thread 0x7fffe5000640 (LWP 254885)]
[New Thread 0x7fffdfe00640 (LWP 254886)]
[New Thread 0x7fffdf400640 (LWP 254887)]
[New Thread 0x7fffdea00640 (LWP 254888)]
nvc: ../src/rt/wave.c:493: fst_get_ptr: Assertion `l->nparts == type_fields(rtype)' failed.

Thread 1 "nvc" received signal SIGABRT, Aborted.
__pthread_kill_implementation (no_tid=0, signo=6, threadid=140737239955648) at ./nptl/pthread_kill.c:44
44      ./nptl/pthread_kill.c: No such file or directory.
(gdb) bt
#0  __pthread_kill_implementation (no_tid=0, signo=6, threadid=140737239955648) at ./nptl/pthread_kill.c:44
#1  __pthread_kill_internal (signo=6, threadid=140737239955648) at ./nptl/pthread_kill.c:78
#2  __GI___pthread_kill (threadid=140737239955648, signo=signo@entry=6) at ./nptl/pthread_kill.c:89
#3  0x00007ffff1042476 in __GI_raise (sig=sig@entry=6) at ../sysdeps/posix/raise.c:26
#4  0x00007ffff10287f3 in __GI_abort () at ./stdlib/abort.c:79
#5  0x00007ffff102871b in __assert_fail_base (fmt=0x7ffff11dd130 "%s%s%s:%u: %s%sAssertion `%s' failed.\n%n",
    assertion=0x555555764ad8 "l->nparts == type_fields(rtype)", file=0x555555764962 "../src/rt/wave.c", line=493, function=<optimized out>)
    at ./assert/assert.c:92
#6  0x00007ffff1039e96 in __GI___assert_fail (assertion=0x555555764ad8 "l->nparts == type_fields(rtype)", file=0x555555764962 "../src/rt/wave.c",
    line=493, function=0x555555764e50 <__PRETTY_FUNCTION__.5> "fst_get_ptr") at ./assert/assert.c:101
#7  0x000055555568151c in fst_get_ptr (wd=0x55555783b8d0, scope=0x55555c6a6c70, where=0x7fffe47821b0) at ../src/rt/wave.c:493
#8  0x000055555568174b in fst_get_array_range (wd=0x55555783b8d0, type=0x7fffe7005780, scope=0x55555c6a6c70, where=0x7fffe47821b0, dim=0,
    left=0x7fffffffc7a8, right=0x7fffffffc7b0, dir=0x7fffffffc7a4, length=0x7fffffffc7b8) at ../src/rt/wave.c:528
#9  0x000055555568294e in fst_alias_var (wd=0x55555783b8d0, d=0x7fffe70057c0, s=0x7fffe7401b80, tb=0x55555d379320) at ../src/rt/wave.c:801
#10 0x0000555555682b20 in fst_process_signal (wd=0x55555783b8d0, scope=0x555556e52ee0, d=0x7fffe70057c0, type=0x7fffe7005780, tb=0x55555d379320)
    at ../src/rt/wave.c:830
#11 0x00005555556831f7 in fst_walk_design (wd=0x55555783b8d0, block=0x7fffe7005360) at ../src/rt/wave.c:934
#12 0x0000555555683340 in fst_walk_design (wd=0x55555783b8d0, block=0x7fffe7004b70) at ../src/rt/wave.c:952
#13 0x0000555555683340 in fst_walk_design (wd=0x55555783b8d0, block=0x7fffe70047b0) at ../src/rt/wave.c:952
#14 0x00005555556835f3 in wave_dumper_restart (wd=0x55555783b8d0, m=0x55555dbca830, jit=0x5555558717e0) at ../src/rt/wave.c:1003
#15 0x000055555556c89a in run_cmd (argc=6, argv=0x7fffffffced0, state=0x7fffffffcd20) at ../src/nvc.c:868
#16 0x000055555556f5a0 in process_command (argc=6, argv=0x7fffffffced0, state=0x7fffffffcd20) at ../src/nvc.c:2135
#17 0x000055555556ba85 in elaborate (argc=6, argv=0x7fffffffced0, state=0x7fffffffcd20) at ../src/nvc.c:542
#18 0x000055555556f586 in process_command (argc=10, argv=0x7fffffffceb0, state=0x7fffffffcd20) at ../src/nvc.c:2133
#19 0x000055555556fac6 in main (argc=10, argv=0x7fffffffceb0) at ../src/nvc.c:2293
(gdb) frame 7
#7  0x000055555568151c in fst_get_ptr (wd=0x55555783b8d0, scope=0x55555c6a6c70, where=0x7fffe47821b0) at ../src/rt/wave.c:493
493              assert(l->nparts == type_fields(rtype));
(gdb) call fmt_loc(stdout, tree_loc(where))
    > /home/nhuffman/oxide/quartz/hdl/ip/vhd/axi_blocks/axilite_if_pkg.vhd:76
    |
 76 |       data : std_logic_vector(31 downto 0);
    |       ^^^^
(gdb)
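The failing check, `l->nparts == type_fields(rtype)` in fst_get_ptr, reads as an invariant that a record signal's layout must have one part per record field, and the location printed above points at the record element `data`. The C sketch below illustrates that kind of check using hypothetical stand-in types and functions (record_type_t, layout_t, layout_matches, get_field_ptr). It is not nvc's actual wave.c code, only an assumption about the shape of the invariant.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for illustration only; NOT nvc's real types.
 * A record type is reduced to its field count. */
typedef struct {
    const char *name;
    int nfields;              /* what type_fields(rtype) would report */
} record_type_t;

/* A layout records where each part of a signal lives in memory.
 * For a record, one part per field is expected. */
#define MAX_PARTS 8

typedef struct {
    int nparts;
    size_t offset[MAX_PARTS]; /* byte offset of each part */
} layout_t;

/* Non-aborting form of the invariant: does the layout have exactly
 * one part per record field? */
static int layout_matches(const layout_t *l, const record_type_t *rtype)
{
    return l->nparts == rtype->nfields;
}

/* Return a pointer to field `index` of a record stored at `base`.
 * If the layout and the type disagree on the number of fields,
 * indexing parts by field number could pick the wrong offset, so the
 * mismatch is treated as a fatal internal error, as in the reported
 * assertion. */
static unsigned char *get_field_ptr(unsigned char *base,
                                    const layout_t *l,
                                    const record_type_t *rtype,
                                    int index)
{
    assert(layout_matches(l, rtype));
    assert(index >= 0 && index < l->nparts);
    return base + l->offset[index];
}
```

With a one-part layout paired with a two-field record, get_field_ptr would abort in the same way as the reported crash; layout_matches lets a caller test the condition without aborting.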

You can find this code at https://github.com/oxidecomputer/quartz/tree/main/hdl/ip/vhd/axi_blocks, though we use a buck2-based build system that adds a layer of indirection for running this test easily. As I mentioned above, this doesn't happen with that earlier commit.

I'm travelling next week, so responses may be slower. Should you wish to run this full test without the build system, the post-build VUnit run.py is attached, though some VHDL packages are generated from the RDL files and you may need to hack around those. I can get the generated files too with some more time; let me know if more is needed!

run.py.txt

nickg commented 1 month ago

I almost got it to run with buck2 run root//hdl/ip/vhd/espi:espi_tb, but it fails while trying to generate the RDL files you mentioned above:

Action failed: root//hdl/ip/vhd/espi:espi_regs_pkg (rdl)
Local command returned non-zero exit code 1
Reproduce locally: `env -- 'BUCK_SCRATCH_PATH=buck-out/v2/tmp/root/42ce8b255a2bb6a2/rdl' buck-out/v2/gen/root/904931f735 ...<omitted>... _regs_pkg.vhd buck-out/v2/gen/root/904931f735703749/hdl/ip/vhd/espi/__espi_regs_pkg__/espi_regs.html (run `buck2 log what-failed` to get the full command)`
stdout:
stderr:
Traceback (most recent call last):
  File "<string>", line 52, in <module>
  File "<string>", line 49, in __run
  File "/home/nick/src/quartz/buck-out/v2/gen/root/904931f735703749/tools/site_cobble/rdl_pkg/__rdl_cli__/rdl_cli#link-tree/__par__/bootstrap.py", line 69, in run_as_main
    runpy._run_module_as_main(main_module, alter_argv=False)
  File "<frozen runpy>", line 198, in _run_module_as_main
  File "<frozen runpy>", line 88, in _run_code
  File "/home/nick/src/quartz/buck-out/v2/gen/root/904931f735703749/tools/site_cobble/rdl_pkg/__rdl_cli__/rdl_cli#link-tree/tools/site_cobble/rdl_pkg/rdl_cli.py", line 12, in <module>
    from systemrdl import RDLCompiler, RDLCompileError, RDLWalker
ModuleNotFoundError: No module named 'systemrdl'

I installed systemrdl-compiler in a Python virtual environment but it seems it's not being picked up.

nickg commented 1 month ago

I haven't managed to find a simple case that reproduces this but I think I can guess what the bug might be. Can you try again with the latest master branch?

nathanaelhuffman commented 1 month ago

Thanks for this and for braving the buck2 env! Your changes to master do indeed resolve the issue and dumping waves works on this sim again!

I have buck2 configured to use the bootstrap Python environment, which is the system Python, so it's not totally surprising that it doesn't play nicely with virtual environments. I might log an issue on our repo to investigate this further: while I'm OK modifying my build machine's system Python, I can see why others might not like doing that.

If you do want to get a reproducer running, put this run.py and the two generated files (renamed to .vhd) at the checkout root and have VUnit available in your Python environment. You can then run any of the tests with the --gui flag to reproduce the crash with an older nvc version. I've patched the paths so they are relative to a quartz/ checkout, so it should just work. There is one testcase that is failing to pass, which is how I discovered the wave-dumping issue while debugging, and I haven't yet pushed the fix to main. I think python3 run.py --gui should get you a crash per testcase.

run.py.txt

Two generated files: espi_regs_pkg.vhd.txt, espi_spec_regs_pkg.vhd.txt

As always, thanks much!

nickg commented 3 weeks ago

Thanks, I managed to write a test case that reproduces the original issue.