rdaly525 / coreir

BSD 3-Clause "New" or "Revised" License

segfault when using prototype components. #981

Open mbstrange2 opened 3 years ago

mbstrange2 commented 3 years ago

I am experiencing an issue where any attempt to generate Verilog for my test bench segfaults.

%> coreir --version
v0.1.51

This error can be reproduced by checking out lake:sparse_strawman and garnet:spVspV and running python tests/test_memory_core/test_memory_core.py in garnet.

WARNING: 183 shift/reduce conflicts
Generating LALR tables
WARNING: 183 shift/reduce conflicts
Generating LALR tables
WARNING: 183 shift/reduce conflicts
Generating LALR tables
WARNING: 183 shift/reduce conflicts
Generating LALR tables
WARNING: 183 shift/reduce conflicts
Generating LALR tables
WARNING: 183 shift/reduce conflicts
Generating LALR tables
WARNING: 183 shift/reduce conflicts
Generating LALR tables
WARNING: 183 shift/reduce conflicts
Generating LALR tables
WARNING: 183 shift/reduce conflicts
Generating LALR tables
WARNING: 183 shift/reduce conflicts
Generating LALR tables
WARNING: 183 shift/reduce conflicts
WARNING:magma:Wiring multiple outputs to same wire, using last connection. Input: Interconnect.Tile_X02_Y01.clk, Old Output: Interconnect.Tile_X02_Y00.clk_out, New Output: Interconnect.Tile_X01_Y01.clk_pass_through_out_right
WARNING:magma:Wiring multiple outputs to same wire, using last connection. Input: Interconnect.Tile_X04_Y01.clk, Old Output: Interconnect.Tile_X04_Y00.clk_out, New Output: Interconnect.Tile_X03_Y01.clk_pass_through_out_right
WARNING:magma:Wiring multiple outputs to same wire, using last connection. Input: Interconnect.Tile_X06_Y01.clk, Old Output: Interconnect.Tile_X06_Y00.clk_out, New Output: Interconnect.Tile_X05_Y01.clk_pass_through_out_right
Segmentation fault (core dumped)

This is the output I see when trying to generate the verilog in this context.
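A segfault inside a native extension kills the interpreter before Python can print a traceback, which is why the log above ends without naming a Python frame. One way to narrow this down, sketched here with only the standard library, is to enable `faulthandler` at the top of the test script (running with `python -X faulthandler` has the same effect). On a fatal signal it dumps the Python stack of each thread. The helper name `enable_crash_tracebacks` is hypothetical, and whether the resulting stack would point at the CoreIR bindings in this case is an assumption.

```python
import faulthandler
import sys


def enable_crash_tracebacks(stream=None):
    """Ask CPython to dump Python tracebacks when a fatal signal such as SIGSEGV arrives."""
    if stream is None:
        # Use the original stderr, which is backed by a real file descriptor.
        stream = sys.__stderr__
    if not faulthandler.is_enabled():
        faulthandler.enable(file=stream, all_threads=True)
    return faulthandler.is_enabled()


if __name__ == "__main__":
    enable_crash_tracebacks()
    # ... run the code that triggers Verilog generation here ...
```

With this in place, the last Python frame printed before the crash indicates which call entered the native code that faulted.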

rdaly525 commented 3 years ago

@mbstrange2, I'll take a look

leonardt commented 3 years ago

Can you try running pytest with the "-s" flag to see if there's a CoreIR error message being dumped?

mbstrange2 commented 3 years ago

@leonardt I haven't been using pytest, just normal python, so I believe whatever output there is should already be shown, right?

leonardt commented 3 years ago

This is the output I get when running the test:

~/repos/garnet spVspV*
garnet-venv ❯ PYTHONPATH=. python tests/test_memory_core/test_memory_core.py
/home/lenny/repos/garnet/garnet-venv/src/peak/peak/mapper/mapper.py:229: SyntaxWarning: "is" with a literal. Did you mean "=="?
  assert arch_binding[0][1] is ()
/home/lenny/repos/garnet/garnet-venv/src/peak/peak/mapper/mapper.py:236: SyntaxWarning: "is" with a literal. Did you mean "=="?
  assert ir_binding[0][1] is ()
/home/lenny/repos/garnet/garnet-venv/src/peak/peak/mapper/utils.py:198: SyntaxWarning: "is" with a literal. Did you mean "=="?
  assert binding[0][1] is ()
/home/lenny/repos/garnet/garnet-venv/src/peak/peak/mapper/utils.py:199: SyntaxWarning: "is" with a literal. Did you mean "=="?
  if len(binding)==1 and binding[0][1] is ():
/home/lenny/repos/garnet/garnet-venv/src/peak/peak/mapper/utils.py:246: SyntaxWarning: "is" with a literal. Did you mean "=="?
  assert arch_path is ()
/home/lenny/repos/garnet/garnet-venv/src/lake/lake/passes/passes.py:29: SyntaxWarning: "is" with a literal. Did you mean "=="?
  if port_name is "mode":
/home/lenny/repos/garnet/garnet-venv/src/lake/lake/utils/util.py:131: SyntaxWarning: "is" with a literal. Did you mean "=="?
  if pdir is "input":
/home/lenny/repos/garnet/garnet-venv/src/lake/lake/utils/util.py:240: SyntaxWarning: "is" with a literal. Did you mean "=="?
  if pdir is "input":
Getting length on class SparseSequenceConstraints.ZERO
Getting length on class SparseSequenceConstraints.ZERO

NEW TEST
len1=0
len2=0
num_match=0
SEQA: []
SEQB: []
DATA0: []
DATAD0: []
DATA1: []
DATAD1: []
common coords: []
result data: []
ALIGNED LENGTH 0: 0
ALIGNED LENGTH 1: 0
ADATA0: []
ADATAD0: []
ADATA1: []
ADATAD1: []
Variable: back_empty has no sink
Variable: back_full has no sink
Variable: front_empty has no sink
Variable: front_full has no sink
Variable: rd_valid has no sink
--------------------------------------------------------------------------------
/home/lenny/repos/garnet/garnet-venv/src/lake/lake/modules/strg_RAM.py:104
         self._rd_bank = self.var("rd_bank", max(1, clog2(self.banks)))
         self.set_read_bank()
>        self._rd_valid = self.var("rd_valid", 1)
         self.set_read_valid()
         if self.fw_int == 1:
--------------------------------------------------------------------------------
Use anneal_param_factor 120
HPWL: 12.668244
HPWL: 10.684666
Using HPWL: 10.684666
Before annealing energy: 359.644200
After annealing energy: 4.487500 improvement: 0.98752293/3293 | 328.9 kHz | 0s<0s]
terminate called after throwing an instance of 'std::runtime_error'
  what():  error in assign clb cells got cell type j
Traceback (most recent call last):
  File "tests/test_memory_core/test_memory_core.py", line 1162, in <module>
    spVspV_regress(dump_dir="mek_dump",
  File "tests/test_memory_core/test_memory_core.py", line 1133, in spVspV_regress
    success = run_test(len1, len2, num_match, value_limit, dump_dir=dump_dir, log_name=log_name, trace=trace)
  File "tests/test_memory_core/test_memory_core.py", line 1069, in run_test
    out_coord, out_data = spVspV_test(trace=trace,
  File "tests/test_memory_core/test_memory_core.py", line 926, in spVspV_test
    placement, routing = pnr(interconnect, (netlist, bus), cwd=cwd)
  File "/home/lenny/repos/garnet/garnet-venv/src/archipelago/archipelago/pnr_.py", line 82, in pnr
    place(packed_file, layout_filename, placement_filename, has_fixed)
  File "/home/lenny/repos/garnet/garnet-venv/src/archipelago/archipelago/place.py", line 16, in place
    subprocess.check_call([placer_binary, layout_filename,
  File "/home/lenny/miniconda3/lib/python3.8/subprocess.py", line 364, in check_call
    raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['/home/lenny/repos/garnet/garnet-venv/lib/python3.8/site-packages/placer', '/home/lenny/repos/garnet/mek_dump/design.layout', 'mek_dump/design.packed', 'mek_dump/design.place']' died with <Signals.SIGABRT: 6>.

Looks like some issue related to placement?
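Note that the traceback above reports the placer dying with SIGABRT (after an uncaught `std::runtime_error`), which is a different failure from the segfault in the original report. As a minimal sketch, assuming a POSIX system, a child's return code can be decoded to tell these cases apart: `subprocess` reports death by signal as the negated signal number. The helper names below are hypothetical, and the self-aborting child is only a stand-in for the placer binary.

```python
import signal
import subprocess
import sys


def describe_returncode(returncode):
    """Turn a subprocess return code into a readable description (POSIX convention)."""
    if returncode == 0:
        return "exited normally"
    if returncode < 0:
        # subprocess reports death-by-signal as the negated signal number.
        try:
            name = signal.Signals(-returncode).name
        except ValueError:
            name = f"signal {-returncode}"
        return f"killed by {name}"
    return f"exited with status {returncode}"


def run_and_describe(cmd):
    """Run cmd without raising on failure, and describe how it ended."""
    completed = subprocess.run(cmd)
    return describe_returncode(completed.returncode)


if __name__ == "__main__":
    # Hypothetical stand-in for the placer: a child process that aborts itself.
    print(run_and_describe([sys.executable, "-c", "import os; os.abort()"]))
```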

leonardt commented 3 years ago

Running python garnet.py -v works without error for me, so I suspect there are some differences in our setups. Are there any local changes to garnet/lake or other dependencies that might not have been pushed yet?

mbstrange2 commented 3 years ago

@leonardt Sorry about that, you need updated cyclone, thunder, and canal, and then:

export DISABLE_GP=1

leonardt commented 3 years ago

Ok, I had to manually install the latest master branch from the cgra_pnr repo. I'm able to run the test and get verilog generated

garnet-venv ❯ python tests/test_memory_core/test_memory_core.py
Getting length on class SparseSequenceConstraints.ZERO
Getting length on class SparseSequenceConstraints.ZERO

NEW TEST
len1=0
len2=0
num_match=0
SEQA: []
SEQB: []
DATA0: []
DATAD0: []
DATA1: []
DATAD1: []
common coords: []
result data: []
ALIGNED LENGTH 0: 0
ALIGNED LENGTH 1: 0
ADATA0: []
ADATAD0: []
ADATA1: []
ADATAD1: []
Variable: back_empty has no sink
Variable: back_full has no sink
Variable: rd_valid has no sink
--------------------------------------------------------------------------------
/home/lenny/repos/garnet/garnet-venv/src/lake/lake/modules/strg_RAM.py:104
         self._rd_bank = self.var("rd_bank", max(1, clog2(self.banks)))
         self.set_read_bank()
>        self._rd_valid = self.var("rd_valid", 1)
         self.set_read_valid()
         if self.fw_int == 1:
--------------------------------------------------------------------------------
Variable: front_empty has no sink
Variable: front_full has no sink
 90.000000 -> 81.000000 improvement: 0.100000 total: 0.000000 | 675.9 kHz | 0s<0s]
 81.000000 -> 81.000000 improvement: 0.000000 total: 0.100000 | 442.3 kHz | 0s<0s]
using bit_width 1
Routing iteration:   0 duration: 20 ms
using bit_width 16
Routing iteration:   0 duration: 6 ms
[(4, 16), (83, 134217728), (83, 33554432), (4, 2), (83, 16777216)]
[(3, -16), (2, 0), (2, -2), (1, 0), (0, 65536), (1, 65536), (0, 0), (3, 1048576)]
[(4, 16), (83, 134217728), (83, 33554432), (4, 2), (83, 16777216)]
[(3, -16), (2, 0), (2, -2), (1, 0), (0, 65536), (1, 65536), (0, 0), (3, 1048576)]
Config isect core.....!
[(0, 256)]
[(4, 16), (83, 134217728), (83, 33554432), (4, 2), (83, 16777216)]
[(4, 16), (83, 134217728), (83, 33554432), (4, 2), (83, 16777216)]
[(0, 64), (4, 1), (83, 16777216), (83, 134217728), (4, 16)]
[(0, 64), (4, 1), (83, 16777216), (83, 134217728), (4, 16)]
Generating LALR tables
WARNING: 183 shift/reduce conflicts
Generating LALR tables
WARNING: 183 shift/reduce conflicts
Generating LALR tables
WARNING: 183 shift/reduce conflicts
Generating LALR tables
WARNING: 183 shift/reduce conflicts
Generating LALR tables
WARNING: 183 shift/reduce conflicts
Generating LALR tables
WARNING: 183 shift/reduce conflicts
Generating LALR tables
WARNING: 183 shift/reduce conflicts
Generating LALR tables
WARNING: 183 shift/reduce conflicts
Generating LALR tables
WARNING: 183 shift/reduce conflicts
Generating LALR tables
WARNING: 183 shift/reduce conflicts
Generating LALR tables
WARNING: 183 shift/reduce conflicts
WARNING:magma:Wiring multiple outputs to same wire, using last connection. Input: Interconnect.Tile_X02_Y01.clk, Old Output: Interconnect.Tile_X02_Y00.clk_out, New Output: Interconnect.Tile_X01_Y01.clk_pass_through_out_right
WARNING:magma:Wiring multiple outputs to same wire, using last connection. Input: Interconnect.Tile_X04_Y01.clk, Old Output: Interconnect.Tile_X04_Y00.clk_out, New Output: Interconnect.Tile_X03_Y01.clk_pass_through_out_right
WARNING:magma:Wiring multiple outputs to same wire, using last connection. Input: Interconnect.Tile_X06_Y01.clk, Old Output: Interconnect.Tile_X06_Y00.clk_out, New Output: Interconnect.Tile_X05_Y01.clk_pass_through_out_right
mek_dump/Interconnect.json
Running command: verilator -Wall -Wno-INCABSPATH -Wno-DECLFILENAME -Wno-fatal --cc Interconnect.v -v cfg_and_dbg_unq1.sv -v tap_unq1.sv -v jtag.sv -v glc_axi_ctrl.sv -v flop_unq1.sv -v flop_unq3.sv -v flop_unq2.sv -v glc_jtag_ctrl.sv -v global_controller.sv -v glc_axi_addrmap.sv -v CW_fp_add.v -v CW_fp_mult.v -v AN2D0BWP16P90.sv -v AO22D0BWP16P90.sv --exe Interconnect_driver.cpp --top-module Interconnect

Perhaps there's some difference in our setup still.

Can you show the pycoreir version and check whether there are multiple versions of coreir in your path with

pip show coreir

and

which -a coreir

here's what I have

~/repos/garnet spVspV*
garnet-venv ❯ pip show coreir
Name: coreir
Version: 2.0.128
Summary: Python bindings for CoreIR
Home-page: https://github.com/leonardt/pycoreir
Author: Leonard Truong
Author-email: lenny@cs.stanford.edu
License: BSD License
Location: /home/lenny/repos/garnet/garnet-venv/lib/python3.8/site-packages
Requires: hwtypes
Required-by: CoSA, magma-lang, fault, peak, metamapper

~/repos/garnet spVspV*
garnet-venv ❯ which -a coreir
/home/lenny/repos/garnet/garnet-venv/bin/coreir
/home/lenny/miniconda3/bin/coreir

mbstrange2 commented 3 years ago

Can you make sure to target xcelium? I'm not sure if there's any difference if you choose a different simulator target.


(aha) root@615a6684288f:/aha/garnet# pip show coreir
Name: coreir
Version: 2.0.128
Summary: Python bindings for CoreIR
Home-page: https://github.com/leonardt/pycoreir
Author: Leonard Truong
Author-email: lenny@cs.stanford.edu
License: BSD License
Location: /aha/pycoreir
Requires: hwtypes
Required-by: CoSA, magma-lang, peak, fault
(aha) root@615a6684288f:/aha/garnet# which -a coreir
/usr/local/bin/coreir

leonardt commented 3 years ago

The verilator compilation failed with a huge amount of errors, here's a snippet:

      |       ^~~~~~~~~~~~~~~~
../Interconnect_driver.cpp:8143:7: error: ‘io2glb_1_X06_Y00’ was not declared in this scope
 8143 |   if (io2glb_1_X06_Y00) {
      |       ^~~~~~~~~~~~~~~~
../Interconnect_driver.cpp:8166:7: error: ‘io2glb_1_X01_Y00’ was not declared in this scope
 8166 |   if (io2glb_1_X01_Y00) {
      |       ^~~~~~~~~~~~~~~~
../Interconnect_driver.cpp:8169:7: error: ‘io2glb_1_X06_Y00’ was not declared in this scope
 8169 |   if (io2glb_1_X06_Y00) {
      |       ^~~~~~~~~~~~~~~~
../Interconnect_driver.cpp:8192:7: error: ‘io2glb_1_X01_Y00’ was not declared in this scope
 8192 |   if (io2glb_1_X01_Y00) {
      |       ^~~~~~~~~~~~~~~~
../Interconnect_driver.cpp:8195:7: error: ‘io2glb_1_X06_Y00’ was not declared in this scope
 8195 |   if (io2glb_1_X06_Y00) {
      |       ^~~~~~~~~~~~~~~~
../Interconnect_driver.cpp:8218:7: error: ‘io2glb_1_X01_Y00’ was not declared in this scope
 8218 |   if (io2glb_1_X01_Y00) {
      |       ^~~~~~~~~~~~~~~~
../Interconnect_driver.cpp:8221:7: error: ‘io2glb_1_X06_Y00’ was not declared in this scope
 8221 |   if (io2glb_1_X06_Y00) {
      |       ^~~~~~~~~~~~~~~~
../Interconnect_driver.cpp:8244:7: error: ‘io2glb_1_X01_Y00’ was not declared in this scope
 8244 |   if (io2glb_1_X01_Y00) {
      |       ^~~~~~~~~~~~~~~~
../Interconnect_driver.cpp:8247:7: error: ‘io2glb_1_X06_Y00’ was not declared in this scope
 8247 |   if (io2glb_1_X06_Y00) {
      |       ^~~~~~~~~~~~~~~~
../Interconnect_driver.cpp:8270:7: error: ‘io2glb_1_X01_Y00’ was not declared in this scope
 8270 |   if (io2glb_1_X01_Y00) {
      |       ^~~~~~~~~~~~~~~~
../Interconnect_driver.cpp:8273:7: error: ‘io2glb_1_X06_Y00’ was not declared in this scope
 8273 |   if (io2glb_1_X06_Y00) {
      |       ^~~~~~~~~~~~~~~~
../Interconnect_driver.cpp:8296:7: error: ‘io2glb_1_X01_Y00’ was not declared in this scope
 8296 |   if (io2glb_1_X01_Y00) {
      |       ^~~~~~~~~~~~~~~~
../Interconnect_driver.cpp:8299:7: error: ‘io2glb_1_X06_Y00’ was not declared in this scope
 8299 |   if (io2glb_1_X06_Y00) {

I wonder if the large number of errors is causing a segfault in the downstream tool?

I'll try using xcelium to see if there's any difference

mbstrange2 commented 3 years ago

Can you check conftest.py in garnet and make sure skip_compile=False is set? I might have pushed the code with it set to True, in which case no verilog is produced.

leonardt commented 3 years ago

skip_compile is False in conftest

I was looking at the test code and noticed:

        tester_if = tester._if(circuit.interface[cvalid])

I think it should be

        tester_if = tester._if(tester.peek(circuit.interface[cvalid]))

This is because you need to use the tester.peek function when referring to a circuit port (when not using the tester.circuit interface).

mbstrange2 commented 3 years ago

There must be some other mismatch in our envs. This worked for me when using old generated verilog and ran fine in xcelium.

leonardt commented 3 years ago

Also, I don't think changing the simulator target (to xcelium) would affect the verilog code generation. If you can't generate code with python garnet.py -v (without using the test), then this suggests that there's still some difference in our setup since I can generate the verilog fine.

mbstrange2 commented 3 years ago

I can generate the verilog with python garnet.py -v, just trying to figure out why it fails for me and Keyi when we use the test.

leonardt commented 3 years ago

Ah, I see, I misread the original post then, let me investigate with the xcelium target then

leonardt commented 3 years ago

Changing the target doesn't seem to affect verilog code generation for me (I get a file in mek_dump, Interconnect.v), so I think there's still some difference in our environments

mbstrange2 commented 3 years ago

Okay this is somewhat great news then. The test ran and passed?

Here's my pip list


(aha) root@615a6684288f:/aha/garnet# pip list
Package             Version   Location
------------------- --------- ---------------------
aha                 0.0.0     /aha
archipelago         0.0.8     /aha/archipelago
ast-tools           0.0.30    /aha/ast_tools
astor               0.8.1
attrs               20.3.0
buffer-mapping      0.0.5     /aha/BufferMapping
canal               0.0.0     /aha/canal
certifi             2020.12.5
chardet             4.0.0
colorlog            4.7.2
coreir              2.0.128   /aha/pycoreir
CoSA                0.4       /aha/cosa
dataclasses         0.6
DeCiDa              1.1.5
decorator           4.4.2
docker              4.4.1
fault               3.0.47    /aha/fault
gemstone            0.0.0     /aha/gemstone
genesis2            0.0.5
gitdb               4.0.5
GitPython           3.1.12
gmpy2               2.0.8
graphviz            0.16
hwtypes             1.4.4     /aha/hwtypes
idna                2.10
importlib-metadata  3.4.0
iniconfig           1.1.1
Jinja2              2.11.2
jmapper             0.2.0
kratos              0.0.32.3  /aha/kratos
lake-aha            0.0.4     /aha/lake
lassen              0.0.1     /aha/lassen
libcst              0.3.16
magma-lang          2.1.27    /aha/magma
Mako                1.1.4
mantle              2.0.16    /aha/mantle
MarkupSafe          1.1.1
mflowgen            0.3.0     /aha/mflowgen
mypy-extensions     0.4.3
networkx            2.5
numpy               1.19.5
ordered-set         4.0.2
packaging           20.9
peak                0.0.1     /aha/peak
pip                 20.1.1
pluggy              0.13.1
ply                 3.11
py                  1.10.0
pycyclone           0.3.26    /aha/cgra_pnr/cyclone
pydot               1.4.1
pyparsing           2.4.7
PySMT               0.9.0
pysv                0.1.2
pytest              6.2.2
pythunder           0.3.26    /aha/cgra_pnr/thunder
pyverilog           1.3.0
PyYAML              5.4.1
requests            2.25.1
requirements-parser 0.2.0
scipy               1.6.0
setuptools          47.1.0
six                 1.15.0
smmap               3.0.5
staticfg            0.9.5
tabulate            0.8.7
toml                0.10.2
typing-extensions   3.7.4.3
typing-inspect      0.6.0
urllib3             1.26.3
websocket-client    0.57.0
wheel               0.36.2
z3-solver           4.8.10.0
zipp                3.4.0

leonardt commented 3 years ago

I'm able to generate verilog and the test runs xrun but then fails with some errors. Here are the relevant *E snippets from xrun.log

   660 xmvlog: *E,DUPIDN (Interconnect.v,5493|18): identifier 'exp_bits' previously declared [12.5(IEEE)].
   661 localparam frac_bits = 7;
   662                    |
   663 xmvlog: *E,DUPIDN (Interconnect.v,5494|19): identifier 'frac_bits' previously declared [12.5(IEEE)].
   664     module worklib.mul:v
   665         errors: 2, warnings: 0

  1765 xmvlog: *E,DUPIDN (global_buffer_int.sv,129|45): identifier 'glb_config_rd_data' previously declared [12.5(IEEE)].
  1766     module worklib.global_buffer_int:sv
  1767         errors: 1, warnings: 0

leonardt commented 3 years ago

But it does not segfault at any point

mbstrange2 commented 3 years ago

You're having it use cadence ware (CW)? Those errors are in the PE so I'm even more confused.

leonardt commented 3 years ago

I haven't changed anything, looking at the generated code though, it looks out of date so possibly some different coreir version is being used

leonardt commented 3 years ago

Ah yes, my version of python on kiwi is old (3.7) so it's installing an older version of coreir, going to upgrade it to 3.8

leonardt commented 3 years ago

Hmm, that wasn't the problem. It actually seems to be the right version of coreir, and I'm still getting the same output

rdaly525 commented 3 years ago

Thanks for looking into this, I can help later today if this is still not resolved.

mbstrange2 commented 3 years ago

Hmmm not sure what to do then.

leonardt commented 3 years ago

When looking at the generated mek_dump/Interconnect.v on my local machine, I'm getting a different output (without localparam error), so it seems that something on kiwi is causing me to generate different verilog.

leonardt commented 3 years ago

Ok, figured it out. There was a leftover old version of coreir in my LD_LIBRARY_PATH, which was causing the old float code library to be loaded and affecting the verilog output. You may want to check that on your side (maybe an old version of the library is being used). Now I just get this error from the global buffer:

  1118 xmvlog: *E,DUPIDN (global_buffer_int.sv,129|45): identifier 'glb_config_rd_data' previously declared [12.5(IEEE)].
  1119     module worklib.global_buffer_int:sv

I'm going to try patching it locally to see if the test will run

leonardt commented 3 years ago

Okay, I resolved the global_buffer_int problem. The test bench copies the entire contents of the genesis_verif directory, and that directory had some old genesis files from an older version of garnet that were being copied in and causing the error. Purging the directory resolved that issue. (Now I'm getting the xcelium license issue, so I'm trying again with the older version that works.)
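A rough sketch of how stale generated files could be kept out of a run: empty the output directory before regenerating into it, so nothing from an older checkout survives to be copied along. The helper name `reset_directory` is hypothetical, and applying it to genesis_verif assumes that directory holds only regenerable files.

```python
import shutil
from pathlib import Path


def reset_directory(path):
    """Delete path (if present) and recreate it empty, so no stale files survive."""
    directory = Path(path)
    if directory.exists():
        shutil.rmtree(directory)
    directory.mkdir(parents=True)
    return directory


if __name__ == "__main__":
    # Assumption: everything in genesis_verif can be regenerated from scratch.
    reset_directory("genesis_verif")
```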

leonardt commented 3 years ago

Ok so the simulation completes but then fails during the results parsing with:

xcelium> run 10000ns
COORD:     0, VAL:     x
COORD:     0, VAL:     x
COORD:     0, VAL:     x
COORD:     0, VAL:     x
COORD:     0, VAL:     x
COORD:     0, VAL:     x
COORD:     0, VAL:     x
COORD:     0, VAL:     x
COORD:     0, VAL:     x
COORD:     0, VAL:     x
COORD:     0, VAL:     x
COORD:     0, VAL:     x
COORD:     0, VAL:     x
Simulation complete via $finish(1) at time 3541 NS + 0
./Interconnect_tb.sv:3668         #20 $finish;
xcelium> assertion -summary -final
  Summary report deferred until the end of simulation.
xcelium> quit
  No assertions found.
xmsim: *N,PRASRT: Protected assertions are not shown.
TOOL:   xrun(64)    19.03-s003: Exiting on Feb 02, 2021 at 13:33:17 PST  (total: 00:00:24)
</STDOUT>
Traceback (most recent call last):
  File "tests/test_memory_core/test_memory_core.py", line 1162, in <module>
    spVspV_regress(dump_dir="mek_dump",
  File "tests/test_memory_core/test_memory_core.py", line 1133, in spVspV_regress
    success = run_test(len1, len2, num_match, value_limit, dump_dir=dump_dir, log_name=log_name, trace=trace)
  File "tests/test_memory_core/test_memory_core.py", line 1089, in run_test
    data_sim = [int(x[3]) for x in split_lines]
  File "tests/test_memory_core/test_memory_core.py", line 1089, in <listcomp>
    data_sim = [int(x[3]) for x in split_lines]
ValueError: invalid literal for int() with base 10: 'x'
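The ValueError comes from calling int() on the simulator's unknown value 'x'. As a sketch only (this is not the test's actual parsing code), the lines could be parsed so that non-integer values come back as None instead of raising, which makes it easy to report how many outputs were never driven. The regular expression and helper names are assumptions based on the COORD/VAL line format shown in the log above.

```python
import re

# Matches lines such as "COORD:     0, VAL:     x" from the simulator log.
LINE_RE = re.compile(r"COORD:\s*(\S+?),\s*VAL:\s*(\S+)")


def parse_value(token):
    """Return int(token), or None when the token is not a decimal integer (e.g. 'x')."""
    try:
        return int(token)
    except ValueError:
        return None


def parse_sim_output(text):
    """Extract (coord, value) pairs; lines without a COORD/VAL pair are skipped."""
    pairs = []
    for line in text.splitlines():
        match = LINE_RE.search(line)
        if match:
            pairs.append((parse_value(match.group(1)), parse_value(match.group(2))))
    return pairs
```

A caller can then fail the test with a clear message when any value is None, rather than crashing during result parsing.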

But I think I've gone much further than necessary. It looks like I'm able to generate the verilog and run the test totally fine without a segfault, so let's see what's different about your environment. Can you post the output of your $PATH and $LD_LIBRARY_PATH? Let's make sure there are no old versions of coreir lying around there. Also, is your coreir version installed via pip? Or do you have a local installation from a checkout of the pycoreir repo?
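For checking both variables in one pass, a small sketch like the following lists every matching file in each directory of a search path, in search order. The function name is hypothetical, and the `libcoreir*` pattern is an assumption about how the shared libraries are named.

```python
import os
from pathlib import Path


def find_on_search_path(search_path, pattern):
    """Return every file matching the glob pattern in each directory of an
    os.pathsep-separated search path, in search order."""
    hits = []
    for entry in search_path.split(os.pathsep):
        directory = Path(entry)
        # Skip empty entries and directories that do not exist.
        if entry and directory.is_dir():
            hits.extend(sorted(str(p) for p in directory.glob(pattern) if p.is_file()))
    return hits


if __name__ == "__main__":
    print("executables:", find_on_search_path(os.environ.get("PATH", ""), "coreir"))
    # Assumption: the CoreIR shared libraries are named libcoreir*.
    print("libraries:", find_on_search_path(os.environ.get("LD_LIBRARY_PATH", ""), "libcoreir*"))
```

More than one hit for either search suggests that an older copy may be shadowing the intended one.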

leonardt commented 3 years ago

Ah, I see that you have coreir installed from a local location: coreir 2.0.128 /aha/pycoreir

Can we double check this setup by either recompiling it to ensure it's up to date or uninstalling this version and using the pip distribution?
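To see from inside Python which installation is actually being picked up, a minimal sketch using the standard library's `importlib.metadata` (Python 3.8+) can report a distribution's version and the directory its metadata lives in. The helper name is hypothetical. One caveat: for some editable-install styles the metadata sits in site-packages even though the code lives elsewhere, so the location is only a hint.

```python
from importlib import metadata


def describe_install(package):
    """Return (version, location) for an installed distribution, or None if absent."""
    try:
        dist = metadata.distribution(package)
    except metadata.PackageNotFoundError:
        return None
    # locate_file("") resolves to the directory that contains the distribution's metadata.
    return dist.version, str(dist.locate_file(""))


if __name__ == "__main__":
    print(describe_install("coreir"))
```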

mbstrange2 commented 3 years ago

This is in the aha docker. If you want to attach to it (mstrange-gracious_visvesvaraya) and check it out, that might be easier? Or start up another docker?

leonardt commented 3 years ago

docker attach mstrange-gracious_visvesvaraya hangs for me, I wonder if only one person can be attached at a time? or if there's a user permissions issue

leonardt commented 3 years ago

Hmm wait, never mind, hitting ctrl-c dropped me into the shell, maybe it was just waiting for a command

mbstrange2 commented 3 years ago

You just need to hit enter - it doesn't automatically show the prompt for some reason lol

leonardt commented 3 years ago

Hm, the tests seem to be running for me. It seems to be running more than one, though, so I haven't finished all of them

leonardt commented 3 years ago

Have you tried simply reattaching to the container? Perhaps there's some leftover config in your env causing the problem? How many tests is this supposed to run? I'm still waiting for it to finish but it seems to be running xcelium multiple times so it doesn't seem to be having any problems generating the verilog.

mbstrange2 commented 3 years ago

Oh I'm sorry one second I have skip_compile=True in there

mbstrange2 commented 3 years ago

Okay if you run it again in the docker it will segfault

leonardt commented 3 years ago

I seem to have "worked around" the issue by uninstalling coreir and installing the PyPI distribution. So something about the local docker setup is likely at fault

cd /aha/coreir/build
make uninstall
pip uninstall coreir
pip install coreir

leonardt commented 3 years ago

I reinstalled coreir from the local build and the segfault came back, so something about the local build is causing the problem

leonardt commented 3 years ago

Hmm, I tried reverting coreir to an older commit to match the pycoreir release (which is a few commits behind coreir master) but I still see the same problem, which suggests it's not an issue with any of the recent changes (also, reviewing the commits shows nothing that would suggest a segfault; they are minor)

leonardt commented 3 years ago

@mbstrange2 does that workaround work for unblocking you for now? We'll need to investigate the docker environment more closely to see what would be causing this issue with the local build versus using the pip wheel distribution

rdaly525 commented 3 years ago

Where is the docker environment specified?

mbstrange2 commented 3 years ago

@leonardt This workaround is good for me at present

@rdaly525 https://hub.docker.com/r/stanfordaha/garnet it's this docker - it should be created from https://github.com/StanfordAHA/aha