tenstorrent / tt-buda

Tenstorrent TT-BUDA Repository

Failed smoke test on two Wormhole N300s with Alpha Release v0.17.0-alpha. #32

Open maychair opened 3 weeks ago

maychair commented 3 weeks ago

I tried running the smoke test on two Wormhole N300s and encountered an issue. Here is the log:

2024-06-25 14:33:19.900 | INFO     | Runtime         - running: '/home/chao.mei/pybuda_wh/env/lib/python3.8/site-packages/budabackend/umd/device/bin/silicon/x86/create-ethernet-map /home/chao.mei/pybuda_wh/env/lib/python3.8/site-packages/budabackend//cluster_desc.yaml' with timeout 120s
  Detecting chips (found 4)
2024-06-25 14:33:21.975 | INFO     | Backend         - initialize_child_process called on pid 47954
2024-06-25 14:33:24.523 | DEBUG    | pybuda.tvm_to_python:_determine_node_dtype:1705 - Node 'weights1' does not have a framework dtype specified. Using TVM generated dtype.
2024-06-25 14:33:24.523 | DEBUG    | pybuda.tvm_to_python:_determine_node_dtype:1705 - Node 'weights2' does not have a framework dtype specified. Using TVM generated dtype.
2024-06-25 14:33:24.533 | DEBUG    | pybuda.ttdevice:_create_input_queue_device_connector:1408 - Creating input queue connector on TTDevice 'auto_tt0'
2024-06-25 14:33:24.533 | DEBUG    | pybuda.ttdevice:_create_intermediates_queue_device_connector:1418 - Creating fwd intermediates queue connector on TTDevice 'auto_tt0'
2024-06-25 14:33:24.533 | DEBUG    | pybuda.ttdevice:_create_forward_output_queue_device_connector:1398 - Creating forward output queue connector on TTDevice 'auto_tt0'
2024-06-25 14:33:28.454 | INFO     | Runtime         - running: '/home/chao.mei/pybuda_wh/env/lib/python3.8/site-packages/budabackend/umd/device/bin/silicon/x86/create-ethernet-map /home/chao.mei/pybuda_wh/env/lib/python3.8/site-packages/budabackend//cluster_desc.yaml' with timeout 120s
  Detecting chips (found 4)
2024-06-25 14:33:30.499 | INFO     | pybuda.device_connector:pusher_thread_main:147 - Pusher thread on <pybuda.device_connector.InputQueueDirectPusherDeviceConnector object at 0x7fe41ce8bee0> starting
2024-06-25 14:33:30.500 | INFO     | Backend         - initialize_child_process called on pid 48210
2024-06-25 14:33:30.502 | DEBUG    | pybuda.device:run_next_command:455 - Received COMPILE command on TTDevice 'auto_tt0' / 48210
2024-06-25 14:33:30.503 | DEBUG    | pybuda.ttdevice:compile_for:785 - Compiling for Inference mode on TTDevice 'auto_tt0'
2024-06-25 14:33:30.558 | INFO     | Backend         - Lookup contexts -- arch:system scope:device name:chips_with_mmio
2024-06-25 14:33:30.589 | INFO     | Runtime         - Found cluster descriptor file at path=/tmp/chao.mei/bd28e0ccb38e/cluster_desc.yaml
2024-06-25 14:33:30.594 | INFO     | Backend         - Lookup contexts -- arch:system scope:device name:chip_locations
2024-06-25 14:33:30.597 | ERROR    | pybuda.device:run_next_command:469 - Compile error: TT_ASSERT @ pybuda/csrc/backend_api/device_config.hpp:138: chip_coord_to_chip_id.find(coord) == chip_coord_to_chip_id.end()
backtrace:
 --- /home/chao.mei/pybuda_wh/env/lib/python3.8/site-packages/pybuda/_C.cpython-38-x86_64-linux-gnu.so(+0x8d44a8) [0x7fe3285b54a8]
 --- /home/chao.mei/pybuda_wh/env/lib/python3.8/site-packages/pybuda/_C.cpython-38-x86_64-linux-gnu.so(+0x8d4dfc) [0x7fe3285b5dfc]

My environment is Ubuntu 20 with Alpha Release v0.17.0-alpha, and the cluster_desc.yaml file is:

arch: {
   0: Wormhole,
   1: Wormhole,
   2: Wormhole,
   3: Wormhole,
}

chips: {
   0: [0,0,0,0],
   1: [0,0,0,0],
   2: [1,0,0,0],
   3: [1,0,0,0],
}

ethernet_connections: [
   [{chip: 0, chan: 8}, {chip: 2, chan: 0}],
   [{chip: 0, chan: 9}, {chip: 2, chan: 1}],
   [{chip: 1, chan: 8}, {chip: 3, chan: 0}],
   [{chip: 1, chan: 9}, {chip: 3, chan: 1}],
]

chips_with_mmio: [
   0: 2,
   1: 3,
]

# harvest_mask is the bit indicating which tensix row is harvested. So bit 0 = first tensix row; bit 1 = second tensix row etc...
harvesting: [
   0: {noc_translation: true, harvest_mask: 576},
   1: {noc_translation: true, harvest_mask: 257},
   2: {noc_translation: true, harvest_mask: 10},
   3: {noc_translation: true, harvest_mask: 5},
]
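The comment in the descriptor says bit i of harvest_mask marks Tensix row i as harvested, with bit 0 being the first row. A minimal sketch that decodes the four masks above under that reading (0-based row indices; the dictionary is simply copied from the posted file):

```python
from typing import Dict, List

# harvest_mask values copied from the cluster_desc.yaml above (chip id -> mask).
HARVEST_MASKS: Dict[int, int] = {0: 576, 1: 257, 2: 10, 3: 5}


def harvested_rows(mask: int) -> List[int]:
    """Return the 0-based Tensix row indices whose bit is set in `mask`."""
    rows: List[int] = []
    bit = 0
    while mask >> bit:  # stop once no set bits remain above `bit`
        if (mask >> bit) & 1:
            rows.append(bit)
        bit += 1
    return rows


if __name__ == "__main__":
    for chip_id, mask in HARVEST_MASKS.items():
        print(f"chip {chip_id}: mask {mask} -> harvested rows {harvested_rows(mask)}")
```

Each chip comes out with two harvested rows; for example, 576 = 2^9 + 2^6 gives rows 6 and 9.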

How should I fix it?
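Read literally, the failed assertion `chip_coord_to_chip_id.find(coord) == chip_coord_to_chip_id.end()` requires that a coordinate is not already in the coordinate-to-chip map, so it would presumably fail as soon as two chips report the same coordinate. In the `chips` block above, chips 0 and 1 both report [0,0,0,0], and chips 2 and 3 both report [1,0,0,0]. The sketch below is a hypothetical stand-alone check (not PyBuda code) that lists such duplicates and also groups chips by their `ethernet_connections` links:

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

Coord = Tuple[int, int, int, int]

# `chips` block from the cluster_desc.yaml above: chip id -> reported coordinate.
CHIPS: Dict[int, Coord] = {
    0: (0, 0, 0, 0),
    1: (0, 0, 0, 0),
    2: (1, 0, 0, 0),
    3: (1, 0, 0, 0),
}

# `ethernet_connections` block reduced to (chip, chip) pairs; channel numbers are dropped.
ETH_LINKS: List[Tuple[int, int]] = [(0, 2), (0, 2), (1, 3), (1, 3)]


def find_duplicate_coords(chips: Dict[int, Coord]) -> Dict[Coord, List[int]]:
    """Return each coordinate claimed by more than one chip id, with those ids."""
    owners: Dict[Coord, List[int]] = defaultdict(list)
    for chip_id in sorted(chips):
        owners[chips[chip_id]].append(chip_id)
    return {coord: ids for coord, ids in owners.items() if len(ids) > 1}


def ethernet_groups(chip_ids: List[int], links: List[Tuple[int, int]]) -> List[Set[int]]:
    """Split chips into groups that can reach each other over ethernet links."""
    neighbours: Dict[int, Set[int]] = defaultdict(set)
    for a, b in links:
        neighbours[a].add(b)
        neighbours[b].add(a)
    seen: Set[int] = set()
    groups: List[Set[int]] = []
    for start in sorted(chip_ids):
        if start in seen:
            continue
        group: Set[int] = set()
        stack = [start]
        while stack:  # depth-first walk over ethernet neighbours
            chip = stack.pop()
            if chip not in group:
                group.add(chip)
                stack.extend(neighbours[chip] - group)
        seen |= group
        groups.append(group)
    return groups


if __name__ == "__main__":
    for coord, ids in find_duplicate_coords(CHIPS).items():
        print(f"coordinate {list(coord)} is claimed by chips {ids}")
    print("ethernet-connected groups:", ethernet_groups(list(CHIPS), ETH_LINKS))
```

On the posted descriptor this reports two shared coordinates and two separate ethernet-connected pairs, {0, 2} and {1, 3}, which suggests each N300 card is described with the same local coordinates. The check only diagnoses the conflict; it does not make a two-card descriptor usable.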

staylorTT commented 3 weeks ago

Thanks for submitting the issue. Can you also please include the command you used to run the smoke test?

maychair commented 3 weeks ago

Sure, here is the complete log:

chao.mei@leo:~/pybuda_wh$ source env/bin/activate
(env) chao.mei@leo:~/pybuda_wh$ python smoke.py 
2024-06-26 10:17:35.879 | INFO     | Runtime         - running: '/home/chao.mei/pybuda_wh/env/lib/python3.8/site-packages/budabackend/umd/device/bin/silicon/x86/create-ethernet-map /home/chao.mei/pybuda_wh/env/lib/python3.8/site-packages/budabackend//cluster_desc.yaml' with timeout 120s
  Detecting chips (found 4)
2024-06-26 10:17:36.233 | WARNING  | SiliconDriver   - NumHostMemChannels: 3 used for device_id: 0x401e less than target: 4. Workload will fail if it exceeds NumHostMemChannels. Increase Number of Hugepages.
2024-06-26 10:17:36.242 | WARNING  | SiliconDriver   - NumHostMemChannels: 3 used for device_id: 0x401e less than target: 4. Workload will fail if it exceeds NumHostMemChannels. Increase Number of Hugepages.
2024-06-26 10:17:36.275 | WARNING  | SiliconDriver   - NumHostMemChannels: 3 used for device_id: 0x401e less than target: 4. Workload will fail if it exceeds NumHostMemChannels. Increase Number of Hugepages.
2024-06-26 10:17:36.280 | WARNING  | SiliconDriver   - NumHostMemChannels: 3 used for device_id: 0x401e less than target: 4. Workload will fail if it exceeds NumHostMemChannels. Increase Number of Hugepages.
2024-06-26 10:17:36.310 | WARNING  | SiliconDriver   - NumHostMemChannels: 3 used for device_id: 0x401e less than target: 4. Workload will fail if it exceeds NumHostMemChannels. Increase Number of Hugepages.
2024-06-26 10:17:36.315 | WARNING  | SiliconDriver   - NumHostMemChannels: 3 used for device_id: 0x401e less than target: 4. Workload will fail if it exceeds NumHostMemChannels. Increase Number of Hugepages.
2024-06-26 10:17:36.345 | WARNING  | SiliconDriver   - NumHostMemChannels: 3 used for device_id: 0x401e less than target: 4. Workload will fail if it exceeds NumHostMemChannels. Increase Number of Hugepages.
2024-06-26 10:17:36.350 | WARNING  | SiliconDriver   - NumHostMemChannels: 3 used for device_id: 0x401e less than target: 4. Workload will fail if it exceeds NumHostMemChannels. Increase Number of Hugepages.
2024-06-26 10:17:38.417 | INFO     | Backend         - initialize_child_process called on pid 13820
/home/chao.mei/pybuda_wh/env/lib/python3.8/site-packages/flax/struct.py:132: FutureWarning: jax.tree_util.register_keypaths is deprecated, and will be removed in a future release. Please use `register_pytree_with_keys()` instead.
  jax.tree_util.register_keypaths(data_clz, keypaths)
/home/chao.mei/pybuda_wh/env/lib/python3.8/site-packages/flax/struct.py:132: FutureWarning: jax.tree_util.register_keypaths is deprecated, and will be removed in a future release. Please use `register_pytree_with_keys()` instead.
  jax.tree_util.register_keypaths(data_clz, keypaths)
2024-06-26 10:17:41.449 | DEBUG    | pybuda.tvm_to_python:_determine_node_dtype:1705 - Node 'weights1' does not have a framework dtype specified. Using TVM generated dtype.
2024-06-26 10:17:41.449 | DEBUG    | pybuda.tvm_to_python:_determine_node_dtype:1705 - Node 'weights2' does not have a framework dtype specified. Using TVM generated dtype.
2024-06-26 10:17:41.461 | DEBUG    | pybuda.ttdevice:_create_input_queue_device_connector:1408 - Creating input queue connector on TTDevice 'auto_tt0'
2024-06-26 10:17:41.461 | DEBUG    | pybuda.ttdevice:_create_intermediates_queue_device_connector:1418 - Creating fwd intermediates queue connector on TTDevice 'auto_tt0'
2024-06-26 10:17:41.461 | DEBUG    | pybuda.ttdevice:_create_forward_output_queue_device_connector:1398 - Creating forward output queue connector on TTDevice 'auto_tt0'
2024-06-26 10:17:45.437 | INFO     | Runtime         - running: '/home/chao.mei/pybuda_wh/env/lib/python3.8/site-packages/budabackend/umd/device/bin/silicon/x86/create-ethernet-map /home/chao.mei/pybuda_wh/env/lib/python3.8/site-packages/budabackend//cluster_desc.yaml' with timeout 120s
  Detecting chips (found 4)
2024-06-26 10:17:45.751 | WARNING  | SiliconDriver   - NumHostMemChannels: 3 used for device_id: 0x401e less than target: 4. Workload will fail if it exceeds NumHostMemChannels. Increase Number of Hugepages.
2024-06-26 10:17:45.757 | WARNING  | SiliconDriver   - NumHostMemChannels: 3 used for device_id: 0x401e less than target: 4. Workload will fail if it exceeds NumHostMemChannels. Increase Number of Hugepages.
2024-06-26 10:17:45.787 | WARNING  | SiliconDriver   - NumHostMemChannels: 3 used for device_id: 0x401e less than target: 4. Workload will fail if it exceeds NumHostMemChannels. Increase Number of Hugepages.
2024-06-26 10:17:45.793 | WARNING  | SiliconDriver   - NumHostMemChannels: 3 used for device_id: 0x401e less than target: 4. Workload will fail if it exceeds NumHostMemChannels. Increase Number of Hugepages.
2024-06-26 10:17:45.822 | WARNING  | SiliconDriver   - NumHostMemChannels: 3 used for device_id: 0x401e less than target: 4. Workload will fail if it exceeds NumHostMemChannels. Increase Number of Hugepages.
2024-06-26 10:17:45.827 | WARNING  | SiliconDriver   - NumHostMemChannels: 3 used for device_id: 0x401e less than target: 4. Workload will fail if it exceeds NumHostMemChannels. Increase Number of Hugepages.
2024-06-26 10:17:45.856 | WARNING  | SiliconDriver   - NumHostMemChannels: 3 used for device_id: 0x401e less than target: 4. Workload will fail if it exceeds NumHostMemChannels. Increase Number of Hugepages.
2024-06-26 10:17:45.862 | WARNING  | SiliconDriver   - NumHostMemChannels: 3 used for device_id: 0x401e less than target: 4. Workload will fail if it exceeds NumHostMemChannels. Increase Number of Hugepages.
2024-06-26 10:17:47.469 | INFO     | pybuda.device_connector:pusher_thread_main:147 - Pusher thread on <pybuda.device_connector.InputQueueDirectPusherDeviceConnector object at 0x7f8b06e1cdf0> starting
2024-06-26 10:17:47.470 | INFO     | Backend         - initialize_child_process called on pid 14075
2024-06-26 10:17:47.472 | DEBUG    | pybuda.device:run_next_command:455 - Received COMPILE command on TTDevice 'auto_tt0' / 14075
2024-06-26 10:17:47.472 | DEBUG    | pybuda.ttdevice:compile_for:785 - Compiling for Inference mode on TTDevice 'auto_tt0'
2024-06-26 10:17:47.528 | INFO     | Backend         - Lookup contexts -- arch:system scope:device name:chips_with_mmio
2024-06-26 10:17:47.540 | WARNING  | SiliconDriver   - NumHostMemChannels: 3 used for device_id: 0x401e less than target: 4. Workload will fail if it exceeds NumHostMemChannels. Increase Number of Hugepages.
2024-06-26 10:17:47.545 | WARNING  | SiliconDriver   - NumHostMemChannels: 3 used for device_id: 0x401e less than target: 4. Workload will fail if it exceeds NumHostMemChannels. Increase Number of Hugepages.
2024-06-26 10:17:47.559 | INFO     | Runtime         - Found cluster descriptor file at path=/tmp/chao.mei/bd28e0ccb38e/cluster_desc.yaml
2024-06-26 10:17:47.564 | INFO     | Backend         - Lookup contexts -- arch:system scope:device name:chip_locations
2024-06-26 10:17:47.567 | ERROR    | pybuda.device:run_next_command:469 - Compile error: TT_ASSERT @ pybuda/csrc/backend_api/device_config.hpp:138: chip_coord_to_chip_id.find(coord) == chip_coord_to_chip_id.end()
backtrace:
 --- /home/chao.mei/pybuda_wh/env/lib/python3.8/site-packages/pybuda/_C.cpython-38-x86_64-linux-gnu.so(+0x8d44a8) [0x7f8a125624a8]
 --- /home/chao.mei/pybuda_wh/env/lib/python3.8/site-packages/pybuda/_C.cpython-38-x86_64-linux-gnu.so(+0x8d4dfc) [0x7f8a12562dfc]
 --- /home/chao.mei/pybuda_wh/env/lib/python3.8/site-packages/pybuda/_C.cpython-38-x86_64-linux-gnu.so(+0x2c5206) [0x7f8a11f53206]
 --- /home/chao.mei/pybuda_wh/env/bin/python(PyCFunction_Call+0x59) [0x5d5499]
 --- /home/chao.mei/pybuda_wh/env/bin/python(_PyObject_MakeTpCall+0x296) [0x5d6066]
 --- /home/chao.mei/pybuda_wh/env/bin/python() [0x4e2288]
 --- /home/chao.mei/pybuda_wh/env/bin/python(PyObject_Call+0x62) [0x5d4c12]
 --- /home/chao.mei/pybuda_wh/env/bin/python() [0x579e24]
 --- /home/chao.mei/pybuda_wh/env/bin/python() [0x5847f7]
 --- /home/chao.mei/pybuda_wh/env/lib/python3.8/site-packages/torch/lib/libtorch_python.so(+0x386675) [0x7f8ba5968675]
 --- /home/chao.mei/pybuda_wh/env/bin/python(_PyObject_MakeTpCall+0x296) [0x5d6066]
 --- /home/chao.mei/pybuda_wh/env/bin/python(_PyEval_EvalFrameDefault+0x5f18) [0x54ca58]
 --- /home/chao.mei/pybuda_wh/env/bin/python(_PyEval_EvalCodeWithName+0x26a) [0x54552a]
 --- /home/chao.mei/pybuda_wh/env/bin/python(_PyFunction_Vectorcall+0x393) [0x5d5a23]
 --- /home/chao.mei/pybuda_wh/env/bin/python(_PyEval_EvalFrameDefault+0x1876) [0x5483b6]
 --- /home/chao.mei/pybuda_wh/env/bin/python(_PyEval_EvalCodeWithName+0x26a) [0x54552a]
 --- /home/chao.mei/pybuda_wh/env/bin/python(_PyFunction_Vectorcall+0x393) [0x5d5a23]
 --- /home/chao.mei/pybuda_wh/env/bin/python(_PyEval_EvalFrameDefault+0x907) [0x547447]
 --- /home/chao.mei/pybuda_wh/env/bin/python(_PyEval_EvalCodeWithName+0x26a) [0x54552a]
 --- /home/chao.mei/pybuda_wh/env/bin/python(_PyFunction_Vectorcall+0x393) [0x5d5a23]
 --- /home/chao.mei/pybuda_wh/env/bin/python(_PyEval_EvalFrameDefault+0x907) [0x547447]
 --- /home/chao.mei/pybuda_wh/env/bin/python(_PyFunction_Vectorcall+0x1b6) [0x5d5846]
 --- /home/chao.mei/pybuda_wh/env/bin/python(_PyEval_EvalFrameDefault+0x907) [0x547447]
 --- /home/chao.mei/pybuda_wh/env/bin/python(_PyFunction_Vectorcall+0x1b6) [0x5d5846]
 --- /home/chao.mei/pybuda_wh/env/bin/python() [0x4e1b5c]
 --- /home/chao.mei/pybuda_wh/env/bin/python(PyObject_Call+0x62) [0x5d4c12]
 --- /home/chao.mei/pybuda_wh/env/bin/python(_PyEval_EvalFrameDefault+0x1f26) [0x548a66]
 --- /home/chao.mei/pybuda_wh/env/bin/python(_PyFunction_Vectorcall+0x1b6) [0x5d5846]
 --- /home/chao.mei/pybuda_wh/env/bin/python(_PyEval_EvalFrameDefault+0x907) [0x547447]
 --- /home/chao.mei/pybuda_wh/env/bin/python(_PyEval_EvalCodeWithName+0x26a) [0x54552a]
 --- /home/chao.mei/pybuda_wh/env/bin/python(_PyFunction_Vectorcall+0x393) [0x5d5a23]
 --- /home/chao.mei/pybuda_wh/env/bin/python(_PyEval_EvalFrameDefault+0x907) [0x547447]
 --- /home/chao.mei/pybuda_wh/env/bin/python(_PyFunction_Vectorcall+0x1b6) [0x5d5846]
 --- /home/chao.mei/pybuda_wh/env/bin/python(_PyEval_EvalFrameDefault+0x725) [0x547265]
 --- /home/chao.mei/pybuda_wh/env/bin/python(_PyEval_EvalCodeWithName+0x26a) [0x54552a]
 --- /home/chao.mei/pybuda_wh/env/bin/python(_PyFunction_Vectorcall+0x393) [0x5d5a23]
 --- /home/chao.mei/pybuda_wh/env/bin/python(_PyEval_EvalFrameDefault+0x1876) [0x5483b6]
 --- /home/chao.mei/pybuda_wh/env/bin/python(_PyEval_EvalCodeWithName+0x26a) [0x54552a]
 --- /home/chao.mei/pybuda_wh/env/bin/python(PyEval_EvalCode+0x27) [0x684327]
 --- /home/chao.mei/pybuda_wh/env/bin/python() [0x673a41]
 --- /home/chao.mei/pybuda_wh/env/bin/python() [0x673abb]
 --- /home/chao.mei/pybuda_wh/env/bin/python(PyRun_StringFlags+0x7f) [0x673c0f]
 --- /home/chao.mei/pybuda_wh/env/bin/python(PyRun_SimpleStringFlags+0x3f) [0x67460f]
 --- /home/chao.mei/pybuda_wh/env/bin/python(Py_RunMain+0x2cc) [0x6b412c]
 --- /home/chao.mei/pybuda_wh/env/bin/python(Py_BytesMain+0x2d) [0x6b43fd]
 --- /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf3) [0x7f8ba844b083]
 --- /home/chao.mei/pybuda_wh/env/bin/python(_start+0x2e) [0x5da67e]

Traceback (most recent call last):
  File "/home/chao.mei/pybuda_wh/env/lib/python3.8/site-packages/pybuda/device.py", line 458, in run_next_command
    ret = self.compile_for(
  File "/home/chao.mei/pybuda_wh/env/lib/python3.8/site-packages/pybuda/ttdevice.py", line 808, in compile_for
    device_cfg=self.get_device_config(compiler_cfg),
  File "/home/chao.mei/pybuda_wh/env/lib/python3.8/site-packages/pybuda/ttdevice.py", line 224, in get_device_config
    dev_cfg = get_device_config(self.arch,
  File "/home/chao.mei/pybuda_wh/env/lib/python3.8/site-packages/pybuda/ttdevice.py", line 1616, in get_device_config
    return DeviceConfig(
RuntimeError: TT_ASSERT @ pybuda/csrc/backend_api/device_config.hpp:138: chip_coord_to_chip_id.find(coord) == chip_coord_to_chip_id.end()

Traceback (most recent call last):
  File "smoke.py", line 25, in <module>
    test_module_direct_pytorch()
  File "smoke.py", line 20, in test_module_direct_pytorch
    output = pybuda.PyTorchModule("direct_pt", PyTorchTestModule()).run(input1, input2)
  File "/home/chao.mei/pybuda_wh/env/lib/python3.8/site-packages/pybuda/module.py", line 95, in run
    output_q = pybuda.run_inference(self, inputs=[args])
  File "/home/chao.mei/pybuda_wh/env/lib/python3.8/site-packages/pybuda/run/api.py", line 90, in run_inference
    return _run_inference(module, inputs, input_count, output_queue, _sequential, _perf_trace, _verify_cfg)
  File "/home/chao.mei/pybuda_wh/env/lib/python3.8/site-packages/pybuda/run/impl.py", line 277, in _run_inference
    return _run_devices_inference(
  File "/home/chao.mei/pybuda_wh/env/lib/python3.8/site-packages/pybuda/run/impl.py", line 467, in _run_devices_inference
    output_queue = _initialize_pipeline(False, output_queue, sequential=sequential, verify_cfg=verify_cfg)
  File "/home/chao.mei/pybuda_wh/env/lib/python3.8/site-packages/pybuda/run/impl.py", line 414, in _initialize_pipeline
    _compile_devices(sequential, training=training, sample_inputs=sample_inputs, sample_targets=sample_targets, microbatch_count=microbatch_count, verify_cfg=verify_cfg)
  File "/home/chao.mei/pybuda_wh/env/lib/python3.8/site-packages/pybuda/run/impl.py", line 1248, in _compile_devices
    raise ret
RuntimeError: TT_ASSERT @ pybuda/csrc/backend_api/device_config.hpp:138: chip_coord_to_chip_id.find(coord) == chip_coord_to_chip_id.end()

2024-06-26 10:17:47.571 | DEBUG    | pybuda.run.impl:_shutdown:1265 - PyBuda shutdown

And here is smoke.py:

(env) chao.mei@leo:~/pybuda_wh$ cat smoke.py 
import pybuda
import torch

# Sample PyTorch module
class PyTorchTestModule(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.weights1 = torch.nn.Parameter(torch.rand(32, 32), requires_grad=True)
        self.weights2 = torch.nn.Parameter(torch.rand(32, 32), requires_grad=True)
    def forward(self, act1, act2):
        m1 = torch.matmul(act1, self.weights1)
        m2 = torch.matmul(act2, self.weights2)
        return m1 + m2, m1

def test_module_direct_pytorch():
    input1 = torch.rand(4, 32, 32)
    input2 = torch.rand(4, 32, 32)
    # Run single inference pass on a PyTorch module, using a wrapper to convert to PyBUDA first
    output = pybuda.PyTorchModule("direct_pt", PyTorchTestModule()).run(input1, input2)
    print(output)

if __name__ == "__main__":
    test_module_direct_pytorch()

Let me know if you need anything else.
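Separately from the compile error, the log above repeats a SiliconDriver warning that only 3 host memory channels are in use where 4 are targeted, and asks for more hugepages. A minimal sketch for checking what the kernel currently provides, assuming (this is not confirmed by the log) that those channels are backed by 1 GiB hugepages; it only reads the standard Linux sysfs counters, and how many pages the driver expects per card is not stated here:

```python
from pathlib import Path
from typing import Dict

# Standard Linux sysfs directory for 1 GiB (1048576 kB) hugepage counters.
HUGEPAGES_1G = Path("/sys/kernel/mm/hugepages/hugepages-1048576kB")


def hugepage_counts(base: Path = HUGEPAGES_1G) -> Dict[str, int]:
    """Read total and free hugepage counts from `base`; a missing file counts as 0."""
    counts: Dict[str, int] = {}
    for name in ("nr_hugepages", "free_hugepages"):
        path = base / name
        counts[name] = int(path.read_text().strip()) if path.is_file() else 0
    return counts


if __name__ == "__main__":
    counts = hugepage_counts()
    print(f"1 GiB hugepages: {counts['nr_hugepages']} total, {counts['free_hugepages']} free")
```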

maychair commented 3 weeks ago

I tested smoke.py with a single Wormhole N300, and it works fine.

staylorTT commented 3 weeks ago

We are working on a multi-card release; Buda does not currently support multiple cards natively. I will update this issue once it is ready for testing. Thank you.