tenstorrent / tt-buda

Tenstorrent TT-BUDA Repository

Timeout Error Depending On Tensor Sizes #56

Open jhlee508 opened 1 week ago

jhlee508 commented 1 week ago

The single-linear-layer test below fails with the error shown in the log. It works when the input_tensor shape is [1, 32] or [1, 32, 32], but with a [32, 32] input it times out while reading the output queue (linear.output_add_2). Could you help me find a solution?

Test code

import pybuda
import torch
import time

class Linear(torch.nn.Module):
    def __init__(self):
        super(Linear, self).__init__()
        self.linear = torch.nn.Linear(32, 32)

    def forward(self, x):
        x = self.linear(x)
        return x

if __name__ == '__main__':
    # Create a TT device
    tt0 = pybuda.TTDevice("tt0", num_chips=1)

    # Create a PyTorch module with PyBuda Wrapper
    tt0.place_module(pybuda.PyTorchModule("linear", Linear()))

    # Create an input tensor
    input_tensor = torch.randn(32, 32)

    # Compile and run inference
    start = time.time()
    output_queue = pybuda.run_inference(inputs=[input_tensor])
    outputs = output_queue.get()
    print(">>> Inference time: ", time.time() - start)

Error Log

2024-09-12 17:12:28.179 | INFO     | Runtime         - running: '/home/n4/jaehwan/research/tenstorrent/pybuda/pybuda-env/lib/python3.8/site-packages/budabackend/umd/device/bin/silicon/x86/create-ethernet-map /home/n4/jaehwan/research/tenstorrent/pybuda/pybuda-env/lib/python3.8/site-packages/budabackend//cluster_desc.yaml' with timeout 120s
  Detecting chips (found 2)                                                                                                                                                                                                                                                                 
2024-09-12 17:12:28.346 | WARNING  | SiliconDriver   - NumHostMemChannels: 2 used for device_id: 0x401e less than target: 4. Workload will fail if it exceeds NumHostMemChannels. Increase Number of Hugepages.
2024-09-12 17:12:28.380 | WARNING  | SiliconDriver   - NumHostMemChannels: 2 used for device_id: 0x401e less than target: 4. Workload will fail if it exceeds NumHostMemChannels. Increase Number of Hugepages.
2024-09-12 17:12:31.103 | INFO     | Backend         - initialize_child_process called on pid 6405
/home/n4/jaehwan/research/tenstorrent/pybuda/pybuda-env/lib/python3.8/site-packages/flax/struct.py:132: FutureWarning: jax.tree_util.register_keypaths is deprecated, and will be removed in a future release. Please use `register_pytree_with_keys()` instead.
  jax.tree_util.register_keypaths(data_clz, keypaths)
/home/n4/jaehwan/research/tenstorrent/pybuda/pybuda-env/lib/python3.8/site-packages/flax/struct.py:132: FutureWarning: jax.tree_util.register_keypaths is deprecated, and will be removed in a future release. Please use `register_pytree_with_keys()` instead.
  jax.tree_util.register_keypaths(data_clz, keypaths)
2024-09-12 17:12:34.873 | DEBUG    | pybuda.tvm_to_python:_determine_node_dtype:1713 - Node 'linear.weight' does not have a framework dtype specified. Using TVM generated dtype.
2024-09-12 17:12:34.873 | DEBUG    | pybuda.tvm_to_python:_determine_node_dtype:1713 - Node 'linear.bias' does not have a framework dtype specified. Using TVM generated dtype.
2024-09-12 17:12:34.916 | DEBUG    | pybuda.ttdevice:_create_input_queue_device_connector:1408 - Creating input queue connector on TTDevice 'tt0'
2024-09-12 17:12:34.916 | DEBUG    | pybuda.ttdevice:_create_intermediates_queue_device_connector:1418 - Creating fwd intermediates queue connector on TTDevice 'tt0'
2024-09-12 17:12:34.916 | DEBUG    | pybuda.ttdevice:_create_forward_output_queue_device_connector:1398 - Creating forward output queue connector on TTDevice 'tt0'
2024-09-12 17:12:39.053 | INFO     | Runtime         - running: '/home/n4/jaehwan/research/tenstorrent/pybuda/pybuda-env/lib/python3.8/site-packages/budabackend/umd/device/bin/silicon/x86/create-ethernet-map /home/n4/jaehwan/research/tenstorrent/pybuda/pybuda-env/lib/python3.8/site-packages/budabackend//cluster_desc.yaml' with timeout 120s
  Detecting chips (found 2)                                                                                                                                                                                                                                                                 
2024-09-12 17:12:39.192 | WARNING  | SiliconDriver   - NumHostMemChannels: 2 used for device_id: 0x401e less than target: 4. Workload will fail if it exceeds NumHostMemChannels. Increase Number of Hugepages.
2024-09-12 17:12:39.227 | WARNING  | SiliconDriver   - NumHostMemChannels: 2 used for device_id: 0x401e less than target: 4. Workload will fail if it exceeds NumHostMemChannels. Increase Number of Hugepages.
2024-09-12 17:12:40.648 | INFO     | pybuda.device_connector:pusher_thread_main:148 - Pusher thread on <pybuda.device_connector.InputQueueDirectPusherDeviceConnector object at 0x7f1e3cfce070> starting
2024-09-12 17:12:40.649 | INFO     | Backend         - initialize_child_process called on pid 6618
2024-09-12 17:12:40.650 | DEBUG    | pybuda.device:run_next_command:455 - Received COMPILE command on TTDevice 'tt0' / 6618
2024-09-12 17:12:40.650 | DEBUG    | pybuda.ttdevice:compile_for:785 - Compiling for Inference mode on TTDevice 'tt0'
2024-09-12 17:12:40.710 | INFO     | Backend         - Lookup contexts -- arch:system scope:device name:chips_with_mmio
2024-09-12 17:12:40.725 | WARNING  | SiliconDriver   - NumHostMemChannels: 2 used for device_id: 0x401e less than target: 4. Workload will fail if it exceeds NumHostMemChannels. Increase Number of Hugepages.
2024-09-12 17:12:40.741 | INFO     | Runtime         - Found cluster descriptor file at path=/tmp/jaehwan/3ab2f8d6c3b9/cluster_desc.yaml
2024-09-12 17:12:40.743 | INFO     | Backend         - Lookup contexts -- arch:system scope:device name:chip_locations
2024-09-12 17:12:40.743 | INFO     | Backend         - Lookup contexts -- arch:system scope:device name:ethernet_connections
2024-09-12 17:12:40.743 | INFO     | Backend         - Lookup contexts -- arch:system scope:device name:chips_with_mmio
2024-09-12 17:12:40.743 | INFO     | Backend         - Lookup contexts -- arch:system scope:device name:chip_locations
2024-09-12 17:12:40.743 | INFO     | Backend         - Lookup contexts -- arch:system scope:device name:ethernet_connections
2024-09-12 17:12:40.743 | INFO     | pybuda.compile:pybuda_compile_from_context:239 - Running compile stage init_compile
2024-09-12 17:12:40.747 | INFO     | pybuda.ci:initialize_output_build_directory:98 - Pybuda output build directory for compiled artifacts: /tmp/jaehwan/3ab2f8d6c3b9
2024-09-12 17:12:40.758 | INFO     | pybuda.ci:create_symlink:89 - Symlink created from /home/n4/jaehwan/research/tenstorrent/buda-tests/torch-module/tt_build/test_out to /tmp/jaehwan/3ab2f8d6c3b9
2024-09-12 17:12:40.794 | INFO     | Backend         - Lookup contexts -- arch:system scope:device name:chips_with_mmio
2024-09-12 17:12:40.794 | INFO     | Backend         - Lookup contexts -- arch:system scope:device name:chip_locations
2024-09-12 17:12:40.794 | INFO     | Backend         - Lookup contexts -- arch:system scope:device name:ethernet_connections
2024-09-12 17:12:40.794 | INFO     | pybuda.compile:init_compile:511 - Device architecutre: wormhole_b0
2024-09-12 17:12:40.794 | INFO     | pybuda.compile:init_compile:512 - Device grid size: r = 8, c = 8
2024-09-12 17:12:40.794 | INFO     | pybuda.compile:init_compile:522 - Using chips: [0]
2024-09-12 17:12:40.794 | INFO     | pybuda.compile:pybuda_compile_from_context:239 - Running compile stage generate_initial_graph
2024-09-12 17:12:40.816 | INFO     | pybuda.compile:pybuda_compile_from_context:239 - Running compile stage post_initial_graph_pass
2024-09-12 17:12:40.872 | INFO     | pybuda.compile:pybuda_compile_from_context:239 - Running compile stage consteval_graph
2024-09-12 17:12:40.905 | INFO     | pybuda.compile:pybuda_compile_from_context:239 - Running compile stage optimized_graph
2024-09-12 17:12:40.975 | INFO     | pybuda.compile:pybuda_compile_from_context:239 - Running compile stage post_autograd_pass
2024-09-12 17:12:40.993 | INFO     | pybuda.compile:pybuda_compile_from_context:239 - Running compile stage pre_lowering_pass
2024-09-12 17:12:41.011 | INFO     | pybuda.compile:pybuda_compile_from_context:239 - Running compile stage buda_graph_pre_placer
2024-09-12 17:12:41.015 | INFO     | GraphCompiler   - Running with Automatic Mixed Precision Level = 0.
2024-09-12 17:12:41.033 | INFO     | pybuda.compile:pybuda_compile_from_context:239 - Running compile stage balancer_pass
2024-09-12 17:12:41.033 | INFO     | Always          - Running Balancer with Policy: PolicyType::NLP
2024-09-12 17:12:41.052 | INFO     | Balancer        - Based on NLP matmul analysis, target cycle count is set to 45000
2024-09-12 17:12:41.052 | INFO     | Balancer        - Balancing 100% completed!
2024-09-12 17:12:41.053 | INFO     | Balancer        - Balancer perf score : 2314814.8
2024-09-12 17:12:41.053 | INFO     | Backend         - Lookup contexts -- arch:system scope:device0 name:harvesting_mask
2024-09-12 17:12:41.053 | INFO     | Placer          - Running DRAM allocator for device 0
2024-09-12 17:12:41.061 | INFO     | PerfModel       - Running performance model...
2024-09-12 17:12:41.079 | INFO     | pybuda.compile:pybuda_compile_from_context:239 - Running compile stage pre_netlist_pass
2024-09-12 17:12:41.097 | INFO     | pybuda.compile:pybuda_compile_from_context:239 - Running compile stage generate_netlist
2024-09-12 17:12:41.097 | INFO     | pybuda.compile:generate_netlist:1075 - Generating Netlist
2024-09-12 17:12:41.165 | INFO     | pybuda.ci:create_symlink:89 - Symlink created from /home/n4/jaehwan/research/tenstorrent/buda-tests/torch-module/linear_netlist.yaml to /tmp/jaehwan/3ab2f8d6c3b9/linear_netlist.yaml
2024-09-12 17:12:41.206 | INFO     | pybuda.compile:pybuda_compile_from_context:239 - Running compile stage backend_golden_verify
2024-09-12 17:12:41.207 | DEBUG    | pybuda.tensor:consteval_tensor:1233 - ConstEval graph: linear.weight
2024-09-12 17:12:41.208 | INFO     | Runtime         - Running tt_runtime on host: 'c1'
2024-09-12 17:12:41.208 | INFO     | PerfInfra       - Backend profiler is disabled
2024-09-12 17:12:41.208 | INFO     | PerfInfra       - Memory profiler is enabled
2024-09-12 17:12:41.212 | WARNING  | Runtime         - Config.soc_descriptor_path='/tmp/jaehwan/3ab2f8d6c3b9/device_descs/wormhole_b0_2064_0x0.yaml' doesn't exist, defaulting to '/home/n4/jaehwan/research/tenstorrent/pybuda/pybuda-env/lib/python3.8/site-packages/budabackend/device/wormhole_b0_8x10.yaml'
2024-09-12 17:12:41.231 | INFO     | SiliconDriver   - Detected 1 PCI device : {0}
2024-09-12 17:12:41.233 | WARNING  | SiliconDriver   - NumHostMemChannels: 2 used for device_id: 0x401e less than target: 4. Workload will fail if it exceeds NumHostMemChannels. Increase Number of Hugepages.
2024-09-12 17:12:41.370 | INFO     | Runtime         - Compiling Firmware for TT device
2024-09-12 17:12:42.118 | INFO     | SiliconDriver   - Software version 6.0.0, Ethernet FW version 6.9.0 (Device 0)
2024-09-12 17:12:42.230 | INFO     | Runtime         - Starting device status monitor with TIMEOUT=500s
2024-09-12 17:12:42.230 | INFO     | Loader          - Waiting for 30 seconds for NCRISC Firmware to start running on 1 device(s)
2024-09-12 17:12:42.243 | INFO     | pybuda.backend:feeder_thread_main:149 - Feeder thread on <pybuda.backend.BackendAPI object at 0x7f1e3cfce1f0> starting
2024-09-12 17:12:42.243 | DEBUG    | pybuda.backend:push_constants_and_parameters:491 - Pushing to parameter linear.weight
2024-09-12 17:12:42.244 | DEBUG    | pybuda.backend:push_constants_and_parameters:491 - Pushing to parameter linear.bias
2024-09-12 17:12:42.273 | INFO     | SiliconDriver   - Detected 1 PCI device : {0}
2024-09-12 17:12:42.274 | WARNING  | SiliconDriver   - NumHostMemChannels: 2 used for device_id: 0x401e less than target: 4. Workload will fail if it exceeds NumHostMemChannels. Increase Number of Hugepages.
2024-09-12 17:12:42.302 | DEBUG    | pybuda.run.impl:_run_forward:644 - Running concurrent device forward: TTDevice 'tt0'
2024-09-12 17:12:42.304 | DEBUG    | pybuda.device:run_next_command:429 - Received RUN_FORWARD command on TTDevice 'tt0' / 6618
2024-09-12 17:12:42.305 | DEBUG    | pybuda.ttdevice:forward:906 - Starting forward on TTDevice 'tt0'
2024-09-12 17:12:42.305 | DEBUG    | pybuda.backend:feeder_thread_main:171 - Run feeder thread cmd: fwd
2024-09-12 17:12:42.306 | INFO     | Runtime         - Running program 'run_fwd_0' with params [("$p_loop_count", "1")]
2024-09-12 17:12:42.307 | DEBUG    | pybuda.backend:read_queues:345 - Reading output queue linear.output_add_2
2024-09-12 17:12:42.308 | DEBUG    | pybuda.device_connector:pusher_thread_main:163 - Pusher thread pushing tensors
2024-09-12 17:12:42.309 | DEBUG    | pybuda.backend:push_to_queues:452 - Pushing to queue input
2024-09-12 17:12:43.309 | DEBUG    | pybuda.backend:read_queues:362 - 0 Reading output queue linear.output_add_2 timed out after 1
2024-09-12 17:12:44.310 | DEBUG    | pybuda.backend:read_queues:362 - 1 Reading output queue linear.output_add_2 timed out after 1
2024-09-12 17:12:45.311 | DEBUG    | pybuda.backend:read_queues:362 - 2 Reading output queue linear.output_add_2 timed out after 1
2024-09-12 17:12:46.312 | DEBUG    | pybuda.backend:read_queues:362 - 3 Reading output queue linear.output_add_2 timed out after 1
2024-09-12 17:12:47.313 | DEBUG    | pybuda.backend:read_queues:362 - 4 Reading output queue linear.output_add_2 timed out after 1
2024-09-12 17:12:48.315 | DEBUG    | pybuda.backend:read_queues:362 - 5 Reading output queue linear.output_add_2 timed out after 1
2024-09-12 17:12:49.316 | DEBUG    | pybuda.backend:read_queues:362 - 6 Reading output queue linear.output_add_2 timed out after 1
2024-09-12 17:12:50.317 | DEBUG    | pybuda.backend:read_queues:362 - 7 Reading output queue linear.output_add_2 timed out after 1
2024-09-12 17:12:51.318 | DEBUG    | pybuda.backend:read_queues:362 - 8 Reading output queue linear.output_add_2 timed out after 1
2024-09-12 17:12:52.319 | DEBUG    | pybuda.backend:read_queues:362 - 9 Reading output queue linear.output_add_2 timed out after 1
Exception in thread Thread-5:
Traceback (most recent call last):
  File "/usr/lib/python3.8/threading.py", line 932, in _bootstrap_inner
  File "/usr/lib/python3.8/threading.py", line 870, in run
    self._target(*self._args, **self._kwargs)
  File "/home/n4/jaehwan/research/tenstorrent/pybuda/pybuda-env/lib/python3.8/site-packages/pybuda/device.py", line 577, in dc_transfer_thread
  File "/home/n4/jaehwan/research/tenstorrent/pybuda/pybuda-env/lib/python3.8/site-packages/pybuda/device.py", line 591, in dc_transfer
  File "/home/n4/jaehwan/research/tenstorrent/pybuda/pybuda-env/lib/python3.8/site-packages/pybuda/device_connector.py", line 441, in transfer
2024-09-12 17:12:52.321 | DEBUG    | pybuda.device:dc_transfer_thread:581 - Ending dc transfer thread intermediates on TTDevice 'tt0' due to shutdown event
    data = self.read()
  File "/home/n4/jaehwan/research/tenstorrent/pybuda/pybuda-env/lib/python3.8/site-packages/pybuda/device_connector.py", line 348, in read
    ret = BackendAPI.read_queues(self.direct_pop_queues, self.original_shapes, self.runtime_tensor_transforms, requires_grad=self.requires_grad, single_output=False, shutdown_event=self.shutdown_event, clone=False)
  File "/home/n4/jaehwan/research/tenstorrent/pybuda/pybuda-env/lib/python3.8/site-packages/pybuda/backend.py", line 369, in read_queues
2024-09-12 17:12:52.322 | DEBUG    | pybuda.device:dc_transfer_thread:581 - Ending dc transfer thread forward_input on TTDevice 'tt0' due to shutdown event
    raise RuntimeError("Timeout while reading " + outq.name)
RuntimeError: Timeout while reading linear.output_add_2
2024-09-12 17:12:52.324 | DEBUG    | pybuda.device_connector:pusher_thread_main:156 - Ending pusher thread on <pybuda.device_connector.InputQueueDirectPusherDeviceConnector object at 0x7f1e3cfce070> due to shutdown event
2024-09-12 17:12:53.319 | DEBUG    | pybuda.device:get_next_command:360 - Ending process on TTDevice 'tt0' due to shutdown event

milank94 commented 1 week ago

Based on the line RuntimeError: Timeout while reading linear.output_add_2, it looks like the device is in a bad state.

Can you reset the card using the tt-smi tool, or do a system reboot, and then try again?

jhlee508 commented 1 week ago

Neither tt-smi -r 0 nor sudo reboot helped. I don't understand why input_tensor = torch.randn(32, 32) causes an error while input_tensor = torch.randn(1, 32, 32) works fine. Should I just use the latter as a workaround? (Even so, I think this problem should be fixed.)

milank94 commented 1 week ago

PyBuda expects a batch dimension, which is why the extra leading dimension, as in (1, 32, 32), is required.
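
For reference, here is a minimal sketch of the workaround discussed above, using only PyTorch. The helper name add_batch_dim is hypothetical and not part of the pybuda API. It assumes the [32, 32] input is meant as a single sample, so a leading dimension of size 1 is added before the tensor is passed to pybuda.run_inference.

```python
import torch


def add_batch_dim(tensor: torch.Tensor) -> torch.Tensor:
    """Return a view of `tensor` with a leading batch dimension of size 1.

    Hypothetical helper for illustration; it is not part of the pybuda API.
    """
    return tensor.unsqueeze(0)


# Shape from the failing test case: [32, 32].
input_tensor = torch.randn(32, 32)

# Shape that ran successfully in this thread: [1, 32, 32].
batched_input = add_batch_dim(input_tensor)
print(tuple(batched_input.shape))  # (1, 32, 32)

# The rest of the original test would stay unchanged, for example:
# output_queue = pybuda.run_inference(inputs=[batched_input])
```

Since torch.nn.Linear acts only on the last dimension, the extra leading dimension does not change the per-row results in PyTorch. If the device output keeps that leading dimension (an assumption, not verified here), calling squeeze(0) on it would give back a [32, 32] tensor.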