tenstorrent / tt-metal

[Bug Report] ttnn.matmul fails with `Error: incompatible values 1 and 3` on N150 #15360

Open jerrysky3 opened 8 hours ago

Describe the bug

On N150, calling ttnn.matmul with an LHS of shape (784, 192) and an RHS of shape (192, 576) fails with the error below. The same call does not fail on N300.

Traceback (most recent call last):
  File "exp_matmul.py", line 12, in <module>
    main(device)
  File "exp_matmul.py", line 7, in main
    print(ttnn.matmul(lhs, rhs))
  File "/home/jerry/tt-metal/ttnn/ttnn/decorators.py", line 329, in __call__
    return self.function(*function_args, **function_kwargs)
RuntimeError: TT_FATAL @ ../ttnn/cpp/ttnn/operations/matmul/device/matmul_op.cpp:1252: program_config.out_block_h % program_config.out_subblock_h == 0
info:
Error: incompatible values 1 and 3
backtrace:
 --- /home/jerry/tt-metal/ttnn/ttnn/_ttnn.so(+0x4ef669) [0x7fe22725f669]
 --- /home/jerry/tt-metal/ttnn/ttnn/_ttnn.so(+0xf2f4b2) [0x7fe227c9f4b2]
 --- ttnn::operations::matmul::Matmul::validate(std::__1::vector<tt::tt_metal::Tensor, std::__1::allocator<tt::tt_metal::Tensor>> const&, std::__1::vector<std::__1::optional<tt::tt_metal::Tensor const>, std::__1::allocator<std::__1::optional<tt::tt_metal::Tensor const>>> const&) const
 --- void ttnn::device_operation::detail::launch_on_worker_thread<tt::tt_metal::operation::OldInfraDeviceOperation<std::__1::vector<tt::tt_metal::Tensor, std::__1::allocator<tt::tt_metal::Tensor>>>, unsigned char, long, tt::tt_metal::operation::DeviceOperation<std::__1::vector<tt::tt_metal::Tensor, std::__1::allocator<tt::tt_metal::Tensor>>>, tt::tt_metal::operation::OldInfraDeviceOperation<std::__1::vector<tt::tt_metal::Tensor, std::__1::allocator<tt::tt_metal::Tensor>>>::tensor_args_t, std::__1::vector<tt::tt_metal::Tensor, std::__1::allocator<tt::tt_metal::Tensor>>, tt::tt_metal::v0::Device*>(unsigned char, long, tt::tt_metal::operation::DeviceOperation<std::__1::vector<tt::tt_metal::Tensor, std::__1::allocator<tt::tt_metal::Tensor>>> const&, tt::tt_metal::operation::OldInfraDeviceOperation<std::__1::vector<tt::tt_metal::Tensor, std::__1::allocator<tt::tt_metal::Tensor>>>::tensor_args_t const&, std::__1::vector<tt::tt_metal::Tensor, std::__1::allocator<tt::tt_metal::Tensor>>&, tt::tt_metal::v0::Device*&)
 --- tt::tt_metal::operation::OldInfraDeviceOperation<std::__1::vector<tt::tt_metal::Tensor, std::__1::allocator<tt::tt_metal::Tensor>>>::tensor_return_value_t ttnn::device_operation::detail::launch_on_single_device<tt::tt_metal::operation::OldInfraDeviceOperation<std::__1::vector<tt::tt_metal::Tensor, std::__1::allocator<tt::tt_metal::Tensor>>>>(unsigned char, tt::tt_metal::operation::OldInfraDeviceOperation<std::__1::vector<tt::tt_metal::Tensor, std::__1::allocator<tt::tt_metal::Tensor>>>::operation_attributes_t const&, tt::tt_metal::operation::OldInfraDeviceOperation<std::__1::vector<tt::tt_metal::Tensor, std::__1::allocator<tt::tt_metal::Tensor>>>::tensor_args_t const&)
 --- /home/jerry/tt-metal/ttnn/ttnn/_ttnn.so(+0x41b90a) [0x7fe22718b90a]
 --- tt::tt_metal::operation::OldInfraDeviceOperation<std::__1::vector<tt::tt_metal::Tensor, std::__1::allocator<tt::tt_metal::Tensor>>>::tensor_return_value_t ttnn::device_operation::detail::invoke<tt::tt_metal::operation::OldInfraDeviceOperation<std::__1::vector<tt::tt_metal::Tensor, std::__1::allocator<tt::tt_metal::Tensor>>>>(unsigned char, tt::tt_metal::operation::OldInfraDeviceOperation<std::__1::vector<tt::tt_metal::Tensor, std::__1::allocator<tt::tt_metal::Tensor>>>::operation_attributes_t const&, tt::tt_metal::operation::OldInfraDeviceOperation<std::__1::vector<tt::tt_metal::Tensor, std::__1::allocator<tt::tt_metal::Tensor>>>::tensor_args_t const&)
 --- /home/jerry/tt-metal/ttnn/ttnn/_ttnn.so(+0x41a55b) [0x7fe22718a55b]
 --- /home/jerry/tt-metal/ttnn/ttnn/_ttnn.so(+0x416698) [0x7fe227186698]
 --- std::__1::vector<tt::tt_metal::Tensor, std::__1::allocator<tt::tt_metal::Tensor>> tt::tt_metal::operation::run<std::__1::vector<tt::tt_metal::Tensor, std::__1::allocator<tt::tt_metal::Tensor>>>(tt::tt_metal::operation::DeviceOperation<std::__1::vector<tt::tt_metal::Tensor, std::__1::allocator<tt::tt_metal::Tensor>>>&&, std::__1::vector<tt::tt_metal::Tensor, std::__1::allocator<tt::tt_metal::Tensor>> const&, std::__1::vector<std::__1::optional<tt::tt_metal::Tensor const>, std::__1::allocator<std::__1::optional<tt::tt_metal::Tensor const>>> const&, std::__1::vector<std::__1::optional<tt::tt_metal::Tensor>, std::__1::allocator<std::__1::optional<tt::tt_metal::Tensor>>> const&, unsigned char)
 --- /home/jerry/tt-metal/ttnn/ttnn/_ttnn.so(+0xf33361) [0x7fe227ca3361]
 --- /home/jerry/tt-metal/ttnn/ttnn/_ttnn.so(+0xf34085) [0x7fe227ca4085]
 --- ttnn::operations::matmul::matmul(tt::tt_metal::Tensor const&, tt::tt_metal::Tensor const&, std::__1::optional<tt::tt_metal::Tensor const>, ttnn::operations::matmul::Matmul const&, unsigned char)
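For reference, the failing assertion can be worked through by hand. The sketch below is plain Python, not the ttnn implementation: it assumes the 32x32 tile size used by ttnn.TILE_LAYOUT, and it assumes the "1 and 3" in the error message are out_block_h and out_subblock_h in that order (the reverse order would pass the check, since 3 % 1 == 0). The helper names `tiles` and `subblock_divides` are hypothetical and exist only for illustration.

```python
import math

TILE = 32  # assumed tile edge length for ttnn.TILE_LAYOUT


def tiles(dim, tile=TILE):
    """Number of tiles needed to cover `dim` elements, rounding up."""
    return math.ceil(dim / tile)


def subblock_divides(out_block_h, out_subblock_h):
    """Same arithmetic as the failing TT_FATAL condition in matmul_op.cpp."""
    return out_block_h % out_subblock_h == 0


# Shapes from this report: LHS (784, 192), RHS (192, 576).
m_tiles, k_tiles, n_tiles = tiles(784), tiles(192), tiles(576)
print(m_tiles, k_tiles, n_tiles)  # 784 is not a multiple of 32, so M rounds up

# Values from the error message, in the assumed order.
print(subblock_divides(1, 3))  # the combination that trips the check
print(subblock_divides(3, 1))  # the reverse order would have passed
```

Note that 784 is not a multiple of 32, so the M dimension is padded up to a whole number of tiles. Whether that padding is what leads the auto-selected program config to pick incompatible block sizes on N150 is not established here.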

To Reproduce

import torch
import ttnn

def main(device):
    lhs = ttnn.from_torch(torch.rand((784, 192), dtype=torch.bfloat16), device=device, layout=ttnn.TILE_LAYOUT)
    rhs = ttnn.from_torch(torch.rand((192, 576), dtype=torch.bfloat16), device=device, layout=ttnn.TILE_LAYOUT)
    print(ttnn.matmul(lhs, rhs))

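if __name__ == "__main__":
    # Open the device before the try block so that `device` is always
    # defined when the finally clause runs.
    device = ttnn.open_device(device_id=0)
    try:
        main(device)
    finally:
        ttnn.close_device(device)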

Please complete the following environment information: