tenstorrent / tt-metal

TT-NN operator library, and TT-Metalium low level kernel programming model.
Apache License 2.0

[Bug Report] ttnn.full input shape causes Segfault with TILE_LAYOUT #15030

Status: Open. kevinwuTT opened this issue 14 hours ago.

kevinwuTT commented 14 hours ago

Describe the bug

Calling ttnn.full on device with shape (1, 50257) and layout=ttnn.TILE_LAYOUT causes a segmentation fault. (This shape is used in GPT-Neo.)

To Reproduce

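```python
import ttnn

with ttnn.manage_device(device_id=0) as device:
    # TILE_LAYOUT is requested directly; the width 50257 is not a multiple of 32
    ttnn_full = ttnn.full((1, 50257), fill_value=1.0, layout=ttnn.TILE_LAYOUT, device=device)
```

Output:

```
Segmentation fault (core dumped)
```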
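To see why this particular shape is a problem for TILE_LAYOUT, the plain-Python arithmetic below compares the last two dimensions against a 32 x 32 tile. This is a sketch, not ttnn code: the helper name `tile_remainders` is hypothetical, and the 32 x 32 tile size is assumed from the TILE_WIDTH constant and the "multiple of 32" requirement quoted later in this issue.

```python
# Hypothetical helper (not part of ttnn): reports how far each of the last two
# dimensions is from a whole number of 32 x 32 tiles.
TILE_HEIGHT = 32
TILE_WIDTH = 32


def tile_remainders(shape):
    """Return (height % TILE_HEIGHT, width % TILE_WIDTH) for the last two dims."""
    height, width = shape[-2], shape[-1]
    return height % TILE_HEIGHT, width % TILE_WIDTH


print(tile_remainders((1, 50257)))   # (1, 17): neither dimension fills whole tiles
print(tile_remainders((32, 50272)))  # (0, 0): tile-aligned
```

Since 50257 = 1570 * 32 + 17, the last tile in each row would be only partially filled unless the tensor is padded.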

However, if I let ttnn.full use its default ROW_MAJOR_LAYOUT and then call ttnn.to_layout to convert the result to TILE_LAYOUT, I do not get any errors.

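```python
import ttnn

with ttnn.manage_device(device_id=0) as device:
    # No layout argument: ttnn.full returns a ROW_MAJOR_LAYOUT tensor
    ttnn_full = ttnn.full((1, 50257), fill_value=1.0, device=device)
    # Converting to TILE_LAYOUT as a separate step succeeds
    to_layout = ttnn.to_layout(ttnn_full, layout=ttnn.TILE_LAYOUT)
```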
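One way to picture why the two-step path succeeds: the conversion to TILE_LAYOUT presumably pads the last two dimensions up to whole tiles while the logical shape stays (1, 50257). The sketch below only computes such a tile-padded shape in plain Python; the function names are hypothetical, and the exact padding rules applied inside ttnn.to_layout are an assumption here, not something this issue confirms.

```python
# Hypothetical sketch (not ttnn code): round the last two dimensions of a shape
# up to the next multiple of the tile size.
def round_up(value, multiple):
    """Smallest multiple of `multiple` that is >= value."""
    return -(-value // multiple) * multiple


def tile_padded_shape(shape, tile_height=32, tile_width=32):
    """Keep leading dims as-is; pad height and width up to whole tiles."""
    *batch, height, width = shape
    return (*batch, round_up(height, tile_height), round_up(width, tile_width))


print(tile_padded_shape((1, 50257)))      # (32, 50272)
print(tile_padded_shape((2, 1, 64, 96)))  # (2, 1, 64, 96): already aligned
```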

Expected behavior

ttnn.full alone should support shape (1, 50257) with TILE_LAYOUT, since splitting the call into two ops works.

Please complete the following environment information:


With a previous version, v0.53.0-rc35, there is no segfault; instead, this error is raised:

```
RuntimeError: TT_FATAL @ /tmp/build-via-sdist-x_q5c5zg/metal_libs-0.53.0rc35+grayskull/ttnn/cpp/ttnn/operations/numpy/functions.hpp:66: shape[-1] % tt::constants::TILE_WIDTH == 0
info:
TILE layout requires width dimension to be multiple of 32
```
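The failing condition in that message is `shape[-1] % tt::constants::TILE_WIDTH == 0`. Until a library-side check is restored, a caller-side guard with the same condition could turn the segfault back into a Python exception. This is a plain-Python sketch: `check_tile_width` is a hypothetical name, and it mirrors only the width condition shown in the error above, not any other validation ttnn may perform.

```python
TILE_WIDTH = 32  # assumed from "TILE layout requires width dimension to be multiple of 32"


def check_tile_width(shape):
    """Raise ValueError before a TILE_LAYOUT request with an unaligned width reaches the op."""
    if shape[-1] % TILE_WIDTH != 0:
        raise ValueError(
            f"TILE layout requires width dimension to be multiple of {TILE_WIDTH}; "
            f"got shape {tuple(shape)}"
        )


check_tile_width((1, 50272))  # aligned width: passes silently
try:
    check_tile_width((1, 50257))
except ValueError as err:
    print(err)
```

In practice such a guard would be called just before `ttnn.full(..., layout=ttnn.TILE_LAYOUT, ...)`, falling back to the ROW_MAJOR_LAYOUT plus ttnn.to_layout path when it raises.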

This might be related to https://github.com/tenstorrent/tt-metal/issues/14871

ayerofieiev-tt commented 12 hours ago

Setting to P0. We need help getting this resolved ASAP: we picked up the latest metal wheel and CI broke. Thank you!

KalaivaniMCW commented 3 hours ago
  1. ttnn.full runs as a host operation and uses the requested shape directly, so padding is not handled.

  2. For TILE layout, H and W (the last two dimensions) should be multiples of 32, as mentioned in the doc.

  3. The TT_FATAL check was removed recently as part of #14611, hence the segmentation fault instead of a runtime error.
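Point 1 suggests that a host-side fill targeting TILE layout would need to allocate the tile-padded buffer rather than the requested shape. The sketch below illustrates that idea on plain Python lists for a 2D shape: the logical region receives fill_value and the padding receives a pad value. It is hypothetical throughout (the name `full_with_tile_padding` and the zero pad value are assumptions) and does not describe how ttnn implements ttnn.full.

```python
# Hypothetical sketch: fill a 2D buffer padded up to whole 32 x 32 tiles,
# keeping track of the logical (unpadded) shape separately.
def full_with_tile_padding(shape, fill_value, pad_value=0.0, tile=32):
    """Return (padded_rows, logical_shape) for a 2D shape (height, width)."""
    height, width = shape
    padded_height = -(-height // tile) * tile  # round up to whole tiles
    padded_width = -(-width // tile) * tile
    rows = []
    for r in range(padded_height):
        # Logical region gets fill_value; everything outside it is padding
        row = [fill_value if (r < height and c < width) else pad_value
               for c in range(padded_width)]
        rows.append(row)
    return rows, (height, width)


rows, logical = full_with_tile_padding((1, 33), fill_value=1.0)
print(len(rows), len(rows[0]), logical)  # 32 64 (1, 33)
```

For (1, 50257) the same logic would allocate a 32 x 50272 buffer while reporting the logical shape (1, 50257).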