I'm now trying to write a simpler pattern test that can reproduce this error.
The issue here looks like it's related to ttnn.Tensor(<buffer is not allocated>.
It might be due to the first operand of ttnn.add being a host tensor instead of a device tensor (while the second operand is a device tensor).
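For reference, here is a minimal sketch of the mismatch I suspect (this is an assumption, not a verified repro; the shapes are placeholders and a single plain ttnn device is assumed):

```python
import torch
import ttnn

device = ttnn.open_device(device_id=0)

# First operand is left on host (no device= argument); second is moved to device.
host_operand = ttnn.from_torch(
    torch.rand(12, 768), dtype=ttnn.bfloat16, layout=ttnn.TILE_LAYOUT
)
device_operand = ttnn.from_torch(
    torch.rand(12, 768), dtype=ttnn.bfloat16, layout=ttnn.TILE_LAYOUT, device=device
)

# Expected to fail with a host-storage / unallocated-buffer error
# similar to the one seen in the model run.
out = ttnn.add(host_operand, device_operand)

ttnn.close_device(device)
```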
I found that the model IR contains the following chain:
ttnn_from_device_3 = ttnn_decorators_ttnn_from_device(ttnn_reshape_3)
ttnn_to_layout_24 = ttnn_decorators_ttnn_to_layout(ttnn_from_device_3, ttnn_ROW_MAJOR_LAYOUT)
ttnn_reshape_8 = ttnn_decorators_ttnn_reshape(ttnn_to_layout_24, (12, 768))
...
ttnn_add_8 = ttnn_decorators_ttnn_add(ttnn_to_layout_24, clone_3)
ttnn_reshape_3 is moved to a host tensor because of the current constraint of ttnn.reshape. This is done by the logic in add_data_move_pass.py.
Then I think ttnn_reshape_3 is mapped to the host tensor by the code below. Therefore the following ops also reference the host tensor, including the ttnn.add in this issue.
There seems to be a function, try_add_layout_change_after_node, that moves the tensor back to device, so I haven't fully understood why this issue happens.
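As I understand it, the pass rewrites the FX graph roughly like the sketch below (this is my own rough illustration, not the actual add_data_move_pass.py code); the open question is why the matching to_layout/to_device is not inserted back after the reshape in this case:

```python
import torch
import ttnn

def move_reshape_input_to_host(gm: torch.fx.GraphModule) -> torch.fx.GraphModule:
    """Rough illustration: before every ttnn.reshape, pull the input back to host
    in ROW_MAJOR_LAYOUT. This is what makes the reshape output (and any later
    user such as ttnn.add) reference a host tensor unless a to_layout/to_device
    pair is added again after the node."""
    for node in gm.graph.nodes:
        if node.op == "call_function" and node.target is ttnn.reshape:
            inp, *rest = node.args
            with gm.graph.inserting_before(node):
                from_dev = gm.graph.call_function(ttnn.from_device, (inp,))
                to_rm = gm.graph.call_function(
                    ttnn.to_layout, (from_dev, ttnn.ROW_MAJOR_LAYOUT)
                )
            node.args = (to_rm, *rest)
    gm.graph.lint()
    gm.recompile()
    return gm
```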
We ran into a similar issue with the bloom model:
ttnn_from_torch_4 = ttnn_decorators_ttnn_from_torch(arg295_1, layout = ttnn_ROW_MAJOR_LAYOUT, dtype = ttnn_bfloat16, device = ttnn_Specified_Device)
ttnn_from_device = ttnn_decorators_ttnn_from_device(ttnn_from_torch_4); ttnn_from_torch_4 = None
ttnn_to_layout_1 = ttnn_decorators_ttnn_to_layout(ttnn_from_device, ttnn_ROW_MAJOR_LAYOUT); ttnn_from_device = None
ttnn_reshape = ttnn_decorators_ttnn_reshape(ttnn_to_layout_1, (1, 32, 1, 1))
ttnn_to_layout_2 = ttnn_decorators_ttnn_to_layout(ttnn_reshape, ttnn_TILE_LAYOUT); ttnn_reshape = None
ttnn_to_device = ttnn_decorators_ttnn_to_device(ttnn_to_layout_2, device = ttnn_Specified_Device); ttnn_to_layout_2 = None
ttnn_moreh_cumsum = ttnn_decorators_ttnn_moreh_cumsum(ttnn_to_device, 1); ttnn_to_device = None
ttnn_from_device_1 = ttnn_decorators_ttnn_from_device(ttnn_moreh_cumsum); ttnn_moreh_cumsum = None
ttnn_to_layout_3 = ttnn_decorators_ttnn_to_layout(ttnn_from_device_1, ttnn_ROW_MAJOR_LAYOUT); ttnn_from_device_1 = None
ttnn_reshape_1 = ttnn_decorators_ttnn_reshape(ttnn_to_layout_3, (1, 32)); ttnn_to_layout_3 = None
ttnn_to_layout_4 = ttnn_decorators_ttnn_to_layout(ttnn_reshape_1, ttnn_TILE_LAYOUT); ttnn_reshape_1 = None
ttnn_to_device_1 = ttnn_decorators_ttnn_to_device(ttnn_to_layout_4, device = ttnn_Specified_Device); ttnn_to_layout_4 = None
ttnn_subtract = ttnn_decorators_ttnn_subtract(ttnn_to_device_1, 1); ttnn_to_device_1 = None
> ttnn_multiply = ttnn_decorators_ttnn_multiply(ttnn_subtract, ttnn_to_layout_1); ttnn_subtract = ttnn_to_layout_1 = None
<eval_with_key>.20:28:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
self = FastOperation(python_fully_qualified_name='ttnn.multiply', function=<ttnn._ttnn.operations.binary.multiply_t object at...<function default_postprocess_golden_function_outputs at 0x7f92f9310940>, is_cpp_operation=True, is_experimental=False)
function_args = (<[RuntimeError("TT_FATAL @ /tmp/build-via-sdist-n2_feo7u/metal_libs-0.53.0rc28+wormhole.b0/tt_metal/impl/device/devic...00000, 0.00000, ..., 1.00000, 1.00000]], shape=Shape([1, 32]), dtype=DataType::BFLOAT16, layout=Layout::ROW_MAJOR))
function_kwargs = {}
def __call__(self, *function_args, **function_kwargs):
> return self.function(*function_args, **function_kwargs)
E RuntimeError: TT_THROW @ /tmp/build-via-sdist-n2_feo7u/metal_libs-0.53.0rc28+wormhole.b0/ttnn/cpp/ttnn/tensor/tensor.hpp:250: tt::exception
E info:
E Cannot get the device from a tensor with host storage
E backtrace:
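The pattern here is the same: ttnn_to_layout_1 is a host, row-major tensor that is reused by ttnn_multiply together with a device tensor. A minimal sketch of the kind of fix I would expect (assumed, not taken from any actual fix; the tensors below are stand-ins for the graph values) is to move the host tensor back to tile layout on device before the binary op:

```python
import torch
import ttnn

device = ttnn.open_device(device_id=0)

# Stand-ins for the two operands of ttnn_multiply in the graph above.
host_row_major = ttnn.from_torch(torch.rand(1, 32), dtype=ttnn.bfloat16)  # like ttnn_to_layout_1
on_device = ttnn.from_torch(
    torch.rand(1, 32), dtype=ttnn.bfloat16, layout=ttnn.TILE_LAYOUT, device=device
)  # like the output of ttnn_subtract

# Move the host tensor back to device instead of passing it in with host storage.
fixed = ttnn.to_layout(host_row_major, ttnn.TILE_LAYOUT)
fixed = ttnn.to_device(fixed, device=device)
out = ttnn.multiply(on_device, fixed)

ttnn.close_device(device)
```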
resolved in #392
When I debug the albert-base-v2-classification model test, it fails at ttnn.add with this error message, and I'm not sure why the second input shape has a 12[32] value.
If I block the lowering of this ttnn.add op back to aten.add.Tensor in to_tt_guard.py, then this test passes.
The reproduce step is:
pytest tests/models/albert/test_albert_token_classification.py
After this issue is resolved, please also remove the related blocklist in to_tt_guard.py.
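Conceptually, the blocklist is just a per-model set of aten ops that are kept as aten instead of being converted to ttnn; a rough sketch is below (the names and structure are illustrative only, not the exact to_tt_guard.py contents):

```python
# Illustrative only; the real to_tt_guard.py layout may differ.
ALBERT_TOKEN_CLASSIFICATION_BLOCKLIST = {"aten.add.Tensor"}

def can_lower_to_ttnn(model_name: str, aten_op_name: str) -> bool:
    """Return False when the op should stay as aten for this model
    (i.e., its lowering to ttnn is temporarily blocked)."""
    if model_name == "albert-base-v2-classification":
        return aten_op_name not in ALBERT_TOKEN_CLASSIFICATION_BLOCKLIST
    return True
```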