tenstorrent / tt-metal

TT-NN operator library, and TT-Metalium low level kernel programming model.

ttnn.concat produces output buffers with invalid page size #13602

Open jaykru-tt opened 1 month ago

jaykru-tt commented 1 month ago

This bug was found in the PyTorch 2.0 trace suite sweep for ttnn.concat.

Exception: TT_FATAL @ ../tt_metal/impl/buffers/buffer.cpp:38: valid_page_size
info:
For valid non-interleaved buffers page size 10 must equal buffer size 8. For interleaved-buffers page size should be divisible by buffer size
backtrace:
 --- /home/ubuntu/actions-runner/_work/tt-metal/tt-metal/ttnn/ttnn/_ttnn.so(+0x584159) [0x7f1881f4d159]
 --- tt::tt_metal::validate_buffer_size_and_page_size(unsigned long, unsigned long, tt::tt_metal::BufferType const&, tt::tt_metal::TensorMemoryLayout const&, std::__1::optional<tt::tt_metal::ShardSpecBuffer> const&)
 --- tt::tt_metal::Buffer::Buffer(tt::tt_metal::Device*, unsigned long, unsigned long, tt::tt_metal::BufferType, tt::tt_metal::TensorMemoryLayout, std::__1::optional<tt::tt_metal::ShardSpecBuffer> const&, std::__1::optional<bool>, bool)
 --- /home/ubuntu/actions-runner/_work/tt-metal/tt-metal/ttnn/ttnn/_ttnn.so(+0x10c52b1) [0x7f1882a8e2b1]
 --- tt::tt_metal::tensor_impl::allocate_buffer_on_device(unsigned long, tt::tt_metal::Device*, tt::tt_metal::LegacyShape const&, tt::tt_metal::DataType, tt::tt_metal::Layout, tt::tt_metal::MemoryConfig const&, std::__1::optional<tt::tt_metal::ShardSpecBuffer> const&, std::__1::optional<tt::tt_metal::Tile> const&)
 --- /home/ubuntu/actions-runner/_work/tt-metal/tt-metal/ttnn/ttnn/_ttnn.so(_ZN2tt8tt_metal11tensor_impl25initialize_data_on_deviceI8bfloat16NS0_15borrowed_buffer6BufferEEENSt3__110shared_ptrINS0_6BufferEEERT0_IT_EPNS0_6DeviceERKNS0_11LegacyShapeENS0_8DataTypeENS0_6LayoutERKNS0_12MemoryConfigERKNS6_8optionalINS0_15ShardSpecBufferEEERKNSO_INS0_4TileEEENSO_INS6_17reference_wrapperINS0_12CommandQueueEEEEE+0x3a) [0x7f1882a8ee3a]
 --- /home/ubuntu/actions-runner/_work/tt-metal/tt-metal/ttnn/ttnn/_ttnn.so(+0x10c66ac) [0x7f1882a8f6ac]
 --- /home/ubuntu/actions-runner/_work/tt-metal/tt-metal/ttnn/ttnn/_ttnn.so(+0x10c5c59) [0x7f1882a8ec59]
 --- /home/ubuntu/actions-runner/_work/tt-metal/tt-metal/ttnn/ttnn/_ttnn.so(_ZN2tt8tt_metal11tensor_impl16to_device_bufferI8bfloat16EENSt3__110shared_ptrINS0_6BufferEEERKNS4_7variantIJNS0_12OwnedStorageENS0_13DeviceStorageENS0_15BorrowedStorageENS0_22MultiDeviceHostStorageENS0_18MultiDeviceStorageEEEEPNS0_6DeviceERKNS0_11LegacyShapeENS0_8DataTypeENS0_6LayoutERKNS0_12MemoryConfigERKNS4_8optionalINS0_15ShardSpecBufferEEERKNSR_INS0_4TileEEENSR_INS4_17reference_wrapperINS0_12CommandQueueEEEEE+0xbb) [0x7f18829e72cb]
 --- tt::tt_metal::Tensor tt::tt_metal::tensor_impl::to_device<bfloat16>(tt::tt_metal::Tensor const&, tt::tt_metal::Device*, tt::tt_metal::MemoryConfig const&, std::__1::optional<std::__1::reference_wrapper<tt::tt_metal::CommandQueue>>)
 --- auto tt::tt_metal::tensor_impl::dispatch<auto tt::tt_metal::tensor_impl::to_device_wrapper<tt::tt_metal::Tensor&, tt::tt_metal::Device*&, tt::tt_metal::MemoryConfig const&, std::__1::nullopt_t const&>(tt::tt_metal::Tensor&, tt::tt_metal::Device*&, tt::tt_metal::MemoryConfig const&, std::__1::nullopt_t const&)::'lambda'<typename $T>(auto&&...), tt::tt_metal::Tensor&, tt::tt_metal::Device*&, tt::tt_metal::MemoryConfig const&, std::__1::nullopt_t const&>(tt::tt_metal::DataType, auto tt::tt_metal::tensor_impl::to_device_wrapper<tt::tt_metal::Tensor&, tt::tt_metal::Device*&, tt::tt_metal::MemoryConfig const&, std::__1::nullopt_t const&>(tt::tt_metal::Tensor&, tt::tt_metal::Device*&, tt::tt_metal::MemoryConfig const&, std::__1::nullopt_t const&)::'lambda'<typename $T>(auto&&...)&&, auto&&...)
 --- /home/ubuntu/actions-runner/_work/tt-metal/tt-metal/ttnn/ttnn/_ttnn.so(+0x10e5aa0) [0x7f1882aaeaa0]
 --- tt::tt_metal::tensor_ops::tensor_to(tt::tt_metal::Tensor const&, tt::tt_metal::Device*, tt::tt_metal::MemoryConfig const&)
 --- tt::tt_metal::Tensor::to(tt::tt_metal::Device*, tt::tt_metal::MemoryConfig const&) const
 --- ttnn::operations::core::to_device(tt::tt_metal::Tensor const&, tt::tt_metal::Device*, std::__1::optional<tt::tt_metal::MemoryConfig> const&)
 --- /home/ubuntu/actions-runner/_work/tt-metal/tt-metal/ttnn/ttnn/_ttnn.so(+0x12f4337) [0x7f1882cbd337]
 --- /home/ubuntu/actions-runner/_work/tt-metal/tt-metal/ttnn/ttnn/_ttnn.so(+0x12f421e) [0x7f1882cbd21e]
 --- /home/ubuntu/actions-runner/_work/tt-metal/tt-metal/ttnn/ttnn/_ttnn.so(+0x125814e) [0x7f1882c2114e]
 --- /home/ubuntu/actions-runner/_work/_tool/Python/3.8.18/x64/lib/libpython3.8.so.1.0(PyCFunction_Call+0x6b) [0x7f18de80fa8b]
 --- /home/ubuntu/actions-runner/_work/_tool/Python/3.8.18/x64/lib/libpython3.8.so.1.0(_PyEval_EvalFrameDefault+0x5c0d) [0x7f18de87ee0d]
 --- /home/ubuntu/actions-runner/_work/_tool/Python/3.8.18/x64/lib/libpython3.8.so.1.0(_PyEval_EvalCodeWithName+0x301) [0x7f18de8785e1]
 --- /home/ubuntu/actions-runner/_work/_tool/Python/3.8.18/x64/lib/libpython3.8.so.1.0(_PyFunction_Vectorcall+0x18e) [0x7f18de80f4ee]
 --- /home/ubuntu/actions-runner/_work/_tool/Python/3.8.18/x64/lib/libpython3.8.so.1.0(_PyObject_FastCallDict+0x238) [0x7f18de80e908]
 --- /home/ubuntu/actions-runner/_work/_tool/Python/3.8.18/x64/lib/libpython3.8.so.1.0(_PyObject_Call_Prepend+0x66) [0x7f18de80fd36]
 --- /home/ubuntu/actions-runner/_work/_tool/Python/3.8.18/x64/lib/libpython3.8.so.1.0(+0x1f8aad) [0x7f18de8d6aad]
 --- /home/ubuntu/actions-runner/_work/_tool/Python/3.8.18/x64/lib/libpython3.8.so.1.0(_PyObject_MakeTpCall+0x170) [0x7f18de80ed10]
 --- /home/ubuntu/actions-runner/_work/_tool/Python/3.8.18/x64/lib/libpython3.8.so.1.0(_PyEval_EvalFrameDefault+0x5136) [0x7f18de87e336]
 --- /home/ubuntu/actions-runner/_work/_tool/Python/3.8.18/x64/lib/libpython3.8.so.1.0(_PyEval_EvalCodeWithName+0x8b2) [0x7f18de878b92]
 --- /home/ubuntu/actions-runner/_work/_tool/Python/3.8.18/x64/lib/libpython3.8.so.1.0(_PyFunction_Vectorcall+0x18e) [0x7f18de80f4ee]
 --- /home/ubuntu/actions-runner/_work/_tool/Python/3.8.18/x64/lib/libpython3.8.so.1.0(PyVectorcall_Call+0x74) [0x7f18de8100c4]
 --- /home/ubuntu/actions-runner/_work/_tool/Python/3.8.18/x64/lib/libpython3.8.so.1.0(_PyEval_EvalFrameDefault+0x2370) [0x7f18de87b570]
 --- /home/ubuntu/actions-runner/_work/_tool/Python/3.8.18/x64/lib/libpython3.8.so.1.0(_PyEval_EvalCodeWithName+0x301) [0x7f18de8785e1]
 --- /home/ubuntu/actions-runner/_work/_tool/Python/3.8.18/x64/lib/libpython3.8.so.1.0(_PyFunction_Vectorcall+0x18e) [0x7f18de80f4ee]
 --- /home/ubuntu/actions-runner/_work/_tool/Python/3.8.18/x64/lib/libpython3.8.so.1.0(_PyObject_FastCallDict+0x238) [0x7f18de80e908]
 --- /home/ubuntu/actions-runner/_work/_tool/Python/3.8.18/x64/lib/libpython3.8.so.1.0(_PyObject_Call_Prepend+0x66) [0x7f18de80fd36]
 --- /home/ubuntu/actions-runner/_work/_tool/Python/3.8.18/x64/lib/libpython3.8.so.1.0(+0x1f8aad) [0x7f18de8d6aad]
 --- /home/ubuntu/actions-runner/_work/_tool/Python/3.8.18/x64/lib/libpython3.8.so.1.0(_PyObject_MakeTpCall+0x170) [0x7f18de80ed10]
 --- /home/ubuntu/actions-runner/_work/_tool/Python/3.8.18/x64/lib/libpython3.8.so.1.0(_PyEval_EvalFrameDefault+0x5136) [0x7f18de87e336]
 --- /home/ubuntu/actions-runner/_work/_tool/Python/3.8.18/x64/lib/libpython3.8.so.1.0(_PyEval_EvalCodeWithName+0x8b2) [0x7f18de878b92]
 --- /home/ubuntu/actions-runner/_work/_tool/Python/3.8.18/x64/lib/libpython3.8.so.1.0(_PyFunction_Vectorcall+0x18e) [0x7f18de80f4ee]
 --- /home/ubuntu/actions-runner/_work/_tool/Python/3.8.18/x64/lib/libpython3.8.so.1.0(_PyEval_EvalFrameDefault+0x3b3) [0x7f18de8795b3]
 --- /home/ubuntu/actions-runner/_work/_tool/Python/3.8.18/x64/lib/libpython3.8.so.1.0(_PyEval_EvalCodeWithName+0x8b2) [0x7f18de878b92]
 --- /home/ubuntu/actions-runner/_work/_tool/Python/3.8.18/x64/lib/libpython3.8.so.1.0(_PyFunction_Vectorcall+0x18e) [0x7f18de80f4ee]
 --- /home/ubuntu/actions-runner/_work/_tool/Python/3.8.18/x64/lib/libpython3.8.so.1.0(PyVectorcall_Call+0x74) [0x7f18de8100c4]
 --- /home/ubuntu/actions-runner/_work/_tool/Python/3.8.18/x64/lib/libpython3.8.so.1.0(_PyEval_EvalFrameDefault+0x2370) [0x7f18de87b570]
 --- /home/ubuntu/actions-runner/_work/_tool/Python/3.8.18/x64/lib/libpython3.8.so.1.0(_PyFunction_Vectorcall+0xfa) [0x7f18de80f45a]
 --- /home/ubuntu/actions-runner/_work/_tool/Python/3.8.18/x64/lib/libpython3.8.so.1.0(PyVectorcall_Call+0x74) [0x7f18de8100c4]
 --- /home/ubuntu/actions-runner/_work/_tool/Python/3.8.18/x64/lib/libpython3.8.so.1.0(_PyEval_EvalFrameDefault+0x2370) [0x7f18de87b570]
 --- /home/ubuntu/actions-runner/_work/_tool/Python/3.8.18/x64/lib/libpython3.8.so.1.0(_PyFunction_Vectorcall+0xfa) [0x7f18de80f45a]
 --- /home/ubuntu/actions-runner/_work/_tool/Python/3.8.18/x64/lib/libpython3.8.so.1.0(_PyEval_EvalFrameDefault+0x7b8) [0x7f18de8799b8]
 --- /home/ubuntu/actions-runner/_work/_tool/Python/3.8.18/x64/lib/libpython3.8.so.1.0(_PyEval_EvalCodeWithName+0x301) [0x7f18de8785e1]
 --- /home/ubuntu/actions-runner/_work/_tool/Python/3.8.18/x64/lib/libpython3.8.so.1.0(_PyFunction_Vectorcall+0x18e) [0x7f18de80f4ee]
 --- /home/ubuntu/actions-runner/_work/_tool/Python/3.8.18/x64/lib/libpython3.8.so.1.0(+0x1339c8) [0x7f18de8119c8]
 --- /home/ubuntu/actions-runner/_work/_tool/Python/3.8.18/x64/lib/libpython3.8.so.1.0(_PyEval_EvalFrameDefault+0x1174) [0x7f18de87a374]
 --- /home/ubuntu/actions-runner/_work/_tool/Python/3.8.18/x64/lib/libpython3.8.so.1.0(_PyFunction_Vectorcall+0xfa) [0x7f18de80f45a]
 --- /home/ubuntu/actions-runner/_work/_tool/Python/3.8.18/x64/lib/libpython3.8.so.1.0(_PyEval_EvalFrameDefault+0x7b8) [0x7f18de8799b8]
 --- /home/ubuntu/actions-runner/_work/_tool/Python/3.8.18/x64/lib/libpython3.8.so.1.0(_PyObject_FastCallDict+0x14a) [0x7f18de80e81a]
 --- /home/ubuntu/actions-runner/_work/_tool/Python/3.8.18/x64/lib/libpython3.8.so.1.0(_PyObject_Call_Prepend+0x66) [0x7f18de80fd36]
 --- /home/ubuntu/actions-runner/_work/_tool/Python/3.8.18/x64/lib/libpython3.8.so.1.0(+0x16caf7) [0x7f18de84aaf7]
 --- /home/ubuntu/actions-runner/_work/_tool/Python/3.8.18/x64/lib/libpython3.8.so.1.0(+0x16a739) [0x7f18de848739]
 --- /home/ubuntu/actions-runner/_work/_tool/Python/3.8.18/x64/lib/libpython3.8.so.1.0(_PyObject_MakeTpCall+0x8b) [0x7f18de80ec2b]
 --- /home/ubuntu/actions-runner/_work/_tool/Python/3.8.18/x64/lib/libpython3.8.so.1.0(_PyEval_EvalFrameDefault+0x4a29) [0x7f18de87dc29]
 --- /home/ubuntu/actions-runner/_work/_tool/Python/3.8.18/x64/lib/libpython3.8.so.1.0(_PyFunction_Vectorcall+0xfa) [0x7f18de80f45a]
 --- /home/ubuntu/actions-runner/_work/_tool/Python/3.8.18/x64/lib/libpython3.8.so.1.0(_PyEval_EvalFrameDefault+0x4afc) [0x7f18de87dcfc]
 --- /home/ubuntu/actions-runner/_work/_tool/Python/3.8.18/x64/lib/libpython3.8.so.1.0(_PyFunction_Vectorcall+0xfa) [0x7f18de80f45a]
 --- /home/ubuntu/actions-runner/_work/_tool/Python/3.8.18/x64/lib/libpython3.8.so.1.0(_PyEval_EvalFrameDefault+0x4afc) [0x7f18de87dcfc]
 --- /home/ubuntu/actions-runner/_work/_tool/Python/3.8.18/x64/lib/libpython3.8.so.1.0(_PyFunction_Vectorcall+0xfa) [0x7f18de80f45a]
 --- /home/ubuntu/actions-runner/_work/_tool/Python/3.8.18/x64/lib/libpython3.8.so.1.0(_PyEval_EvalFrameDefault+0x7b8) [0x7f18de8799b8]
 --- /home/ubuntu/actions-runner/_work/_tool/Python/3.8.18/x64/lib/libpython3.8.so.1.0(_PyFunction_Vectorcall+0xfa) [0x7f18de80f45a]
 --- /home/ubuntu/actions-runner/_work/_tool/Python/3.8.18/x64/lib/libpython3.8.so.1.0(_PyEval_EvalFrameDefault+0x3b3) [0x7f18de8795b3]
 --- /home/ubuntu/actions-runner/_work/_tool/Python/3.8.18/x64/lib/libpython3.8.so.1.0(_PyFunction_Vectorcall+0xfa) [0x7f18de80f45a]
 --- /home/ubuntu/actions-runner/_work/_tool/Python/3.8.18/x64/lib/libpython3.8.so.1.0(_PyEval_EvalFrameDefault+0x3b3) [0x7f18de8795b3]
 --- /home/ubuntu/actions-runner/_work/_tool/Python/3.8.18/x64/lib/libpython3.8.so.1.0(_PyEval_EvalCodeWithName+0x301) [0x7f18de8785e1]
 --- /home/ubuntu/actions-runner/_work/_tool/Python/3.8.18/x64/lib/libpython3.8.so.1.0(PyEval_EvalCodeEx+0x47) [0x7f18de8782c7]
 --- /home/ubuntu/actions-runner/_work/_tool/Python/3.8.18/x64/lib/libpython3.8.so.1.0(PyEval_EvalCode+0x1f) [0x7f18de8ef98f]
 --- /home/ubuntu/actions-runner/_work/_tool/Python/3.8.18/x64/lib/libpython3.8.so.1.0(+0x222778) [0x7f18de900778]
 --- /home/ubuntu/actions-runner/_work/_tool/Python/3.8.18/x64/lib/libpython3.8.so.1.0(+0x222713) [0x7f18de900713]
 --- /home/ubuntu/actions-runner/_work/_tool/Python/3.8.18/x64/lib/libpython3.8.so.1.0(+0xeaf64) [0x7f18de7c8f64]
 --- /home/ubuntu/actions-runner/_work/_tool/Python/3.8.18/x64/lib/libpython3.8.so.1.0(PyRun_SimpleFileExFlags+0x370) [0x7f18de7c8d39]
 --- /home/ubuntu/actions-runner/_work/_tool/Python/3.8.18/x64/lib/libpython3.8.so.1.0(Py_RunMain+0x50a) [0x7f18de908a1a]
 --- /home/ubuntu/actions-runner/_work/_tool/Python/3.8.18/x64/lib/libpython3.8.so.1.0(Py_BytesMain+0x3d) [0x7f18de90838d]
 --- /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf3) [0x7f18de502083]
 --- python(_start+0x2e) [0x55814985409e]
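
For readers skimming the log, here is a minimal sketch of the kind of check that `valid_page_size` appears to guard. It is not the tt-metal source: the function name `is_valid_page_size` is hypothetical, and the condition (buffer size must be a whole multiple of the page size) is an assumption inferred from the wording of the error message.

```python
def is_valid_page_size(buffer_size: int, page_size: int) -> bool:
    """Hypothetical restatement of the check; not the tt-metal implementation.

    Assumption: a buffer is acceptable when its size in bytes is a whole
    multiple of its page size in bytes.
    """
    if page_size <= 0:
        return False
    return buffer_size % page_size == 0


# Values reported in the log above: page size 10, buffer size 8.
print(is_valid_page_size(buffer_size=8, page_size=10))

# A buffer holding exactly two 10-byte pages would pass under this assumption.
print(is_valid_page_size(buffer_size=20, page_size=10))
```

Under this assumption the reported pair fails because an 8-byte buffer cannot hold even one 10-byte page.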

Sample vectors:

 {'concat_specs': "{'dim': -1, 'shapes': [[1, 5], [1, 1]]}", 'dtype': 'DataType.BFLOAT16', 'layout': 'Layout.ROW_MAJOR', 'validity': 'VectorValidity.VALID', 'invalid_reason': '', 'status': 'VectorStatus.CURRENT'}
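
The byte arithmetic for this vector can be sketched in plain Python, without ttnn or a device. It assumes that a row-major page corresponds to one row along the last dimension and that a bfloat16 element takes 2 bytes; the helper names `row_bytes` and `concat_shape` are illustrative only.

```python
import ast

BYTES_PER_BFLOAT16 = 2  # bfloat16 is a 16-bit format

# The sample vector from this report, reduced to the fields used here.
vector = {
    "concat_specs": "{'dim': -1, 'shapes': [[1, 5], [1, 1]]}",
    "dtype": "DataType.BFLOAT16",
    "layout": "Layout.ROW_MAJOR",
}

specs = ast.literal_eval(vector["concat_specs"])


def row_bytes(shape):
    """Bytes in one row of the last dimension (assumed row-major page size)."""
    return shape[-1] * BYTES_PER_BFLOAT16


def concat_shape(shapes, dim):
    """Shape produced by concatenating `shapes` along `dim`."""
    out = list(shapes[0])
    out[dim] = sum(shape[dim] for shape in shapes)
    return out


input_row_bytes = [row_bytes(shape) for shape in specs["shapes"]]
output_shape = concat_shape(specs["shapes"], specs["dim"])

print(input_row_bytes)          # [10, 2]
print(output_shape)             # [1, 6]
print(row_bytes(output_shape))  # 12
```

The 10-byte row of the `[1, 5]` input matches the page size in the error message. Together with the `to_device` frames in the backtrace, this suggests the failing allocation may belong to that input rather than to the 12-byte output row, though the log alone does not confirm it.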

Total occurrences: 1

jaykru-tt commented 1 month ago

There are similar failures involving invalid page sizes; I will open separate issues later if they turn out to be distinct.