tenstorrent / tt-metal

:metal: TT-NN operator library, and TT-Metalium low level kernel programming model.
Apache License 2.0

[Bug Report] ttnn::reshape failing to reshape tensor from `[1, 1, 1, 65536]` to `[1, 32, 32, 32]` citing L1 allocation error #15075

Open marty1885 opened 2 hours ago

marty1885 commented 2 hours ago

Describe the bug

I'm attempting to eliminate most, if not all, CPU fallback code paths in my GGML backend. One of them reshapes a single vector into a 3D tensor. (I know this is inefficient on TT hardware and that I should eventually remove it entirely through a graph rewrite.) This operation fails when it is performed purely on device.

To Reproduce

The following is a minimal reproducible example. Compile and run it:

#include <cstddef>
#include <ttnn/core.hpp>
#include <ttnn/operations/eltwise/unary/unary.hpp>
#include <ttnn/operations/creation.hpp>
#include <ttnn/device.hpp>
#include <ttnn/operations/data_movement/tilize_with_val_padding/tilize_with_val_padding.hpp>
#include "ttnn/operations/data_movement/reshape_on_device/reshape.hpp"

#include "ttnn/operations/data_movement/reshape_view/reshape.hpp"
#include "common/bfloat16.hpp"

#include <vector>
#include <iostream>

int main()
{
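    // Open device 0 and create a [1, 1, 1, 65536] tensor of zeros on it.
    auto& device = ttnn::open_device(0);
    auto t = ttnn::zeros(ttnn::SimpleShape({1, 1, 1, 65536})).to(&device);
    // Convert to tile layout; the height of 1 is padded up to the tile height of 32.
    t = ttnn::tilize_with_zero_padding(t);

    std::cout << "Original shape: " << t.shape() << std::endl;
    // Reshape into [1, 32, 32, 32]; the L1 allocation error below is raised here.
    auto b = ttnn::reshape(t, ttnn::SimpleShape({1, 32, 32, 32}));
    std::cout << "Resulting shape: " << b.shape() << std::endl;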
}

Observe the error:

                  Metal | INFO     | Initializing device 0. Program cache is NOT enabled
                  Metal | INFO     | AI CLK for device 0 is:   1000 MHz
                  Metal | INFO     | MMIO Device 0 : Tunnel 0 : Device 0
                  Metal | INFO     | MMIO Device 0 : Tunnel 0 : Device 4
Original shape: ttnn.Shape([1, 1, 1[32], 65536])
                 Always | FATAL    | Statically allocated circular buffers on core range [(x=0,y=0) - (x=0,y=0)] grow to 8487424 B which is beyond max L1 size of 1499136 B
terminate called after throwing an instance of 'std::runtime_error'
  what():  TT_THROW @ /home/marty/Documents/tt/tt-metal/tt_metal/impl/program/program.cpp:761: tt::exception
info:
Statically allocated circular buffers on core range [(x=0,y=0) - (x=0,y=0)] grow to 8487424 B which is beyond max L1 size of 1499136 B

Expected behavior

Reshaping a 1D vector should work regardless of the vector's size. Failing that, the API or documentation should indicate when the algorithm cannot run because the input exceeds a certain size.


ayerofieiev-tt commented 2 hours ago

Looks similar to https://github.com/tenstorrent/tt-metal/issues/15032

dmakoviichuk-tt commented 2 hours ago

@marty1885 could you pass the last parameter, multicore=true, to tilize_with_zero_padding, please?

marty1885 commented 2 hours ago

@dmakoviichuk-tt The problem is not tilize_with_zero_padding; it's ttnn::reshape. The line "Original shape: ttnn.Shape([1, 1, 1[32], 65536])" is printed, so I assume tilize_with_zero_padding succeeds?

You are right. The problem is indeed in tilize_with_zero_padding.

Update: No, with multicore=true I get the same error.

dmakoviichuk-tt commented 2 hours ago

Ah :( that's sad. By the way, I've got a second idea to make it work. Out of curiosity (we still need to fix this one), is it possible to reshape before tilizing?

marty1885 commented 1 hour ago

@dmakoviichuk-tt No, unfortunately. ttnn::reshape eventually calls ttnn::to_layout and runs into the same issue.