tenstorrent / tt-metal

:metal: TT-NN operator library, and TT-Metalium low level kernel programming model.
Apache License 2.0

conv2d circular buffer error #14557

Open twist-vector opened 4 days ago

twist-vector commented 4 days ago

Follow on to issue #14140.

Git tag: v0.52.0

If the requested pad size is too large, the call fails with "Circular Buffer Config Error: Circular buffer size cannot be 0 B". The code below fails with PAD_SIZE = 2, but if PAD_SIZE is reduced to 1, no error is thrown. It would be good to have range checking and a more informative error message on failure.

import torch
import ttnn

device_params = {"l1_small_size": 24576}
device = ttnn.open_device(device_id=0, **device_params)

BATCH_SIZE   = 1
IN_CHANNELS  = 1
OUT_CHANNELS = 1
MAT_SIZE     = 32
KERN_SIZE    = 3
PAD_SIZE     = 2
a = torch.ones((BATCH_SIZE, IN_CHANNELS, MAT_SIZE, MAT_SIZE), dtype=torch.float32)
b = torch.ones((OUT_CHANNELS, IN_CHANNELS, KERN_SIZE, KERN_SIZE), dtype=torch.float32)
input_tensor  = ttnn.from_torch(a, layout=ttnn.TILE_LAYOUT, device=device)
weight_tensor = ttnn.from_torch(b, layout=ttnn.TILE_LAYOUT, device=device)

res = ttnn.conv2d(input_tensor=input_tensor, 
                  weight_tensor=weight_tensor, 
                  device=device,
                  in_channels=IN_CHANNELS,
                  out_channels=OUT_CHANNELS,
                  batch_size=BATCH_SIZE,
                  input_height=MAT_SIZE,
                  input_width=MAT_SIZE, 
                  kernel_size=(KERN_SIZE,KERN_SIZE),
                  padding=(PAD_SIZE,PAD_SIZE),
                  stride=(1,1),
                  dilation=(1,1),
                  groups=1)
out, out_height, out_width, conv_weight_tensor, conv_bias_tensor = res

print(out_height)
print(out_width)
print(out)
print(conv_weight_tensor)
print(conv_bias_tensor)
print("")

out_torch = ttnn.to_torch(out)
print(out_torch.shape)

ttnn.close_device(device)
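
For reference, here is a minimal sketch of the kind of up-front range check requested above. It uses only the standard conv2d output-size formula (the one PyTorch documents) and plain Python. The helper names `conv2d_output_size` and `check_conv2d_args` are hypothetical and are not part of ttnn. The actual padding limit that ttnn enforces is not stated in this thread, so this check only rejects geometrically invalid arguments; it accepts both PAD_SIZE values from the repro and would not catch the circular buffer error by itself.

```python
def conv2d_output_size(in_size, kernel, padding, stride=1, dilation=1):
    """Standard conv2d output-size formula along one spatial dimension."""
    return (in_size + 2 * padding - dilation * (kernel - 1) - 1) // stride + 1


def check_conv2d_args(in_size, kernel, padding, stride=1, dilation=1):
    """Hypothetical pre-flight check (NOT a ttnn API): fail early with a readable message."""
    if kernel < 1 or stride < 1 or dilation < 1:
        raise ValueError(
            f"kernel, stride and dilation must be >= 1, got "
            f"kernel={kernel}, stride={stride}, dilation={dilation}"
        )
    if padding < 0:
        raise ValueError(f"padding must be >= 0, got {padding}")
    out = conv2d_output_size(in_size, kernel, padding, stride, dilation)
    if out < 1:
        raise ValueError(
            f"conv2d output size would be {out} for in_size={in_size}, "
            f"kernel={kernel}, padding={padding}, stride={stride}, dilation={dilation}"
        )
    return out


# Values from the repro above: MAT_SIZE=32, KERN_SIZE=3, stride=1, dilation=1.
print(check_conv2d_args(32, 3, 1))  # PAD_SIZE=1 -> 32 ("same"-size output)
print(check_conv2d_args(32, 3, 2))  # PAD_SIZE=2 -> 34 (output grows by 2)
```

Under this formula PAD_SIZE=2 is a well-defined configuration that should produce a 34x34 output, which is why a message naming the offending parameter would be more helpful than a zero-size circular buffer error.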

pavlejosipovic commented 1 day ago

The issue here originates from the same set of problems that I described in #14558. Following the same set of rules resolves this case as well.