
[MobileNet] New Conv2D API error when channels are not divisible by 32 and kernel_size = (1,1) #12918

Open bosGlee opened 1 week ago

bosGlee commented 1 week ago

When kernel_size is (1,1), the new conv2D API works like a matmul.

The mismatch between how sliding ops (maxpool, conv) and matmul handle tensor layouts is what causes the problem.

Sliding ops usually convert the input tensor shape from (1,H,W,C) to (1,1,H*W,C). Weights are likewise converted from [output_channels, input_channels // groups, filter_height, filter_width] to (1,1,-1,input_channels).

When a sliding op sends weights to the device, it zero-pads dimensions that are not divisible by 32. For example, a weight of shape (32,3,3,3) is converted to (1,1,32*3*3,32): the 3 input channels are zero-padded to 32.
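
In torch terms, the reshapes and the zero padding described above look roughly like this (the exact permutation order used inside the op is an assumption here; the point is the resulting shapes):

```python
import torch
import torch.nn.functional as F

# Activation: (N, C, H, W) -> (1, 1, H*W, C)
x = torch.randn(1, 3, 224, 224)
x_flat = x.permute(0, 2, 3, 1).reshape(1, 1, 224 * 224, 3)   # (1, 1, 50176, 3)

# Weight: (out_channels, in_channels // groups, kh, kw) -> (1, 1, -1, in_channels)
w = torch.randn(32, 3, 3, 3)
w_flat = w.permute(0, 2, 3, 1).reshape(1, 1, -1, 3)          # (1, 1, 32*3*3, 3) = (1, 1, 288, 3)

# Zero padding of the channel dim up to 32
w_padded = F.pad(w_flat, (0, 32 - 3))                        # (1, 1, 288, 32)
print(x_flat.shape, w_flat.shape, w_padded.shape)
```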

That is fine when the weights are used by a regular conv2d, but not for a 1x1 conv2d that runs as a matmul.

Inputs are usually tilized: a (1,3,224,224) input feature is converted into (1,1,224*224,3[32]). I call this tile padding.

The errors in MobileNetV2 occur because neither input_channels nor output_channels is divisible by 32 (24 and 144). The conv2d weight preparation zero-pads the weights, but the matmul path needs tile padding instead.
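
To make the layout requirement concrete, here is the 24 -> 144 MobileNetV2 case in plain torch (the 56x56 spatial size is just an example I picked): the 1x1 stride-1 conv reduces over input_channels, so any padding of the weight's input-channel dim has to line up with the activation's tile-padded channel dim.

```python
import torch

# MobileNetV2-style expansion conv: 24 -> 144 channels, 1x1 kernel, stride 1
x = torch.randn(1, 24, 56, 56)
w = torch.randn(144, 24, 1, 1)

# Reference conv2d
out_ref = torch.nn.functional.conv2d(x, w)                     # (1, 144, 56, 56)

# Same computation as a matmul over the channel dim
x_mat = x.permute(0, 2, 3, 1).reshape(-1, 24)                  # (H*W, 24)
w_mat = w.reshape(144, 24).t()                                 # (24, 144)
out_mat = (x_mat @ w_mat).reshape(1, 56, 56, 144).permute(0, 3, 1, 2)

print(torch.allclose(out_ref, out_mat, atol=1e-4))             # True
```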

To Reproduce
Steps to reproduce the behavior:

  1. Go to /tt-metal/tests/ttnn/unit_tests/operations/test_new_conv2d.py::test_resnet50_conv_gs
  2. Add the following test cases (see the sketch after this list for how I read these tuples):

         (144, 32, 32, 32, 1, 1, 1, 1, 0, 0, True, None),  # shape issue, 144 is zero-padded to 160
         (144, 16, 32, 32, 1, 1, 1, 1, 0, 0, True, None),  # sharding shape error
         (144, 24, 32, 32, 1, 1, 1, 1, 0, 0, True, None),  # width/height issue
         (144, 48, 32, 32, 1, 1, 1, 1, 0, 0, True, None),  # width/height issue
         (144, 30, 32, 32, 1, 1, 1, 1, 0, 0, True, None),  # width/height issue
         (144, 32, 32, 32, 1, 1, 1, 1, 0, 0, True, None),  # shape issue
         (112, 32, 32, 32, 1, 1, 1, 1, 0, 0, True, None),  # shape issue
         (176, 32, 32, 32, 1, 1, 1, 1, 0, 0, True, None),  # shape issue
         (196, 32, 32, 32, 1, 1, 1, 1, 0, 0, True, None),  # shape issue

  3. See error
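
For reference, this is my reading of one of the failing tuples in plain torch terms (the parameter-name mapping is my assumption from the values, not copied from the test file):

```python
import torch

# (144, 24, 32, 32, 1, 1, 1, 1, 0, 0, True, None) read as:
# output_channels=144, input_channels=24, input_height=input_width=32,
# 1x1 filter, stride 1, no padding, use_1d_systolic_array=True, no config override.
x = torch.randn(1, 24, 32, 32)
w = torch.randn(144, 24, 1, 1)
out = torch.nn.functional.conv2d(x, w, stride=1, padding=0)
print(out.shape)   # torch.Size([1, 144, 32, 32])
```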

Expected behavior
A 1x1 convolution should use different logic for sending weights to the device. E.g. for a (144,24,1,1) weight shape, the current version produces (1,1,160,32); the expected behavior is (1,1,144[160],24[32]).

Please complete the following environment information:

    # Workaround in the model code: permute the (O, I, 1, 1) weight to (1, 1, I, O) before converting to a ttnn tensor.
    weight_tensor3 = weight_tensor3.permute(2, 3, 1, 0).reshape(1, 1, input_weight_shape[1], -1)
    self.tt_weight_tensor3 = ttnn.from_torch(weight_tensor3, ttnn.bfloat16, device=device, layout=ttnn.TILE_LAYOUT)
bosGlee commented 1 week ago

I found two ways to fix this. First, permute the weight tensor before it is converted to a ttnn.Tensor.

    if filter_height == 1 and filter_width == 1:
        torch_weight_feature = torch_weight_tensor.permute(2, 3, 1, 0)
        tt_weight_tensor = ttnn.from_torch(
            torch_weight_feature, ttnn.bfloat16, device=device, layout=ttnn.TILE_LAYOUT
        )
    else:
        tt_weight_tensor = ttnn.from_torch(
            torch_weight_tensor, weights_dtype if weights_dtype != ttnn.bfloat8_b else ttnn.float32
        )

This is easy to control, but it is confusing because users have to remember the convolution's kernel size.

The other way is to fix the prepare_conv_weights_biases_and_move_to_device function inside conv2d.cpp. The functions convert_conv_weight_tensor_to_special_padding_tiled_layout and convert_conv_weight_tensor_to_tiled_layout pad the weight tensor, so I skip them when the kernel size is 1x1.

But ttnn.permute cannot be used while the tensor is not on device, so the weight tensor has to be permuted manually; I wrote a function permute_2301_manally for that.

    if (parallel_config.shard_scheme == TensorMemoryLayout::HEIGHT_SHARDED) {
        // Is it a 1x1 conv?
        if (window_h != 1 or window_w != 1) {
            // If not, just use the existing conversion.
            weight_tensor_ = convert_conv_weight_tensor_to_special_padding_tiled_layout(
                weight_tensor_, weight_block_h_ntiles, weight_block_w_ntiles, weights_bias_dtype);
        } else {
            // Apply a (2,3,0,1) permute to the weights.
            weights_shape = weight_tensor_.get_shape();
            weight_tensor_ = permute_2301_manally(weight_tensor_, weights_bias_dtype);
            // Convert them to tile layout.
            weight_tensor_ = ttnn::to_layout(weight_tensor_, Layout::TILE, std::nullopt, std::nullopt, (T*)nullptr);
        }
    } else {
        weight_tensor_ = convert_conv_weight_tensor_to_tiled_layout(
            weight_tensor_, weight_block_h_ntiles, weight_block_w_ntiles, weights_bias_dtype);
I checked the PCC test for these cases:

    (144, 24, 32, 32, 1, 1, 1, 1, 0, 0, True, None),
    (144, 48, 32, 32, 1, 1, 1, 1, 0, 0, True, None),
    (144, 30, 32, 32, 1, 1, 1, 1, 0, 0, True, None)

[screenshot: PCC test results]


When input_channels is 16, there is a shard shape error that is not related to tile padding (sharded pad currently does not support padding on the last dim, as that would cause perf degradation).

mywoodstock commented 1 week ago

@bosGlee we have certain limitations in our current Tensor shape tracking. You can get past the shape assertion errors with this commit: https://github.com/tenstorrent/tt-metal/commit/6f371d2df8fab99d79653471e8f7f4474ce9c598 With this, all the test cases you listed in this ticket pass for me.

Since a convolution with a 1x1 kernel and stride 1 is computationally equivalent to a matmul, conv2d calls matmul underneath in this case.
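
A minimal sketch of that matmul formulation in plain ttnn (example shapes only; the real conv2d path also handles sharding, bias, and the surrounding reshapes):

```python
import torch
import ttnn

device = ttnn.open_device(device_id=0)

# Flattened (1, 1, H*W, C_in) activation and a 1x1 conv weight laid out as (1, 1, C_in, C_out).
x = torch.randn(1, 1, 32 * 32, 24)
w = torch.randn(1, 1, 24, 144)

tt_x = ttnn.from_torch(x, dtype=ttnn.bfloat16, layout=ttnn.TILE_LAYOUT, device=device)
tt_w = ttnn.from_torch(w, dtype=ttnn.bfloat16, layout=ttnn.TILE_LAYOUT, device=device)
tt_y = ttnn.matmul(tt_x, tt_w)        # (1, 1, H*W, 144); C_in is tile-padded 24 -> 32 on both operands

y = ttnn.to_torch(tt_y)
ttnn.close_device(device)
```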

bosGlee commented 1 week ago

@mywoodstock That solves the test cases. I will keep looking at #12141, since there may be edge cases we are missing.