tenstorrent / tt-metal

:metal: TT-NN operator library, and TT-Metalium low level kernel programming model.
Apache License 2.0
441 stars 63 forks source link

Issue with Packing Only 7 Bits in UINT8 Data Type CB #10526

Closed dongjin-na closed 2 months ago

dongjin-na commented 2 months ago

Describe the bug The typecast operation in ttnn supports typecasting between various data types. It is possible to typecast from uint32 to uint8 using the pack_tile() function.

I have observed that when packing a UINT32 data type CB id as a UINT8 CB using the second parameter of the unary_op_init_common API within the compute kernel, only up to 7 bits are being packed.

https://github.com/tenstorrent/tt-metal/blob/115107968b46ce686650a6b8371fbf6aca9b235a/tt_metal/include/compute_kernel_api/eltwise_unary/eltwise_unary.h#L20

(The reason for passing a UINT32 data type CB id as the second parameter is that some operations we implemented output two CBs with data types UINT32 and UINT8. We passed the UINT32 CB id among the two and discovered the issue in the process.)

7b_11 drawio

7b_22 drawio

I have added test code in the pack_uint32_to_uint8 branch by slightly modifying ttnn’s typecast code. When operating correctly, a UINT32 CB filled with 255 should typecast to a UINT8 CB also filled with 255. However, in the problematic scenario, it typecasts to 127.

Please confirm if this behavior is a bug or if there is an issue with how the kernel was written.

To Reproduce Steps to reproduce the behavior:

  1. git checkout origin/pack_uint32_to_uint8
  2. CONFIG=Debug ./build_metal.sh
  3. source ./python_env/bin/activate
  4. export TT_METAL_DPRINT_CORES=0,0
  5. pytest tests/ttnn/unit_tests/operations/test_typecast.py

Expected behavior Test case should be passed.

Screenshots

void MAIN { uint32_t per_core_block_cnt = get_compile_time_arg_val(0); uint32_t per_core_block_dim = get_compile_time_arg_val(1);

if UINT8_PACKING_ISSUE

unary_op_init_common(tt::CB::c_in0, tt::CB::c_intermed0);

else

unary_op_init_common(tt::CB::c_in0, tt::CB::c_out0);

endif

for (uint32_t block_index = 0; block_index < per_core_block_cnt; block_index++) {
    cb_reserve_back(tt::CB::c_out0, per_core_block_dim);
    for(uint32_t tile_index = 0; tile_index < per_core_block_dim; ++tile_index) {
        acquire_dst(tt::DstMode::Half);

        // Pop tile after tile, copy to DST and pack
        cb_wait_front(tt::CB::c_in0, 1);
        copy_tile(tt::CB::c_in0, 0, 0);

        // Current typecast doesn't support from UINT32 to UINT8.
        // #ifdef SFPU_OP_CHAIN_0
        // SFPU_OP_CHAIN_0
        // #endif

        pack_reconfig_data_format(tt::CB::c_out0);
        pack_tile(0, tt::CB::c_out0);
        print_8bits();

        cb_pop_front(tt::CB::c_in0, 1);
        release_dst(tt::DstMode::Half);
    }
    cb_push_back(tt::CB::c_out0, per_core_block_dim);
}

}


- When UINT8_PACKING_ISSUE macro is 0,
```bash
Print the value of the first four elements of the uint8 output CB in binary format.
1111 1111
1111 1111
1111 1111
1111 1111
2024-07-22 15:29:33.576 | DEBUG    | tests.ttnn.unit_tests.operations.test_typecast:test_typecast_output_tensor:34 - From torch_input_tensor (uint32 data type) tensor([[255, 255, 255,  ..., 255, 255, 255],
        [255, 255, 255,  ..., 255, 255, 255],
        [255, 255, 255,  ..., 255, 255, 255],
        ...,
        [255, 255, 255,  ..., 255, 255, 255],
        [255, 255, 255,  ..., 255, 255, 255],
        [255, 255, 255,  ..., 255, 255, 255]])
2024-07-22 15:29:33.577 | DEBUG    | tests.ttnn.unit_tests.operations.test_typecast:test_typecast_output_tensor:35 - To torch_output_ttnn (uint8 data type tensor([[255, 255, 255,  ..., 255, 255, 255],
        [255, 255, 255,  ..., 255, 255, 255],
        [255, 255, 255,  ..., 255, 255, 255],
        ...,
        [255, 255, 255,  ..., 255, 255, 255],
        [255, 255, 255,  ..., 255, 255, 255],
        [255, 255, 255,  ..., 255, 255, 255]], dtype=torch.uint8)
PASSED

Please complete the following environment information: OS: Ubuntu 20.04 / 172.27.44.100 VM

razorback3 commented 2 months ago

Faced this issue while implementing Dropout

rdjogoTT commented 2 months ago

It looks like the issue was with the pack_reconfig_data_format function, where it was not reconfiguring all the necessary packer settings for uint8. I am just waiting on CI pipelines, which are a bit backed up, before I merge the fix https://github.com/tenstorrent/tt-metal/pull/11467.

rdjogoTT commented 2 months ago

Fix is in main: https://github.com/tenstorrent/tt-metal/pull/11467, give it a shot

dongjin-na commented 2 months ago

Dear @rdjogoTT, I have checked this test on the latest main branch, which has passed. I appreciate your support.