Open jeongu-moreh opened 1 week ago
The following compute kernel just unpacks a tile from a CB into the DST register and packs it back into a CB.
namespace NAMESPACE {
void MAIN {
    constexpr uint32_t num_tiles = get_compile_time_arg_val(0);
    constexpr uint32_t cb_src = tt::CB::c_in0;
    constexpr uint32_t cb_dst = tt::CB::c_out0;

    unary_op_init_common(cb_src, cb_dst);

    for (uint32_t i = 0; i < num_tiles; i++) {
        cb_wait_front(cb_src, 1);
        cb_reserve_back(cb_dst, 1);

        // Unpack: copy one tile from cb_src into DST register 0.
        tile_regs_acquire();
        unpack_reconfig_data_format_srca(cb_src);
        copy_tile_to_dst_init_short(cb_src);
        copy_tile(cb_src, 0, 0);
        tile_regs_commit();

        // Pack: write DST register 0 back out to cb_dst.
        tile_regs_wait();
        pack_reconfig_data_format(cb_dst);
        pack_tile(0, cb_dst);
        tile_regs_release();

        cb_pop_front(cb_src, 1);
        cb_push_back(cb_dst, 1);
    }
}
}  // namespace NAMESPACE
If at least one CB has the uint16 data type and fp32 mode is enabled, the output differs from the input. Is the uint16 type not supported in fp32 mode, or is some additional configuration logic required?
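For anyone reproducing this, the "output differs from input" check can be sketched on the host side as below. This is only an illustrative sketch: `first_mismatch` is a hypothetical helper, not part of tt-metal, and it assumes the input and output tiles have already been read back into plain `uint16_t` vectors.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical helper (not a tt-metal API): returns the index of the first
// element at which the two buffers differ, or std::nullopt if they match.
// If the lengths differ but the common prefix matches, the length of the
// shorter buffer is reported as the mismatch index.
std::optional<std::size_t> first_mismatch(const std::vector<uint16_t>& input,
                                          const std::vector<uint16_t>& output) {
    const std::size_t common = std::min(input.size(), output.size());
    for (std::size_t i = 0; i < common; ++i) {
        if (input[i] != output[i]) {
            return i;  // first differing element
        }
    }
    if (input.size() != output.size()) {
        return common;  // one buffer is a strict prefix of the other
    }
    return std::nullopt;  // identical contents: the copy was bit-exact
}
```

Since the kernel is a pure copy, any returned index indicates that the unpack/pack path altered the data for that format configuration.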
@davorchap Would you please assign an engineer regarding this query? Thanks.