Open dongjin-na opened 3 weeks ago
Note: Currently the highest priority issue among LLK related issues.
@dongjin-na I managed to solve the issue locally by adding a stall on the unpack_reconfig call. Can you please test it out on your branch to confirm the fix works? The fix is on the rd/stall_unpack_reconfig branch for the wh-b0 submodule, so you can just check that commit out.
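For readers following along, here is a toy model of why a stall before a reconfig can matter. It is only a sketch under a stated assumption: that queued unpack work reads the data format that is live when the work executes, not the one that was live when it was issued. The thread does not describe the actual hardware mechanism, and `ToyUnpacker`, `Format`, and `run_sequence` are hypothetical names that are not part of tt-metal or the LLK.

```cpp
#include <deque>

// Toy model only (assumption, not the real hardware): a hypothetical unpacker
// that checks the *current* data format when a queued request is executed.
enum class Format { Bfloat16, UInt8 };

struct ToyUnpacker {
    Format config = Format::Bfloat16;
    std::deque<Format> pending;  // format each queued tile actually needs
    int mismatches = 0;          // tiles executed under the wrong format

    void issue_unpack(Format needed) { pending.push_back(needed); }

    // Execute every queued request against whatever config is live now.
    void drain() {
        while (!pending.empty()) {
            if (pending.front() != config) {
                ++mismatches;
            }
            pending.pop_front();
        }
    }

    // Change the format; with `stall` set, finish in-flight work first.
    void reconfig(Format next, bool stall) {
        if (stall) {
            drain();
        }
        config = next;
    }
};

// Runs the same issue/reconfig sequence with or without the stall and
// returns how many tiles were executed under the wrong format.
int run_sequence(bool stall) {
    ToyUnpacker u;
    u.issue_unpack(Format::Bfloat16);  // still in flight...
    u.reconfig(Format::UInt8, stall);  // ...when the format changes
    u.issue_unpack(Format::UInt8);
    u.drain();
    return u.mismatches;
}
```

In this model the unstalled sequence executes one tile under the wrong format, while the stalled sequence executes none. Whether the real race has this shape is not stated in the thread.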
Dear @rdjogoTT,
I tested the script and other related implementations using the patch you provided.
Based on these results, I have the following requests and questions:
Dear @rdjogoTT,
Additionally, I'd like to inform you that I’ve updated the test kernel, which still fails even after applying the patch you provided. Could you please check if there might be an issue with the kernel implementation or if the patch might need further improvement?
Thank you in advance for your insights, and I look forward to your response.
The updated test kernel is on the uint8_and_bfloat16_ver2 branch. The SFPU code has been slightly modified, and an API using mul_tiles with FPU has been added.
cb_wait_front(cb_in2, onetile);
binary_op_init_common(cb_in0, cb_intermed0, cb_out0);
for (uint32_t i = 0; i < num_tiles; ++i) {
    uint32_t input_tile_id{start_id + i};

    tile_regs_acquire();
    // dst1 = uint8 type input2
    cb_wait_front(cb_in1, onetile);
    unpack_reconfig_data_format_srca(cb_in1);
    copy_tile_to_dst_init_short(cb_in1);
    copy_tile(cb_in1, 0, 1);
    // SFPU function: write values 0.0 or 1.0 to dst2
    simple_tile_init();
    simple_tile(0, 0);
    // inline void calculate_simple_tile(uint bit_index) {
    //     #pragma GCC unroll 0
    //     for (int d = 0; d < ITERATIONS; d++) {
    //         vUInt mask = dst_reg[32];
    //         v_if (mask == 0) {
    //             dst_reg[64] = vConst0;
    //         }
    //         v_else { dst_reg[64] = vConst1; }
    //         v_endif;
    //         dst_reg++;
    //     }
    // }
    cb_pop_front(cb_in1, onetile);
    tile_regs_commit();

    // intermed0 = pack from dst2 to intermed0
    tile_regs_wait();
    cb_reserve_back(cb_intermed0, onetile);
    pack_reconfig_data_format(cb_intermed0);
    pack_tile(2, cb_intermed0);
    cb_push_back(cb_intermed0, onetile);
    tile_regs_release();

    // output (dst0) = input x intermed0
    tile_regs_acquire();
    cb_wait_front(cb_in0, onetile);
    cb_wait_front(cb_intermed0, onetile);
    unpack_reconfig_data_format(cb_intermed0, cb_in0);
    mul_tiles_init_with_dt(cb_intermed0, cb_in0);
    mul_tiles(cb_intermed0, cb_in0, 0, 0, 0);
    cb_pop_front(cb_intermed0, onetile);
    cb_pop_front(cb_in0, onetile);
    tile_regs_commit();

    // intermed0 = pack from dst0 to intermed0 again
    tile_regs_wait();
    cb_reserve_back(cb_intermed0, onetile);
    pack_reconfig_data_format(cb_intermed0);
    pack_tile(0, cb_intermed0);
    cb_push_back(cb_intermed0, onetile);
    tile_regs_release();

    // output (dst0) = intermed0 x 1.0f;
    tile_regs_acquire();
    cb_wait_front(cb_intermed0, onetile);
    unpack_reconfig_data_format(cb_intermed0, cb_in2);
    mul_tiles_bcast_scalar_init_short(cb_intermed0, cb_in2);
    mul_tiles_bcast_scalar(cb_intermed0, cb_in2, 0, 0, 0);
    cb_pop_front(cb_intermed0, onetile);
    tile_regs_commit();

    // pack output tile from dst0
    tile_regs_wait();
    cb_reserve_back(cb_out0, onetile);
    pack_reconfig_data_format(cb_out0);
    pack_tile(0, cb_out0);
    cb_push_back(cb_out0, onetile);
    tile_regs_release();
}
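Going by the comments in the kernel above, each output element should be the bfloat16 input multiplied by 0.0 or 1.0 (depending on whether the uint8 value is zero) and then by the broadcast scalar. A minimal host-side reference for one tile could look like the sketch below. `reference_masked_mul` is a hypothetical helper, not part of tt-metal, and plain `float` stands in for bfloat16, so comparisons against device output may need a tolerance.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Host-side reference for the kernel's per-element result (hypothetical
// helper, not part of tt-metal). Plain float stands in for bfloat16.
std::vector<float> reference_masked_mul(const std::vector<float>& input,
                                        const std::vector<std::uint8_t>& mask,
                                        float scalar) {
    std::vector<float> out(input.size());
    for (std::size_t i = 0; i < input.size(); ++i) {
        // SFPU step: 0.0 where the uint8 mask is zero, 1.0 elsewhere.
        const float selected = (mask[i] == 0) ? 0.0f : 1.0f;
        // FPU steps: input x intermed0, then x the broadcast scalar.
        out[i] = input[i] * selected * scalar;
    }
    return out;
}
```

With a scalar of 1.0 and a mask that is nonzero everywhere, the expected output equals the input, so any all-zero face in the device result points at the kernel or the LLK rather than at the data.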
2024-09-03 14:58:59.125 | DEBUG | tests.tt_eager.python_api_testing.unit_testing.misc.test_moreh_test2:test_uint8_and_bfloat16:85 - tensor([[1.08, 1.41, 1.84, 1.80, 1.13, 1.47, 1.44, 1.69, 1.95, 1.56, 1.80, 1.52, 1.50, 1.61, 1.05, 1.48, 1.27, 1.28, 1.44, 1.63, 1.67, 1.57, 1.36, 2.00, 1.67, 1.21, 1.99, 1.32, 1.57, 1.40, 1.08, 1.26],
[1.83, 1.11, 1.57, 1.75, 1.73, 1.91, 1.71, 1.41, 1.90, 1.02, 1.68, 1.74, 1.27, 1.43, 1.62, 1.62, 1.31, 1.80, 1.05, 1.80, 1.46, 1.41, 1.66, 1.91, 1.49, 1.77, 1.55, 1.38, 1.23, 1.43, 1.09, 1.52],
[1.77, 1.61, 1.40, 1.52, 1.09, 1.11, 1.12, 1.03, 1.16, 1.45, 1.99, 1.84, 1.65, 1.88, 1.70, 1.43, 1.30, 1.72, 1.66, 1.41, 1.26, 1.12, 1.42, 1.70, 1.53, 1.47, 1.41, 1.70, 1.78, 1.58, 1.12, 1.27],
[1.30, 1.06, 1.83, 1.38, 1.81, 1.91, 1.05, 1.66, 1.62, 1.01, 1.62, 1.42, 1.25, 1.82, 1.88, 1.55, 1.64, 1.27, 1.00, 1.96, 1.06, 1.53, 1.45, 1.16, 1.06, 1.07, 1.47, 1.12, 1.09, 1.16, 1.49, 1.86],
[1.81, 1.41, 1.80, 1.12, 1.38, 1.62, 1.96, 1.52, 1.77, 1.44, 1.08, 1.94, 1.02, 1.51, 1.82, 1.59, 1.70, 1.41, 1.38, 1.09, 1.57, 1.93, 1.68, 1.77, 1.52, 1.45, 1.02, 1.83, 1.00, 1.17, 1.82, 1.23],
[1.52, 1.55, 1.20, 1.15, 1.52, 1.97, 1.49, 1.46, 1.76, 1.82, 1.59, 1.07, 1.95, 1.03, 1.20, 1.76, 1.41, 1.46, 1.35, 1.71, 1.88, 1.34, 1.37, 1.01, 1.78, 1.93, 1.53, 1.29, 1.19, 1.96, 1.05, 1.23],
[1.86, 1.13, 1.12, 1.82, 1.03, 1.70, 1.18, 1.78, 1.20, 1.54, 1.84, 1.63, 1.84, 1.05, 1.58, 1.90, 1.55, 1.68, 1.94, 1.94, 1.73, 1.21, 1.08, 1.12, 1.66, 1.20, 1.06, 1.09, 1.98, 1.89, 1.96, 1.16],
[1.97, 1.60, 1.48, 1.47, 1.90, 1.38, 1.20, 1.30, 1.51, 1.12, 1.12, 1.34, 1.86, 1.27, 1.77, 1.33, 1.14, 1.03, 1.33, 1.34, 1.36, 1.53, 1.34, 1.02, 1.27, 1.40, 1.33, 1.89, 1.05, 1.86, 1.59, 1.70],
[1.11, 1.38, 1.04, 1.73, 1.05, 1.16, 1.88, 1.08, 1.65, 1.15, 1.93, 1.12, 1.52, 1.68, 1.09, 1.40, 1.02, 1.04, 1.72, 1.40, 1.09, 1.13, 1.00, 1.25, 1.02, 1.15, 1.70, 1.88, 1.48, 1.96, 1.86, 1.58],
[1.38, 1.39, 1.17, 1.43, 1.23, 1.70, 1.91, 1.06, 1.42, 1.16, 1.53, 1.91, 1.84, 1.87, 1.98, 1.74, 1.23, 1.16, 1.52, 1.30, 1.99, 1.84, 1.40, 1.28, 1.75, 1.05, 1.01, 1.72, 1.30, 1.84, 1.20, 1.16],
[1.50, 1.06, 1.39, 1.13, 1.76, 1.76, 1.82, 1.77, 1.32, 1.15, 1.66, 1.94, 1.37, 1.29, 1.41, 1.97, 1.97, 1.08, 1.75, 1.86, 1.77, 1.74, 1.73, 1.77, 1.57, 1.25, 1.80, 1.27, 1.32, 1.96, 1.04, 1.38],
[1.10, 1.02, 1.66, 1.26, 1.50, 1.85, 1.31, 1.63, 1.07, 1.73, 1.36, 1.58, 1.46, 1.82, 1.55, 1.23, 1.12, 1.59, 1.59, 1.27, 1.54, 1.86, 1.53, 1.38, 1.02, 1.26, 1.95, 1.72, 1.60, 1.63, 1.16, 1.98],
[1.44, 1.70, 1.16, 1.03, 1.05, 1.82, 1.07, 1.53, 1.74, 1.95, 1.21, 1.59, 1.45, 1.17, 1.25, 1.20, 1.59, 1.31, 1.67, 1.37, 1.91, 1.49, 1.89, 1.30, 1.05, 1.92, 1.73, 1.47, 1.33, 1.18, 1.43, 1.06],
[1.42, 1.95, 1.14, 1.44, 1.05, 1.15, 1.78, 1.98, 1.27, 1.05, 1.87, 1.46, 1.12, 1.34, 1.61, 1.49, 1.16, 1.73, 1.38, 1.75, 1.59, 1.20, 1.03, 1.92, 1.58, 1.41, 1.31, 1.21, 1.76, 1.60, 1.47, 1.88],
[1.05, 1.20, 1.59, 1.10, 1.34, 1.34, 1.66, 1.96, 1.02, 1.50, 1.46, 1.05, 1.24, 1.31, 1.58, 1.92, 1.86, 1.17, 1.22, 1.62, 1.60, 1.94, 1.67, 1.44, 1.66, 1.61, 1.73, 1.58, 1.72, 1.45, 1.15, 1.83],
[1.88, 1.99, 1.08, 1.49, 1.58, 1.06, 1.84, 1.61, 1.95, 1.64, 1.64, 1.97, 1.12, 1.71, 1.60, 1.69, 1.16, 1.02, 1.13, 1.19, 1.88, 1.20, 1.16, 1.73, 1.41, 1.02, 1.58, 1.63, 1.59, 1.64, 1.17, 1.73],
[1.12, 1.48, 1.44, 1.66, 1.51, 1.62, 1.03, 1.84, 1.45, 1.02, 1.16, 1.70, 1.30, 1.30, 1.38, 1.48, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00],
[1.48, 1.78, 1.36, 1.38, 1.09, 1.09, 1.10, 1.09, 1.98, 1.97, 1.03, 2.00, 1.42, 1.58, 1.58, 1.21, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00],
[1.30, 1.94, 1.91, 1.47, 1.54, 1.56, 1.64, 1.79, 1.00, 1.20, 1.96, 1.59, 1.11, 1.85, 1.10, 1.27, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00],
[1.20, 1.16, 1.55, 1.09, 1.70, 1.39, 1.52, 1.48, 1.41, 1.12, 1.05, 1.98, 1.01, 1.41, 1.83, 1.98, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00],
[1.15, 1.95, 1.84, 1.88, 1.79, 1.09, 1.97, 1.38, 1.20, 1.74, 1.14, 1.62, 1.80, 1.15, 1.79, 1.41, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00],
[1.19, 1.77, 1.05, 1.55, 1.86, 1.42, 1.73, 1.70, 1.07, 1.05, 1.56, 1.48, 1.11, 1.60, 1.38, 1.12, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00],
[1.85, 1.54, 1.71, 1.31, 1.20, 1.26, 1.32, 1.81, 1.59, 1.69, 1.07, 1.70, 1.00, 1.83, 1.22, 1.16, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00],
[1.32, 1.38, 1.30, 1.34, 1.79, 1.09, 1.81, 1.72, 1.16, 1.05, 1.22, 1.50, 1.91, 1.12, 1.09, 1.82, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00],
[1.77, 1.49, 1.77, 1.58, 1.53, 1.81, 1.62, 1.26, 1.49, 1.62, 1.55, 1.37, 1.95, 1.09, 1.66, 1.39, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00],
[1.52, 1.88, 1.47, 1.92, 1.35, 1.25, 1.98, 1.17, 1.60, 1.57, 1.47, 1.51, 1.53, 1.73, 1.02, 1.02, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00],
[1.88, 1.77, 1.77, 1.01, 1.55, 1.77, 1.88, 1.83, 1.38, 1.12, 1.62, 1.27, 1.23, 1.81, 1.40, 1.24, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00],
[1.09, 1.96, 1.59, 1.97, 1.51, 1.67, 1.83, 1.22, 1.95, 1.95, 1.18, 1.47, 1.78, 1.97, 1.29, 1.29, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00],
[1.34, 1.88, 1.13, 1.02, 1.84, 1.39, 1.98, 1.45, 1.98, 1.29, 1.38, 1.18, 1.32, 1.59, 1.79, 1.85, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00],
[1.59, 1.55, 1.05, 1.28, 1.51, 1.80, 1.18, 1.24, 1.26, 1.48, 1.52, 1.05, 1.53, 1.19, 1.32, 1.99, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00],
[1.03, 1.80, 1.22, 1.30, 1.62, 1.33, 1.16, 1.36, 1.25, 1.97, 1.75, 1.91, 1.23, 1.45, 1.11, 1.63, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00],
[1.10, 1.37, 1.02, 1.70, 1.73, 1.61, 1.88, 1.44, 1.66, 1.65, 1.09, 1.52, 1.24, 1.69, 1.31, 1.20, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00]], dtype=torch.bfloat16)
2024-09-03 14:58:59.130 | DEBUG | tests.tt_eager.python_api_testing.unit_testing.misc.test_moreh_test2:test_uint8_and_bfloat16:84 - Tensor 2:
2024-09-03 14:58:59.130 | DEBUG | tests.tt_eager.python_api_testing.unit_testing.misc.test_moreh_test2:test_uint8_and_bfloat16:85 - tensor([[1.44, 1.28, 1.20, 1.90, 1.56, 1.48, 1.66, 1.21, 1.78, 1.52, 1.03, 1.78, 1.87, 1.55, 1.80, 1.84, 1.55, 1.80, 1.07, 1.07, 1.38, 1.95, 1.54, 1.76, 1.74, 1.47, 1.76, 1.88, 1.88, 1.38, 1.96, 1.27],
[1.70, 1.99, 1.39, 1.38, 1.19, 1.34, 1.80, 1.64, 1.44, 1.43, 1.07, 1.41, 1.15, 1.20, 1.95, 1.73, 1.13, 1.98, 1.16, 1.20, 1.41, 1.10, 1.72, 1.64, 1.74, 1.71, 1.20, 1.67, 1.22, 1.73, 1.53, 1.51],
[1.80, 1.02, 1.65, 1.95, 1.05, 1.63, 1.81, 1.24, 1.85, 1.09, 1.73, 1.77, 1.15, 1.39, 1.86, 1.36, 1.81, 1.02, 1.87, 1.80, 1.85, 1.78, 1.95, 1.11, 1.91, 1.45, 1.07, 1.23, 1.43, 1.41, 1.37, 1.16],
[1.76, 1.76, 1.48, 1.84, 1.28, 1.91, 1.06, 1.61, 1.60, 1.02, 1.60, 1.10, 1.35, 1.89, 1.34, 1.40, 1.34, 1.28, 2.00, 1.30, 1.59, 1.10, 1.68, 1.62, 1.91, 1.12, 1.98, 1.93, 1.77, 1.07, 1.80, 1.52],
[1.44, 1.30, 1.41, 1.81, 1.02, 1.88, 1.05, 1.52, 1.31, 1.91, 1.58, 1.24, 1.88, 1.48, 1.07, 1.33, 1.48, 1.29, 1.16, 1.26, 1.07, 1.75, 1.25, 1.05, 1.48, 1.62, 1.23, 1.02, 1.13, 1.14, 1.09, 1.61],
[1.05, 1.27, 1.70, 1.09, 1.77, 1.74, 1.46, 1.34, 1.88, 1.90, 1.60, 1.29, 1.28, 1.13, 1.05, 1.41, 1.02, 1.49, 1.55, 1.81, 1.55, 1.75, 1.48, 1.83, 1.77, 1.52, 1.51, 1.69, 1.68, 1.27, 1.95, 1.95],
[1.28, 1.72, 1.70, 1.98, 1.89, 1.90, 1.40, 1.86, 1.05, 1.56, 1.86, 1.58, 1.05, 1.10, 1.38, 1.49, 1.31, 1.95, 1.12, 1.70, 1.35, 1.34, 1.80, 1.95, 1.77, 2.00, 1.42, 1.23, 1.13, 1.95, 1.23, 1.55],
[1.09, 1.62, 1.48, 1.83, 1.52, 1.69, 1.12, 1.16, 1.46, 1.16, 1.36, 1.95, 1.93, 1.80, 1.88, 1.17, 1.91, 1.09, 1.30, 1.43, 1.22, 1.46, 1.91, 1.11, 1.77, 1.03, 1.40, 1.02, 1.70, 1.09, 1.49, 1.05],
[1.96, 1.93, 1.30, 1.32, 1.67, 1.19, 1.77, 1.05, 1.49, 1.98, 1.22, 1.24, 1.55, 1.98, 1.19, 1.29, 1.71, 1.16, 1.55, 1.74, 1.49, 1.13, 1.62, 1.48, 1.86, 1.62, 1.00, 1.14, 1.70, 1.51, 1.85, 1.31],
[1.00, 1.48, 1.88, 1.93, 1.56, 1.21, 1.29, 1.23, 1.63, 1.20, 1.20, 1.55, 1.86, 1.50, 1.33, 2.00, 1.80, 1.83, 1.62, 1.95, 1.81, 1.97, 1.62, 1.30, 1.08, 1.49, 1.32, 1.49, 1.77, 1.62, 1.85, 1.38],
[1.42, 1.06, 1.99, 1.48, 1.26, 1.77, 1.73, 1.20, 1.71, 1.61, 1.60, 1.70, 1.17, 1.20, 1.23, 1.45, 1.25, 1.98, 1.23, 1.45, 1.52, 1.39, 1.92, 1.80, 1.02, 1.87, 1.38, 1.38, 1.20, 1.08, 1.95, 1.09],
[1.98, 1.86, 1.68, 1.92, 1.41, 1.30, 1.41, 1.07, 1.88, 1.49, 1.68, 1.31, 1.86, 1.51, 1.70, 1.16, 1.25, 1.29, 1.41, 1.10, 1.64, 1.52, 1.98, 1.62, 1.26, 1.66, 1.05, 1.59, 1.43, 1.20, 1.49, 1.73],
[1.20, 1.68, 1.86, 1.73, 1.05, 1.80, 1.14, 1.42, 2.00, 1.54, 1.03, 1.99, 1.14, 1.47, 1.50, 1.53, 1.70, 1.49, 1.96, 1.17, 1.85, 1.43, 1.95, 1.16, 1.92, 1.61, 1.14, 1.62, 1.33, 1.66, 1.04, 1.85],
[1.52, 1.61, 1.48, 1.34, 1.82, 1.41, 1.38, 1.38, 1.84, 1.45, 1.72, 1.62, 1.91, 1.30, 1.24, 1.12, 1.08, 1.03, 1.36, 1.91, 1.10, 1.09, 1.49, 1.42, 1.75, 1.80, 1.30, 1.73, 1.99, 1.12, 1.71, 1.46],
[1.81, 1.88, 1.86, 1.87, 1.93, 1.74, 1.33, 1.51, 1.38, 1.80, 1.30, 1.60, 1.69, 1.45, 1.87, 1.45, 1.48, 1.19, 1.57, 1.24, 1.45, 1.55, 1.31, 1.25, 1.62, 1.62, 1.47, 1.50, 1.83, 1.56, 1.80, 1.01],
[1.74, 1.78, 1.42, 1.45, 1.15, 1.61, 1.88, 1.22, 1.13, 1.92, 1.16, 1.09, 1.59, 1.87, 1.58, 1.26, 1.96, 1.32, 1.91, 1.91, 1.43, 1.22, 1.61, 1.88, 1.99, 1.45, 1.45, 1.57, 1.16, 1.02, 1.09, 1.62],
[1.89, 1.84, 1.12, 1.05, 1.20, 1.45, 1.10, 1.14, 1.84, 1.62, 1.39, 1.70, 1.40, 1.24, 1.56, 1.31, 1.84, 1.25, 1.06, 1.98, 1.84, 1.85, 1.50, 1.51, 1.70, 1.39, 1.77, 1.31, 1.66, 1.90, 1.39, 1.75],
[1.16, 1.98, 1.49, 1.79, 1.51, 1.98, 1.02, 1.09, 1.23, 1.20, 1.47, 1.14, 1.70, 1.72, 1.66, 1.35, 1.44, 1.36, 1.16, 1.40, 1.91, 1.99, 1.77, 1.93, 1.62, 1.92, 1.73, 1.97, 1.41, 1.55, 1.21, 1.30],
[1.47, 1.87, 1.18, 1.08, 1.90, 1.10, 1.47, 1.60, 1.70, 1.30, 1.70, 1.16, 1.51, 1.23, 1.28, 1.49, 1.12, 1.76, 1.60, 1.53, 1.52, 1.76, 1.70, 1.27, 1.07, 1.16, 1.58, 1.34, 1.42, 1.73, 1.88, 1.05],
[1.35, 1.52, 1.52, 1.99, 1.37, 1.55, 1.21, 1.15, 1.81, 1.76, 1.57, 1.06, 1.59, 1.23, 1.50, 1.58, 1.88, 1.19, 1.45, 1.55, 1.48, 1.26, 1.34, 1.63, 1.63, 1.94, 1.81, 1.60, 1.84, 1.88, 1.59, 1.98],
[1.33, 1.52, 1.78, 1.54, 1.38, 1.25, 1.41, 1.40, 1.35, 1.49, 1.90, 1.85, 1.23, 1.95, 1.06, 1.41, 1.54, 1.73, 1.45, 1.72, 1.66, 1.45, 1.98, 1.55, 1.39, 1.35, 1.81, 1.47, 1.34, 1.20, 1.91, 1.30],
[1.28, 1.96, 1.99, 1.58, 1.15, 1.72, 1.34, 1.76, 1.34, 1.65, 1.77, 1.45, 1.75, 1.62, 1.77, 1.66, 1.53, 1.84, 1.09, 1.98, 1.52, 1.33, 1.31, 1.29, 1.90, 1.79, 1.53, 1.26, 1.52, 1.62, 1.70, 1.88],
[1.73, 1.45, 1.16, 1.45, 1.06, 1.94, 1.59, 1.25, 1.02, 1.52, 1.16, 1.89, 1.00, 1.14, 1.25, 2.00, 1.26, 1.68, 1.92, 1.20, 1.01, 1.28, 1.28, 1.15, 1.43, 1.67, 1.12, 1.80, 1.18, 1.66, 1.84, 1.87],
[1.55, 1.07, 1.75, 1.64, 1.55, 1.14, 1.94, 1.09, 1.10, 1.42, 1.20, 1.27, 1.87, 1.52, 1.09, 1.30, 1.32, 1.52, 1.14, 1.76, 1.78, 1.20, 1.98, 1.49, 1.13, 1.32, 1.84, 1.94, 1.21, 1.77, 1.38, 1.23],
[1.26, 1.19, 1.62, 1.02, 1.86, 1.67, 1.04, 1.71, 1.16, 1.81, 1.55, 1.60, 1.33, 1.09, 1.21, 1.91, 1.88, 1.23, 1.13, 1.11, 1.81, 1.21, 1.55, 1.55, 1.57, 1.21, 1.05, 1.97, 1.21, 1.31, 1.34, 1.34],
[1.13, 1.81, 1.69, 1.15, 1.57, 1.80, 1.85, 1.92, 1.02, 1.42, 1.35, 1.09, 1.05, 1.24, 1.48, 1.48, 1.84, 1.51, 1.76, 1.80, 1.40, 1.99, 1.49, 1.96, 1.39, 1.41, 1.79, 1.37, 1.99, 1.34, 1.30, 1.12],
[1.04, 1.10, 1.27, 1.45, 1.95, 1.01, 1.74, 1.66, 1.54, 1.80, 1.66, 1.71, 1.25, 1.55, 1.16, 2.00, 1.40, 1.65, 1.52, 1.18, 1.45, 1.91, 1.85, 1.69, 1.75, 1.12, 1.55, 1.84, 1.12, 1.36, 1.70, 1.04],
[1.10, 1.52, 1.95, 1.95, 1.63, 1.09, 1.38, 1.70, 1.30, 1.84, 1.82, 1.80, 1.91, 1.21, 1.56, 1.69, 1.90, 1.50, 1.35, 1.64, 1.08, 1.34, 1.19, 1.33, 1.31, 1.46, 1.62, 1.33, 1.26, 1.24, 1.79, 1.27],
[1.91, 1.77, 1.53, 1.91, 1.36, 1.59, 1.44, 1.17, 1.00, 1.42, 1.12, 1.13, 1.91, 1.79, 1.06, 1.86, 1.03, 1.90, 1.68, 1.71, 1.10, 1.41, 1.27, 1.07, 1.41, 1.80, 1.91, 1.26, 1.91, 1.77, 1.68, 1.48],
[1.59, 1.16, 1.13, 1.11, 1.94, 1.55, 1.75, 1.98, 1.09, 1.17, 1.53, 1.63, 1.16, 1.47, 1.33, 1.06, 1.20, 1.65, 1.22, 1.73, 1.74, 1.45, 1.27, 1.72, 1.57, 1.59, 1.27, 1.70, 1.75, 1.62, 1.37, 1.36],
[1.75, 1.15, 1.38, 1.73, 1.57, 1.74, 1.26, 1.52, 1.31, 1.67, 1.57, 1.81, 1.83, 1.50, 1.75, 1.30, 1.05, 1.48, 1.88, 1.73, 1.70, 1.10, 1.12, 1.45, 1.08, 1.98, 1.82, 1.66, 1.62, 1.59, 1.98, 1.42],
[1.39, 1.30, 1.27, 1.73, 1.77, 1.69, 1.02, 1.70, 1.13, 1.41, 1.09, 1.84, 1.10, 1.20, 1.80, 1.93, 1.82, 1.65, 1.89, 1.14, 1.16, 1.18, 1.92, 1.24, 1.44, 1.91, 1.03, 1.79, 1.59, 1.55, 1.98, 1.48]], dtype=torch.bfloat16)
@dongjin-na I did expect the patch to be a complete solution. I will investigate this failing case now.
Progress update: Adding a UNPACK(( tensix_sync() )); at the end of the loop fixes the issue with the 4th face being all 0's. I am investigating deeper as to why, as this is a more extreme measure than should be necessary.
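To make the "4th face" symptom easy to check on the host, the sketch below scans a 32x32 tile and reports which of its four 16x16 faces are entirely zero. It assumes the tile is available as a row-major `float` buffer, as in the tensor printed by the test log. `find_zero_faces` and the constants are hypothetical names, not a tt-metal API.

```cpp
#include <array>
#include <cstddef>
#include <vector>

constexpr std::size_t kTileDim = 32;  // a tile is 32x32 elements
constexpr std::size_t kFaceDim = 16;  // each face is 16x16 elements

// Returns, for faces 0..3 (top-left, top-right, bottom-left, bottom-right),
// whether every element of that face is exactly zero.
// `tile` must hold kTileDim * kTileDim values in row-major order.
std::array<bool, 4> find_zero_faces(const std::vector<float>& tile) {
    std::array<bool, 4> all_zero{true, true, true, true};
    for (std::size_t r = 0; r < kTileDim; ++r) {
        for (std::size_t c = 0; c < kTileDim; ++c) {
            const std::size_t face = (r / kFaceDim) * 2 + (c / kFaceDim);
            if (tile[r * kTileDim + c] != 0.0f) {
                all_zero[face] = false;
            }
        }
    }
    return all_zero;
}
```

For the failing tensor in the log above, where rows 16 to 31 are zero in columns 16 to 31, only index 3 of the result would be set.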
Looks like we've found the race. We are working on formulating a solution and on a PR. It's a bit of a bigger one and will require substantial testing.
@dongjin-na The fix is on the updated rd/stall_unpack_reconfig branch. Please try to run the test again to see if it works for you; I was able to get all tests to pass locally.
@rdjogoTT Hello, I have confirmed that the previously failing tests now pass with the provided patch. However, I found a new issue related to this in another use case. Since this issue became too lengthy, I have separated it into a new issue. Please refer to https://github.com/tenstorrent/tt-metal/issues/12963. Thank you.
Describe the bug
We are experiencing issues with using different types of CBs (bfloat16 and uint8) in the compute kernel. When processing a single tile in a loop, everything works as expected, but when iterating and handling two or more tiles, specific faces in the resulting tiles are incorrectly stored as 0.
The strange part is that the issue doesn’t occur when I add a dprint in the reader. Because of this, I suspect something is missing in the kernel or host code that I wrote. Could you please help check for any potential issues?
To Reproduce
Steps to reproduce the behavior: This was detected in this bug https://github.com/tenstorrent/tt-metal/issues/11000 and this example can be used.
Expected behavior
The expected output should be produced.