Describe the bug
The following unit test hangs at the .cpu() call when run in fast dispatch:
pytest tests/tt_eager/python_api_testing/unit_testing/test_untilize_with_halo_v2.py::test_generate_all_configs_and_references[False-conv_params22-20-input_chw_shape22-98-grid_size22-False]
No issues with slow dispatch mode.
Performing sharded_to_interleaved first and then moving to host with .cpu() also works fine, so it is specifically the sharded tensor's move to host that hangs.
The input tensor shape to this unit test (UTWHv2) is [1, 1, 62720, 128], and the constructed output has shape
[1, 1, 98 * 913, 128], with each shard sized [913, 128].
Datatype is BFLOAT16.
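The shard arithmetic above can be sanity-checked with plain Python (no device needed); the helper below is our own illustration, not a tt-metal API. Note the output height (98 * 913 = 89474) exceeds the input height (62720), presumably because each shard carries halo rows:

```python
def sharded_output_shape(num_shards, shard_shape):
    """Stack height-sharded pieces back into one [1, 1, H, W] logical shape.

    Illustrative helper only -- not part of tt-metal.
    """
    shard_h, shard_w = shard_shape
    return [1, 1, num_shards * shard_h, shard_w]

# 98 cores, each holding a [913, 128] shard of the UTWHv2 output
out_shape = sharded_output_shape(98, (913, 128))
print(out_shape)  # [1, 1, 89474, 128]
```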
To Reproduce
Branch: bliu/issue-4319
Test: pytest tests/tt_eager/python_api_testing/unit_testing/test_untilize_with_halo_v2.py::test_generate_all_configs_and_references[False-conv_params23-20-input_chw_shape23-98-grid_size23-False]