acejkov closed 9 months ago
33 test files are affected. I suspect most of the skip_for_wormhole_b0() decorators can be removed:
1 tests/tt_eager/python_api_testing/unit_testing/test_attn_matmul.py
2 tests/tt_eager/python_api_testing/unit_testing/test_average_pool.py
3 tests/tt_eager/python_api_testing/unit_testing/test_bert_ops.py
4 tests/tt_eager/python_api_testing/unit_testing/test_bert_sharded.py
5 tests/tt_eager/python_api_testing/unit_testing/test_concat.py
6 tests/tt_eager/python_api_testing/unit_testing/test_downsample.py
7 tests/tt_eager/python_api_testing/unit_testing/test_embedding.py
8 tests/tt_eager/python_api_testing/unit_testing/test_eps.py
9 tests/tt_eager/python_api_testing/unit_testing/test_fully_connected.py
10 tests/tt_eager/python_api_testing/unit_testing/test_groupnorm_sharded.py
11 tests/tt_eager/python_api_testing/unit_testing/test_layernorm.py
12 tests/tt_eager/python_api_testing/unit_testing/test_layernorm_sharded.py
13 tests/tt_eager/python_api_testing/unit_testing/test_max_pool.py
14 tests/tt_eager/python_api_testing/unit_testing/test_moreh_clip_grad_norm.py
15 tests/tt_eager/python_api_testing/unit_testing/test_moreh_layernorm.py
16 tests/tt_eager/python_api_testing/unit_testing/test_moreh_matmul.py
17 tests/tt_eager/python_api_testing/unit_testing/test_moreh_sum.py
18 tests/tt_eager/python_api_testing/unit_testing/test_move_sharded.py
19 tests/tt_eager/python_api_testing/unit_testing/test_nlp_concat_heads.py
20 tests/tt_eager/python_api_testing/unit_testing/test_optimized_conv.py
21 tests/tt_eager/python_api_testing/unit_testing/test_optimized_conv_multi_core.py
22 tests/tt_eager/python_api_testing/unit_testing/test_optimized_conv_v2.py
23 tests/tt_eager/python_api_testing/unit_testing/test_pow_fractional.py
24 tests/tt_eager/python_api_testing/unit_testing/test_resnet50_first_conv.py
25 tests/tt_eager/python_api_testing/unit_testing/test_resnet50_first_conv_folding_on_host.py
26 tests/tt_eager/python_api_testing/unit_testing/test_resnet50_untilize_with_halo_and_conv_v2.py
27 tests/tt_eager/python_api_testing/unit_testing/test_rmsnorm.py
28 tests/tt_eager/python_api_testing/unit_testing/test_rotate_half.py
29 tests/tt_eager/python_api_testing/unit_testing/test_sfpu_chain.py
30 tests/tt_eager/python_api_testing/unit_testing/test_single_core_fused_ops.py
31 tests/tt_eager/python_api_testing/unit_testing/test_softmax_sharded.py
32 tests/tt_eager/python_api_testing/unit_testing/test_untilize_with_halo_and_max_pool_v2.py
33 tests/tt_eager/python_api_testing/unit_testing/test_untilize_with_halo_v2.py
test_softmax_sharded requires the fix on Yu's branch yugao/gs_wh_block_matmul_hang.
Hopefully the others will work as well.
@muthutt all resnet tests fail with this error
Always | FATAL | No L1 bank exists for core (x=8,y=0)
FAILED
do you know if anybody is looking into this?
Not sure; it may be an issue with WHB0 n150 vs. n300, one of which has fewer rows than the other (reclaimed cores?). Which machine did you use? Can you try the other one as well before filing the bug? What you have is a bug: the operator should have raised an error saying that the implementation of this operator (whatever it is) is not supported on this particular type of Wormhole arch.
Thanks
No worries, it's a hardcoded grid size in some of the tests. I'm adjusting the grid size based on the device we run on and the available cores.
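A minimal sketch of that kind of adjustment, in plain Python. The helper names (`clamp_grid`, `fits`) and the grid numbers are illustrative assumptions, not taken from the tests or read from a device; in a real test the available grid would come from the device object.

```python
from typing import Tuple

Grid = Tuple[int, int]  # (columns, rows) of worker cores


def fits(requested: Grid, available: Grid) -> bool:
    """Return True if every requested core exists on the device grid."""
    return requested[0] <= available[0] and requested[1] <= available[1]


def clamp_grid(requested: Grid, available: Grid) -> Grid:
    """Shrink a requested grid so it never exceeds the available grid."""
    return (min(requested[0], available[0]), min(requested[1], available[1]))


# Illustrative values only (assumption): a test hardcodes a 12x9 grid,
# but the device it runs on exposes 8x7 worker cores.
requested = (12, 9)
available = (8, 7)

grid = clamp_grid(requested, available)
print(f"requested={requested} available={available} using={grid}")
```

With a hardcoded 12-column grid on an 8-column device, a core at x=8 would be requested even though only x=0..7 exist, which is consistent with the kind of "No L1 bank exists for core (x=8,y=0)" failure reported above.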
@acejkov - most of the tests for simple eltwise ops should have parity on WHB0 and should work with just a PCC change, if anything needs changing.
@muthutt, what kind of PCC change do you expect? The PCC target or something else?
Just by a few percentage points, e.g. 0.99 -> 0.98 perhaps; something small.
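For reference, a small sketch of what relaxing the PCC target looks like. PCC here is the Pearson correlation coefficient between expected and actual outputs; the `pcc` and `passes` helpers and the `PCC_TARGET` table are hypothetical stand-ins written in plain Python, not the repository's `comp_pcc`.

```python
import math
from typing import Sequence


def pcc(expected: Sequence[float], actual: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(expected)
    if n == 0 or n != len(actual):
        raise ValueError("inputs must be non-empty and of equal length")
    mean_e = sum(expected) / n
    mean_a = sum(actual) / n
    cov = sum((e - mean_e) * (a - mean_a) for e, a in zip(expected, actual))
    var_e = sum((e - mean_e) ** 2 for e in expected)
    var_a = sum((a - mean_a) ** 2 for a in actual)
    if var_e == 0 or var_a == 0:
        raise ValueError("PCC is undefined for constant input")
    return cov / math.sqrt(var_e * var_a)


# Hypothetical per-architecture targets, following the 0.99 -> 0.98
# suggestion in the thread; the real values depend on each op.
PCC_TARGET = {"grayskull": 0.99, "wormhole_b0": 0.98}


def passes(expected: Sequence[float], actual: Sequence[float], arch: str) -> bool:
    """Check the measured PCC against the target for the given arch."""
    return pcc(expected, actual) >= PCC_TARGET[arch]
```

Keeping the target in one table per architecture makes the relaxation explicit and easy to tighten again later.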
The following tests remain to be enabled:
tests/tt_eager/python_api_testing/unit_testing/test_moreh_*.py
tests/tt_eager/python_api_testing/unit_testing/test_move_sharded.py
tests/tt_eager/python_api_testing/unit_testing/test_groupnorm_sharded.py
tests/tt_eager/python_api_testing/unit_testing/test_optimized_conv.py
tests/tt_eager/python_api_testing/unit_testing/test_optimized_conv_multi_core.py
tests/tt_eager/python_api_testing/unit_testing/test_optimized_conv_v2.py
tests/tt_eager/python_api_testing/unit_testing/test_resnet50_first_conv.py
tests/tt_eager/python_api_testing/unit_testing/test_resnet50_first_conv_folding_on_host.py
tests/tt_eager/python_api_testing/unit_testing/test_resnet50_untilize_with_halo_and_conv_v2.py
tests/tt_eager/python_api_testing/unit_testing/test_untilize_with_halo_and_max_pool_v2.py
tests/tt_eager/python_api_testing/unit_testing/test_untilize_with_halo_v2.py
The latest merge, https://github.com/tenstorrent-metal/tt-metal/pull/5425, enables the following tests:
tests/tt_eager/python_api_testing/unit_testing/test_optimized_conv.py
tests/tt_eager/python_api_testing/unit_testing/test_optimized_conv_multi_core.py
tests/tt_eager/python_api_testing/unit_testing/test_resnet50_first_conv.py
tests/tt_eager/python_api_testing/unit_testing/test_resnet50_first_conv_folding_on_host.py
tests/tt_eager/python_api_testing/unit_testing/test_resnet50_untilize_with_halo_and_conv_v2.py
tests/tt_eager/python_api_testing/unit_testing/test_untilize_with_halo_v2.py
The remaining tests will be enabled as part of the ttnn unit tests.
Describe the bug
A large number of unit tests are skipped on b0:
tests/tt_eager/python_api_testing/unit_testing/test_attn_matmul.py:from models.utility_functions import comp_pcc, skip_for_wormhole_b0
tests/tt_eager/python_api_testing/unit_testing/test_average_pool.py:from models.utility_functions import skip_for_wormhole_b0
tests/tt_eager/python_api_testing/unit_testing/test_average_pool.py:@skip_for_wormhole_b0()
tests/tt_eager/python_api_testing/unit_testing/test_bert_ops.py:from models.utility_functions import is_wormhole_b0, is_grayskull, skip_for_wormhole_b0
tests/tt_eager/python_api_testing/unit_testing/test_bert_ops.py:# # @skip_for_wormhole_b0("WH ND hang, see issue #4392")
tests/tt_eager/python_api_testing/unit_testing/test_bert_sharded.py:from models.utility_functions import skip_for_wormhole_b0
tests/tt_eager/python_api_testing/unit_testing/test_bert_sharded.py:@skip_for_wormhole_b0()
tests/tt_eager/python_api_testing/unit_testing/test_concat.py:from models.utility_functions import skip_for_wormhole_b0
tests/tt_eager/python_api_testing/unit_testing/test_concat.py:@skip_for_wormhole_b0()
tests/tt_eager/python_api_testing/unit_testing/test_concat.py:@skip_for_wormhole_b0()
tests/tt_eager/python_api_testing/unit_testing/test_concat.py:@skip_for_wormhole_b0()
tests/tt_eager/python_api_testing/unit_testing/test_downsample.py:from models.utility_functions import skip_for_wormhole_b0
tests/tt_eager/python_api_testing/unit_testing/test_downsample.py:@skip_for_wormhole_b0()
tests/tt_eager/python_api_testing/unit_testing/test_embedding.py:from models.utility_functions import skip_for_wormhole_b0
tests/tt_eager/python_api_testing/unit_testing/test_embedding.py:@skip_for_wormhole_b0()
tests/tt_eager/python_api_testing/unit_testing/test_eps.py:from models.utility_functions import is_wormhole_b0, skip_for_wormhole_b0
tests/tt_eager/python_api_testing/unit_testing/test_fully_connected.py:from models.utility_functions import is_wormhole_b0, skip_for_wormhole_b0
tests/tt_eager/python_api_testing/unit_testing/test_fully_connected.py:@skip_for_wormhole_b0()
tests/tt_eager/python_api_testing/unit_testing/test_groupnorm_sharded.py:from models.utility_functions import torch2tt_tensor, tt2torch_tensor, pad_by_zero, skip_for_wormhole_b0
tests/tt_eager/python_api_testing/unit_testing/test_groupnorm_sharded.py:@skip_for_wormhole_b0()
tests/tt_eager/python_api_testing/unit_testing/test_layernorm.py:from models.utility_functions import is_wormhole_b0, skip_for_wormhole_b0
tests/tt_eager/python_api_testing/unit_testing/test_layernorm.py:@skip_for_wormhole_b0()
tests/tt_eager/python_api_testing/unit_testing/test_layernorm.py:@skip_for_wormhole_b0()
tests/tt_eager/python_api_testing/unit_testing/test_layernorm_sharded.py:from models.utility_functions import is_wormhole_b0, skip_for_wormhole_b0
tests/tt_eager/python_api_testing/unit_testing/test_layernorm_sharded.py:@skip_for_wormhole_b0()
tests/tt_eager/python_api_testing/unit_testing/test_layernorm_sharded.py:@skip_for_wormhole_b0()
tests/tt_eager/python_api_testing/unit_testing/test_layernorm_sharded.py:@skip_for_wormhole_b0()
tests/tt_eager/python_api_testing/unit_testing/test_max_pool.py:from models.utility_functions import skip_for_wormhole_b0
tests/tt_eager/python_api_testing/unit_testing/test_max_pool.py:@skip_for_wormhole_b0()
tests/tt_eager/python_api_testing/unit_testing/test_moreh_clip_grad_norm.py:from models.utility_functions import comp_allclose_and_pcc, skip_for_wormhole_b0
tests/tt_eager/python_api_testing/unit_testing/test_moreh_clip_grad_norm.py:@skip_for_wormhole_b0()
tests/tt_eager/python_api_testing/unit_testing/test_moreh_clip_grad_norm.py:# @skip_for_wormhole_b0()
tests/tt_eager/python_api_testing/unit_testing/test_moreh_layernorm.py:from models.utility_functions import comp_allclose_and_pcc, skip_for_wormhole_b0
tests/tt_eager/python_api_testing/unit_testing/test_moreh_layernorm.py:@skip_for_wormhole_b0()
tests/tt_eager/python_api_testing/unit_testing/test_moreh_matmul.py:from models.utility_functions import comp_allclose_and_pcc, skip_for_wormhole_b0
tests/tt_eager/python_api_testing/unit_testing/test_moreh_matmul.py:@skip_for_wormhole_b0()
tests/tt_eager/python_api_testing/unit_testing/test_moreh_matmul.py:@skip_for_wormhole_b0()
tests/tt_eager/python_api_testing/unit_testing/test_moreh_matmul.py:@skip_for_wormhole_b0()
tests/tt_eager/python_api_testing/unit_testing/test_moreh_matmul.py:@skip_for_wormhole_b0()
tests/tt_eager/python_api_testing/unit_testing/test_moreh_matmul.py:@skip_for_wormhole_b0()
tests/tt_eager/python_api_testing/unit_testing/test_moreh_sum.py:from models.utility_functions import comp_allclose_and_pcc, skip_for_wormhole_b0
tests/tt_eager/python_api_testing/unit_testing/test_moreh_sum.py:@skip_for_wormhole_b0()
tests/tt_eager/python_api_testing/unit_testing/test_move_sharded.py:from models.utility_functions import skip_for_wormhole_b0
tests/tt_eager/python_api_testing/unit_testing/test_move_sharded.py:@skip_for_wormhole_b0()
tests/tt_eager/python_api_testing/unit_testing/test_nlp_concat_heads.py:from models.utility_functions import is_wormhole_b0, skip_for_wormhole_b0
tests/tt_eager/python_api_testing/unit_testing/test_nlp_concat_heads.py:@skip_for_wormhole_b0()
tests/tt_eager/python_api_testing/unit_testing/test_nlp_concat_heads.py:@skip_for_wormhole_b0()
tests/tt_eager/python_api_testing/unit_testing/test_optimized_conv.py:from models.utility_functions import print_diff_argmax, is_close, comp_pcc, comp_allclose_and_pcc, skip_for_wormhole_b0
tests/tt_eager/python_api_testing/unit_testing/test_optimized_conv.py:@skip_for_wormhole_b0()
tests/tt_eager/python_api_testing/unit_testing/test_optimized_conv_multi_core.py:from models.utility_functions import print_diff_argmax, is_close, comp_pcc, comp_allclose_and_pcc, skip_for_wormhole_b0
tests/tt_eager/python_api_testing/unit_testing/test_optimized_conv_multi_core.py:@skip_for_wormhole_b0()
tests/tt_eager/python_api_testing/unit_testing/test_optimized_conv_v2.py:from models.utility_functions import skip_for_wormhole_b0
tests/tt_eager/python_api_testing/unit_testing/test_optimized_conv_v2.py:@skip_for_wormhole_b0()
tests/tt_eager/python_api_testing/unit_testing/test_pow_fractional.py:from models.utility_functions import skip_for_wormhole_b0
tests/tt_eager/python_api_testing/unit_testing/test_pow_fractional.py:@skip_for_wormhole_b0()
tests/tt_eager/python_api_testing/unit_testing/test_resnet50_first_conv.py:from models.utility_functions import is_wormhole_b0, skip_for_wormhole_b0
tests/tt_eager/python_api_testing/unit_testing/test_resnet50_first_conv.py:@skip_for_wormhole_b0()
tests/tt_eager/python_api_testing/unit_testing/test_resnet50_first_conv_folding_on_host.py: skip_for_wormhole_b0,
tests/tt_eager/python_api_testing/unit_testing/test_resnet50_first_conv_folding_on_host.py:@skip_for_wormhole_b0()
tests/tt_eager/python_api_testing/unit_testing/test_resnet50_untilize_with_halo_and_conv_v2.py:from models.utility_functions import skip_for_wormhole_b0
tests/tt_eager/python_api_testing/unit_testing/test_resnet50_untilize_with_halo_and_conv_v2.py:@skip_for_wormhole_b0()
tests/tt_eager/python_api_testing/unit_testing/test_rmsnorm.py:from models.utility_functions import is_wormhole_b0, skip_for_wormhole_b0
tests/tt_eager/python_api_testing/unit_testing/test_rmsnorm.py:@skip_for_wormhole_b0()
tests/tt_eager/python_api_testing/unit_testing/test_rotate_half.py:from models.utility_functions import skip_for_wormhole_b0
tests/tt_eager/python_api_testing/unit_testing/test_rotate_half.py:@skip_for_wormhole_b0()
tests/tt_eager/python_api_testing/unit_testing/test_sfpu_chain.py:from models.utility_functions import is_wormhole_b0, skip_for_wormhole_b0
tests/tt_eager/python_api_testing/unit_testing/test_sfpu_chain.py:@skip_for_wormhole_b0()
tests/tt_eager/python_api_testing/unit_testing/test_sfpu_chain.py:@skip_for_wormhole_b0()
tests/tt_eager/python_api_testing/unit_testing/test_single_core_fused_ops.py:from models.utility_functions import skip_for_wormhole_b0
tests/tt_eager/python_api_testing/unit_testing/test_untilize_with_halo_and_max_pool_v2.py:from models.utility_functions import is_wormhole_b0, skip_for_wormhole_b0
tests/tt_eager/python_api_testing/unit_testing/test_untilize_with_halo_and_max_pool_v2.py:@skip_for_wormhole_b0()
tests/tt_eager/python_api_testing/unit_testing/test_untilize_with_halo_v2.py:from models.utility_functions import skip_for_wormhole_b0
tests/tt_eager/python_api_testing/unit_testing/test_untilize_with_halo_v2.py:@skip_for_wormhole_b0()
skip_for_wormhole_b0 tests/tt_eager/python_api_testing/unit_testing/test_single_core_fused_ops.py
skip_for_wormhole_b0 tests/tt_eager/python_api_testing/unit_testing/test_softmax_sharded.py
To Reproduce
Run any of the aforementioned tests on wh_b0 and look at the list of skipped tests,
e.g. pytest -svv tests/tt_eager/python_api_testing/unit_testing/test_softmax_sharded.py
Expected behavior
Tests need to run and pass on the wh_b0 arch. If a test is not applicable to a specific arch, there has to be a clear message explaining why it can't run (e.g. unsupported feature, not enough resources, etc.).
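One way to enforce the "clear message" part is a skip decorator that refuses an empty reason. The sketch below uses the standard library's `unittest.skipIf` as a stand-in. The names `running_on_wormhole_b0` and `skip_on_wormhole_b0`, the use of an `ARCH_NAME` environment variable, and the example reason are all assumptions for illustration, not the repository's `skip_for_wormhole_b0` implementation.

```python
import os
import unittest


def running_on_wormhole_b0() -> bool:
    # Assumption: the target architecture is selected via ARCH_NAME.
    return os.environ.get("ARCH_NAME", "") == "wormhole_b0"


def skip_on_wormhole_b0(reason: str):
    """Skip a test on wormhole_b0, but only with a non-empty reason."""
    if not reason.strip():
        raise ValueError("a skip reason is required")
    return unittest.skipIf(
        running_on_wormhole_b0(), f"skipped on wormhole_b0: {reason}"
    )


# Placeholder test and reason, for illustration only.
@skip_on_wormhole_b0("example reason: not enough cores for this grid")
def test_example():
    assert 1 + 1 == 2
```

A bare `@skip_on_wormhole_b0("")` fails when the test module is imported, so a missing explanation is caught before the test can be silently skipped.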