tenstorrent / tt-metal

:metal: TT-NN operator library, and TT-Metalium low level kernel programming model.
Apache License 2.0
417 stars 53 forks source link

ttnn.sinh operation fails with low PCC in most situations #8598

Open nemanjagrujic opened 4 months ago

nemanjagrujic commented 4 months ago

ttnn.sinh operation fails with low PCC in all tested situations.

Problem is observed on both GS and WH cards.

To Reproduce Steps to reproduce the behavior:

  1. Checkout branch ngrujic/op_bug_unit_tests (soon to be merged into main).
  2. Run unit test test_eltwise_sinh.py using this command: pytest tests/ttnn/python_api_testing/non_working_unit_tests/grayskull/test_eltwise_sinh.py

Expected behavior There are multiple test cases presented in the unit test, which are failing with low PCC.

Getting Additional info for the operation under test and its behavior To get additional information and results for different combinations of input shapes, types, layouts and memory configs for which this operation was tested you can also run locally sweeps and check the results. To do this you should:

  1. Run non working sweep by using pytest tests/ttnn/python_api_testing/sweep_tests/run_sweep_test.py --input-path tests/ttnn/python_api_testing/sweep_tests/test_configs/ci_sweep_tests_broken/grayskull/ttnn_eltwise_sinh_test.yaml --input-method cli --cli-input results_ttnn_sinh
  2. After the run is completed all test sweeps results should be available inside specified output directory (in this case ./results_ttnn_sinh).
tt-aho commented 4 months ago

I added mixed precision support to bcast on latest, so it should no longer crash.

nemanjagrujic commented 4 months ago

Hello, @tt-aho, I tried with rebasing to latest main today and its still failing.

tt-aho commented 4 months ago

Hey, that was my mistake. I added support for it in the op itself but forgot to remove the assert. Should be resolved on latest main a1b9781628e0275b18d960204d279166246c841e

nemanjagrujic commented 4 months ago

Hello @tt-aho, just rebased with latest main and rebuilt, but its still the same issue. Not sure what is happening. You can always test your self by running:

pytest tests/ttnn/python_api_testing/non_working_unit_tests/grayskull/test_eltwise_sinh.py

To see if its fixed.

tt-aho commented 4 months ago

Hey, the crash/error should finally be resolved (d5c7fef9284b50bbde7444c49a158b41f0bcf0af). I've verified locally that it now only mismatches

nemanjagrujic commented 4 months ago

Confirmed. Now only PCC is failing.

KalaivaniMCW commented 2 months ago

will be revisited after TTNN migration

ruthreshx commented 1 month ago

@nemanjagrujic error only occurs due to the exp op involves. Since exp has the low range support from TT end. Hence the test fails. I could see the test passes for the range (-88, 88) instead (-100, 100). Can we change the range for sinh op?

nemanjagrujic commented 1 month ago

@nemanjagrujic error only occurs due to the exp op involves. Since exp has the low range support from TT end. Hence the test fails. I could see the test passes for the range (-88, 88) instead (-100, 100). Can we change the range for sinh op?

@ruthreshx Well, that's not a question for me. If there is valid range which is agreed on (and documented) we can change it in test.

ruthreshx commented 1 month ago

Hi @nemanjagrujic , Out of 30 collected test I could see 10 tests were passing for the range as (-88, 88). I could see the issue been raising due to bfloat_8 and row_major is been restricted. Hence the tests fails. Hence I made the condition to change those should be skipped from the test alone.

nemanjagrujic commented 1 month ago

@ruthreshx Is (-88, 88) range documented? Also is bloat8_b and row_major restriction documented?

ruthreshx commented 1 month ago

yes @nemanjagrujic range has been documented it has already been skipped in test file itself.

nemanjagrujic commented 1 month ago

@ruthreshx OK, great