tenstorrent / tt-metal

TT-NN operator library, and TT-Metalium low-level kernel programming model.
Apache License 2.0

OOM issue for Vanilla Unet model on WH device #8518

Open HariniMohan0102 opened 1 month ago

HariniMohan0102 commented 1 month ago

Describe the bug: Running the UNet model on a WH device fails with an Out Of Memory (OOM) error in the fused convs.

To Reproduce: Steps to reproduce the behavior:

  1. Check out the saichand/unet_on_wh branch
  2. Run the command pytest tests/ttnn/integration_tests/unet/test_ttnn_unet.py

Expected behavior: The model test runs to completion without any out-of-memory error.


Additional context: We have unit tests for some of the fused convs in tests/ttnn/integration_tests/unet/unet_fused_conv_test.py. Of the 16 convs, 7 fail, either due to low PCC or with AssertionError: act_block_h must evenly divide out_block_h. Debugging of the unit tests is still in progress.
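For readers unfamiliar with the two failure modes mentioned above, here is a minimal Python sketch of what each check means in general terms. It is not the tt-metal implementation: the function names `pcc`, `evenly_divides`, and `candidate_act_block_h` are hypothetical, PCC is assumed to mean the Pearson correlation coefficient between a reference output and the device output, and the divisibility check is assumed to be a plain integer modulo test on the two block heights.

```python
import math


def pcc(expected, actual):
    """Pearson correlation coefficient of two equal-length number sequences.

    Assumption: this mirrors the general idea of a PCC check (compare a
    reference output with a device output), not the library's own helper.
    """
    if len(expected) != len(actual) or len(expected) < 2:
        raise ValueError("inputs must have the same length (at least 2)")
    n = len(expected)
    mean_e = sum(expected) / n
    mean_a = sum(actual) / n
    cov = sum((e - mean_e) * (a - mean_a) for e, a in zip(expected, actual))
    var_e = sum((e - mean_e) ** 2 for e in expected)
    var_a = sum((a - mean_a) ** 2 for a in actual)
    if var_e == 0 or var_a == 0:
        raise ValueError("PCC is undefined for constant input")
    return cov / math.sqrt(var_e * var_a)


def evenly_divides(act_block_h, out_block_h):
    """True when act_block_h is a positive divisor of out_block_h."""
    return act_block_h > 0 and out_block_h % act_block_h == 0


def candidate_act_block_h(out_block_h):
    """All values that would pass the divisibility check for out_block_h."""
    return [h for h in range(1, out_block_h + 1) if evenly_divides(h, out_block_h)]


if __name__ == "__main__":
    # Illustrative values only; they are not taken from the failing convs.
    print(pcc([1.0, 2.0, 3.0], [2.0, 4.0, 6.0]))
    print(evenly_divides(64, 96))
    print(candidate_act_block_h(12))
```

A low PCC means the device output diverges from the reference, while the assertion points at a block-size configuration that does not tile the output height evenly. Under these assumptions, listing the valid divisors is one quick way to pick a candidate value to try while debugging.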

dvartaniansTT commented 1 week ago

@saichandax please confirm whether this is related to the vanilla UNet variant.

vshenoyTT commented 1 week ago

To reproduce the error, follow step 4 of the installation guide and build with CMake, not the Makefile. Make sure the virtual environment is activated, then run the test command. If Metal does not build properly, re-clone the repository and try again.

HariniMohan0102 commented 6 days ago

@dvartaniansTT we worked on the UNet and Shallow UNet models. This issue is related to the UNet model. Note: This issue is up to date; we are still facing the problem as described in the issue description.