tenstorrent / tt-metal

TT-NN operator library, and TT-Metalium low level kernel programming model.
Apache License 2.0

Conv fails with OOM issue in unit tests #10511

Open HariniMohan0102 opened 3 months ago

HariniMohan0102 commented 3 months ago

Describe the bug

When unit testing the conv ops of model_k at input resolution 256x256:

When unit testing the conv ops of model_k at input resolution 128x128:

To Reproduce

Steps to reproduce the behavior:

  1. Check out the branch harini/model_k_failing_convs
  2. Run the respective commands for each input resolution to reproduce the issues.

Expected behavior

The conv op should run for the specified input configurations without error.


dvartaniansTT commented 3 months ago

@HariniMohan0102 and @punithsekar please clearly list which convs are failing, with details like input resolution, filter size, input/output channels, stride, dilation, ...

For instance, for model-k, we lowered the resolution for some convs to get them to pass. However, we still need support for the original resolution, so those convs must be included in the unit tests at the original resolution and marked as failing.
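As a rough illustration of the kind of listing requested above, the sketch below records each conv case with its input resolution, filter size, channels, stride, padding and dilation, plus a flag for cases that are expected to fail. It uses only the Python standard library and does not call any TT-NN API. The `ConvCase` class and every numeric value in `cases` are hypothetical placeholders, not the actual model_k configurations.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConvCase:
    """One conv configuration to report (hypothetical helper, not part of tt-metal)."""
    input_hw: tuple[int, int]
    in_channels: int
    out_channels: int
    kernel: int
    stride: int = 1
    padding: int = 0
    dilation: int = 1
    expected_to_fail: bool = False  # e.g. original-resolution cases that still hit OOM

    def output_hw(self) -> tuple[int, int]:
        # Standard conv output-size formula; dilation enlarges the effective kernel.
        eff_k = self.dilation * (self.kernel - 1) + 1
        return tuple((d + 2 * self.padding - eff_k) // self.stride + 1
                     for d in self.input_hw)

    def output_elements(self) -> int:
        # Number of elements in the output activation for a batch of 1.
        h, w = self.output_hw()
        return h * w * self.out_channels


# Placeholder values only; the real model_k channel counts are not given in this thread.
cases = [
    ConvCase((128, 128), 32, 32, kernel=3, padding=1),
    ConvCase((256, 256), 32, 32, kernel=3, padding=1, expected_to_fail=True),
    ConvCase((128, 128), 32, 32, kernel=3, padding=2, dilation=2),
]

for c in cases:
    status = "expected to fail" if c.expected_to_fail else "expected to pass"
    print(c.input_hw, c.output_hw(), c.output_elements(), status)
```

With otherwise identical parameters, the 256x256 case produces four times as many output elements as the 128x128 case, which is one reason the higher resolution is more likely to exhaust memory. The actual on-device footprint depends on sharding, data type and layout, none of which this sketch models.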

HariniMohan0102 commented 3 months ago

@dvartaniansTT updated the issue description and Readme with the required details of failing cases. Please check.

dvartaniansTT commented 1 month ago

@HariniMohan0102 please test the dilation > 1 cases and confirm whether they pass now.

HariniMohan0102 commented 1 month ago

@dvartaniansTT On testing, all the dilation > 1 input configurations passed. I have also updated the issue description with the other failing convs (OOM issue).

dvartaniansTT commented 1 week ago

@mywoodstock and @HariniMohan0102 can we close this issue now, or is it still valid?

HariniMohan0102 commented 1 week ago

@dvartaniansTT The issue is still valid. The conv unit test for the higher resolution (256x256) still fails with an OOM error. The conv unit test for the lower resolution (128x128) was failing with an OOM error when tested previously. Note: I have yet to run the conv unit tests against the latest main, especially for the lower resolution.