Open HariniMohan0102 opened 1 month ago
Maxpool supports only bfloat16, so any bfloat8 input must be converted to bfloat16. My guess is that when this conversion happens, there isn't enough memory to hold both the input and output tensors at the same time.
Yeah, basically you can provide bfp8_b input to the maxpool op. Internally the halo op converts it to row-major (RM), i.e. BFP16. If the input is already BFP16, it should need less memory.
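A rough back-of-the-envelope sketch of the memory pressure described above. The numbers here are assumptions for illustration only: bfp8_b is taken as a block format with a shared 1-byte exponent per 16-element block (~17/16 bytes per element), bfloat16 as 2 bytes per element, and the channel count C is a made-up placeholder (the issue does not state it).

```python
# Hypothetical peak-memory estimate for converting a bfp8_b maxpool input
# to bfloat16 inside the halo op. All sizes are assumptions, not values
# taken from the issue.
def tensor_bytes(h, w, c, bytes_per_elem):
    """Flat byte count for an h x w x c activation tensor."""
    return h * w * c * bytes_per_elem

C = 32  # hypothetical channel count, for illustration only

bfp8_in  = tensor_bytes(4094, 510, C, 17 / 16)  # assumed bfp8_b footprint
bf16_tmp = tensor_bytes(4094, 510, C, 2)        # bfloat16 copy made by halo

# While converting, both tensors are live, so peak usage is the sum.
# Supplying bfloat16 input directly avoids the extra live copy.
peak_with_convert = bfp8_in + bf16_tmp
print(int(bf16_tmp), int(peak_with_convert))
```

The point is only that the conversion path roughly doubles the live activation footprint for the duration of the copy, which matches the out-of-memory guess above.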
Describe the bug
Here are the unit tests for the failing maxpool op at each resolution:
When Maxpooling=True, Encoder res: 4094x510
pytest tests/ttnn/unit_tests/operations/test_max_pool2d.py::test_model_net_max_pool_4094x510
When Maxpooling=True, Encoder res: 2047x255
pytest tests/ttnn/unit_tests/operations/test_max_pool2d.py::test_model_net_max_pool_2047x255
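As a sanity check on the two resolutions above: assuming a 2x2 kernel with stride 2 and no padding (the issue does not state the pooling parameters, so this is a guess), the larger failing input maps exactly onto the smaller one, consistent with consecutive encoder stages.

```python
# Standard pooling output-size formula, applied to the two failing
# resolutions. Kernel/stride/padding values are assumptions.
def pool_out(dim, kernel=2, stride=2, padding=0):
    return (dim + 2 * padding - kernel) // stride + 1

h, w = pool_out(4094), pool_out(510)
print(h, w)  # 2047 255
```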
To Reproduce
Steps to reproduce the behavior: run the pytest commands listed above.
Expected behavior
The op runs without error for the given input configurations.
Please complete the following environment information: