tenstorrent / tt-forge-fe

The TT-Forge FE is a graph compiler designed to optimize and transform computational graphs for deep learning models, enhancing their performance and efficiency.
https://docs.tenstorrent.com/tt-forge-fe/
Apache License 2.0

[Ops] Support for Max Pool 2D op (ttnn.max_pool2d) #291

Open nvukobratTT opened 5 days ago

LPanosTT commented 5 days ago

@nsmithtt @nvukobratTT Updating to the latest MLIR causes all maxpool2d tests to fail. Most of them fail because of the throw on line 125 of third_party/tt-mlir/third_party/tt-metal/src/tt-metal/ttnn/cpp/ttnn/operations/core/to_layout/to_layout_op.cpp

The ones that don't fail here have garbage output.
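For context, a hypothetical minimal repro of the failing scenario might look like the PyTorch module below (the actual tests live in the tt-forge-fe test suite; the names here are illustrative). Per the report, the pool output itself is correct, and the failure surfaces in the final layout conversion back to host:

```python
import torch
import torch.nn as nn

# Hypothetical minimal repro: a single max-pool model. When compiled, the
# pool lowers to ttnn.max_pool2d, and the result is converted back to a
# host-friendly layout at the end of the graph -- reportedly the ToLayout
# call where the TT_THROW fires.
class MaxPoolModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.pool = nn.MaxPool2d(kernel_size=2, stride=2)

    def forward(self, x):
        return self.pool(x)

model = MaxPoolModel()
x = torch.randn(1, 64, 112, 112)
golden = model(x)  # CPU golden used to validate device output
print(golden.shape)  # torch.Size([1, 64, 56, 56])
```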

nsmithtt commented 5 days ago

> @nsmithtt @nvukobratTT Updating to the latest MLIR causes all maxpool2d tests to fail. Most of them fail because of the throw on line 125 of third_party/tt-mlir/third_party/tt-metal/src/tt-metal/ttnn/cpp/ttnn/operations/core/to_layout/to_layout_op.cpp
>
> The ones that don't fail here have garbage output.

Which memory config is the one at fault here? The one that's coming from the calculate parallel config?

LPanosTT commented 5 days ago

> Which memory config is the one at fault here? The one that's coming from the calculate parallel config?

This is without my pre-sharding hack. The data is correct after the maxpool op in all cases. The failure happens on the ToLayout call at the end of the model, and when the TT_THROW isn't hit, the data given to the host is garbage.

As for the garbage output, I'm still not certain whether it's the ToLayout at the end or the reshape inserted on the maxpool output that causes it.
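The reshape mentioned here can be sketched as follows. This is a hedged illustration, assuming ttnn.max_pool2d consumes and produces NHWC data flattened to (1, 1, N*H*W, C), so the compiler must insert shape bookkeeping around the op to restore the framework's NCHW shape; if that bookkeeping is wrong, the host would see garbage even though the pool itself computed correct values:

```python
import torch

# Assumption (for illustration only): the device-side pool works on NHWC
# data flattened to (1, 1, N*H*W, C), and a compiler-inserted reshape
# restores the NCHW framework shape afterwards.
N, C, H, W = 1, 64, 112, 112
k = s = 2
Hout, Wout = H // s, W // s

out_nchw = torch.nn.functional.max_pool2d(torch.randn(N, C, H, W), k, s)

# Device-side result in the assumed flattened NHWC layout...
flat = out_nchw.permute(0, 2, 3, 1).reshape(1, 1, N * Hout * Wout, C)
# ...and the inserted reshape/permute that recovers the framework shape.
restored = flat.reshape(N, Hout, Wout, C).permute(0, 3, 1, 2)

assert torch.equal(restored, out_nchw)
print(flat.shape, restored.shape)
```

If the values are bit-correct after the pool but scrambled on host, a mismatch in exactly this kind of layout round-trip (or in the final ToLayout) is a plausible culprit.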

nsmithtt commented 5 days ago

>> Which memory config is the one at fault here? The one that's coming from the calculate parallel config?
>
> This is without my pre-sharding hack. The data is correct after the maxpool op in all cases. The failure happens on the ToLayout call at the end of the model, and when the TT_THROW isn't hit, the data given to the host is garbage.
>
> As for the garbage output, I'm still not certain whether it's the ToLayout at the end or the reshape inserted on the maxpool output that causes it.

Ok, let us know what you find.

nvukobratTT commented 2 days ago

>> Which memory config is the one at fault here? The one that's coming from the calculate parallel config?
>
> This is without my pre-sharding hack. The data is correct after the maxpool op in all cases. The failure happens on the ToLayout call at the end of the model, and when the TT_THROW isn't hit, the data given to the host is garbage.
>
> As for the garbage output, I'm still not certain whether it's the ToLayout at the end or the reshape inserted on the maxpool output that causes it.

Yup, let's gather more details. Once we get a better understanding of what's happening, we can include someone from the MLIR team as well.

@LPanosTT with the new uplift, did you see any more drastic changes between the previous and the current MLIR version you're referencing?

LPanosTT commented 2 days ago

> @LPanosTT with the new uplift, did you see any more drastic changes between the previous and the current MLIR version you're referencing?

Only that tt-metal has been uplifted.