tenstorrent / tt-forge-fe

The TT-Forge FE is a graph compiler designed to optimize and transform computational graphs for deep learning models, enhancing their performance and efficiency.
https://docs.tenstorrent.com/tt-forge-fe/
Apache License 2.0

Casting transpose to bfloat16 still gives float32 on output #690

Open dgolubovicTT opened 3 days ago

dgolubovicTT commented 3 days ago

Repro:

checkout: dgolubovic/repro-tensor-mismatch-due-to-bfloat16

Run: pytest -svv forge/test/mlir/test_ops.py::test_transpose[params0]

ERROR | forge.op.eval.common:compare_with_golden_pcc:245 - Tensor mismatch

We get a tensor mismatch between the framework output and the compiled model output. Strangely, the compiler's initial graph shows an output df of float32 even though we cast it to bfloat16:

[Image: initial compiler graph]

TODO: Investigate the cause of the mismatch and why the output data format is float32...

This is not currently blocking anything, but its priority may increase if we want to move all models to compile with the bfloat16 data format...

dgolubovicTT commented 2 days ago

Transpose gives float32 on output because of its definition in tm.py. It has an out_dtype argument that specifies the output type, and this argument defaults to torch.float32. So unless out_dtype is passed explicitly, the output is float32, which causes the mismatch.

We need to remove this.
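
For illustration, here is a minimal sketch of the pattern described above, written in plain PyTorch. It is not the actual code from tm.py: the function names `transpose_with_default_dtype` and `transpose_preserving_dtype` are hypothetical, and the real op may apply `out_dtype` differently. It only shows how a `torch.float32` default can silently undo a bfloat16 cast, and how dropping the default lets the output follow the input dtype.

```python
import torch


def transpose_with_default_dtype(x, dim0, dim1, out_dtype=torch.float32):
    # Hypothetical stand-in for an op whose out_dtype defaults to float32.
    # The output is cast to out_dtype regardless of the dtype of x.
    return torch.transpose(x, dim0, dim1).to(out_dtype)


def transpose_preserving_dtype(x, dim0, dim1):
    # Hypothetical variant without the default: output dtype follows the input.
    return torch.transpose(x, dim0, dim1)


# Input already cast to bfloat16, as in the failing test.
x = torch.ones(2, 3, dtype=torch.bfloat16)

with_default = transpose_with_default_dtype(x, 0, 1)
preserving = transpose_preserving_dtype(x, 0, 1)

print(with_default.dtype)  # torch.float32
print(preserving.dtype)    # torch.bfloat16
```

In the first variant the caller has to remember to pass `out_dtype=torch.bfloat16` on every call; in the second, casting the input is enough.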