tenstorrent / tt-metal

:metal: TT-NN operator library and TT-Metalium low-level kernel programming model.
https://docs.tenstorrent.com/ttnn/latest/index.html

[Feature Request] Support tilizing with padding of FP32 tensors #14024

Open marty1885 opened 1 month ago

marty1885 commented 1 month ago

Is your feature request related to a problem? Please describe. I'm in the process of enabling WH support for my GGML backend. FP32 was emulated with BFP16 on GS due to the lack of native FP32 support; now FP32 is enabled in the backend on WH, where native support is available. However, the heavily used tilize_with_zero_padding function does not support FP32 and fails with the following error:

```
Always | FATAL    | Can only tilize bfloat16 tensors
terminate called after throwing an instance of 'std::runtime_error'
  what():  TT_FATAL @ /home/marty/Documents/tt/tt-metal/ttnn/cpp/ttnn/operations/data_movement/tilize_with_val_padding/device/tilize_with_val_padding_op.cpp:19: input_tensor_a.get_dtype() == DataType::BFLOAT16
```
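For context, the failure can be reproduced with something along these lines. This is a minimal sketch assuming the Python binding ttnn.tilize_with_zero_padding mirrors the C++ op that raises the error above; my GGML backend hits the same path through the C++ API.

```python
import torch
import ttnn

# Minimal reproduction sketch; assumes ttnn.tilize_with_zero_padding is the
# Python binding for the C++ op that raises the TT_FATAL above.
device = ttnn.open_device(device_id=0)

# Row-major FP32 tensor whose last two dims are not multiples of the 32x32
# tile size, so tilization requires zero padding.
host = torch.rand(1, 1, 30, 30, dtype=torch.float32)
tensor = ttnn.from_torch(host, dtype=ttnn.float32, layout=ttnn.ROW_MAJOR_LAYOUT, device=device)

# Fails with "Can only tilize bfloat16 tensors" until FP32 support is added.
tiled = ttnn.tilize_with_zero_padding(tensor)

ttnn.close_device(device)
```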

This is especially important because GGML's unit tests mostly test against FP32.

Describe the solution you'd like Support FP32 for tilization.


Additional context
Card: Wormhole N300
CPU: EPYC 8124P
OS: Arch Linux
Commit: 53f564d100f32a5c1fe5c8d9c626f57dd4aa78d3

ntarafdar commented 2 weeks ago

Tracked here: https://github.com/tenstorrent/tt-metal/issues/14570. @yugi957 is on it :)

marty1885 commented 1 week ago

Nice, thanks! I'm currently stuck debugging an accuracy issue in GGML, so FP32 would make things much clearer.

yugi957 commented 1 week ago

@marty1885 Just finishing up adding support. I don't think the PR will be merged by tonight or tomorrow, but I can link you to the branch and its commits afterwards so you can cherry-pick!

yugi957 commented 1 week ago

@marty1885 https://github.com/tenstorrent/tt-metal/commit/688e29f846b541acbfc302caead8d6dc26bffef8

You should be able to cherry-pick this commit and bypass the error; float32 seems to work natively with the previous changes.
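For anyone who wants to try it before the PR merges, a rough verification sketch after running git cherry-pick 688e29f846b541acbfc302caead8d6dc26bffef8 on a local checkout (again assuming the Python binding ttnn.tilize_with_zero_padding is exposed; names may differ):

```python
import torch
import ttnn

# Verification sketch: with the cherry-picked commit, FP32 tilization with
# zero padding should succeed instead of raising the TT_FATAL error.
device = ttnn.open_device(device_id=0)

host = torch.rand(1, 1, 30, 30, dtype=torch.float32)
tensor = ttnn.from_torch(host, dtype=ttnn.float32, layout=ttnn.ROW_MAJOR_LAYOUT, device=device)

tiled = ttnn.tilize_with_zero_padding(tensor)

# Expect a tile-layout float32 tensor; the padded dims are rounded up to
# multiples of 32 internally.
print(tiled.layout)  # expected: ttnn.TILE_LAYOUT
print(tiled.dtype)   # expected: ttnn.float32

ttnn.close_device(device)
```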