tenstorrent / tt-metal

TT-NN operator library, and TT-Metalium low-level kernel programming model.
https://docs.tenstorrent.com/ttnn/latest/index.html
Apache License 2.0

Slice Op for Multi-Device Tensors #11017

Open kpaigwar opened 4 months ago

kpaigwar commented 4 months ago

Description

In Llama3 on TG, we slice 32 users into 4 groups. Please see the diagram below.

[Image: Screenshot 2024-08-01 at 3 16 40 PM]

We currently don't support slicing multi-device tensors. As a workaround, we use a slice matmul to achieve the same result. The slice matmul takes around 4600 ns on device.
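For readers unfamiliar with the workaround, here is a minimal sketch of how a matmul can stand in for a slice. It is plain Python on nested lists, not the ttnn API. The 32 users and 4 groups come from the description above. The hidden width, the choice of slicing along the user (row) dimension, and all function names are assumptions made for illustration, and the actual model code may shape the matmul differently.

```python
# Illustrative sketch only: plain Python lists, not the ttnn API.
NUM_USERS = 32                         # from the issue
NUM_GROUPS = 4                         # from the issue
GROUP_SIZE = NUM_USERS // NUM_GROUPS   # 8 users per group
HIDDEN = 4                             # hypothetical feature width


def selection_matrix(group):
    """Build a GROUP_SIZE x NUM_USERS 0/1 matrix that picks one group's rows."""
    start = group * GROUP_SIZE
    return [
        [1 if col == start + row else 0 for col in range(NUM_USERS)]
        for row in range(GROUP_SIZE)
    ]


def matmul(a, b):
    """Naive dense product of two list-of-lists matrices."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def slice_via_matmul(x, group):
    """Workaround: extract one group's rows by multiplying with a selection matrix."""
    return matmul(selection_matrix(group), x)


def slice_direct(x, group):
    """What a native slice op would return for the same group."""
    start = group * GROUP_SIZE
    return x[start:start + GROUP_SIZE]


# Hypothetical activations: row u holds the values for user u.
x = [[u * 10 + h for h in range(HIDDEN)] for u in range(NUM_USERS)]

for g in range(NUM_GROUPS):
    assert slice_via_matmul(x, g) == slice_direct(x, g)
```

The selection matrix is mostly zeros, so the matmul performs many multiply-accumulates that a native slice would skip. This may be part of why a dedicated slice op is preferable to the workaround.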

Requirement

kpaigwar commented 4 months ago

fyi @cglagovichTT @uaydonat @johanna-rock-tt