tenstorrent / tt-metal

TT-NN operator library, and TT-Metalium low-level kernel programming model.
Apache License 2.0

T3k Llama 70B prefill and decode performance metrics #11690

Open tstescoTT opened 1 month ago

tstescoTT commented 1 month ago

Measure and report the performance metrics that users will experience across different batch sizes, input prompt context lengths, and output generation lengths. For example:

prefill:

decode:
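As a rough illustration of the kind of report requested above, here is a minimal sketch of how per-configuration prefill and decode numbers might be derived from wall-clock timings. The metric names (`ttft_s`, `prefill_tok_per_s`, `decode_tok_per_s_per_user`, `decode_tok_per_s`), the `RunTiming` structure, and the sweep values are assumptions made for illustration. They are not taken from the tt-metal codebase or from this issue.

```python
from dataclasses import dataclass
from itertools import product


@dataclass
class RunTiming:
    """Hypothetical timing record for one (batch, input, output) configuration."""
    batch_size: int    # number of concurrent users
    input_len: int     # prompt tokens per user
    output_len: int    # generated tokens per user
    prefill_s: float   # wall-clock seconds to prefill the whole batch
    decode_s: float    # wall-clock seconds for all decode iterations


def summarize(t: RunTiming) -> dict:
    """Derive user-facing metrics from raw timings (assumed definitions)."""
    return {
        # Time to first token: assumed equal to the prefill time.
        "ttft_s": t.prefill_s,
        # Prompt tokens processed per second across the batch.
        "prefill_tok_per_s": t.batch_size * t.input_len / t.prefill_s,
        # Generation speed seen by a single user.
        "decode_tok_per_s_per_user": t.output_len / t.decode_s,
        # Aggregate generation throughput across all users.
        "decode_tok_per_s": t.batch_size * t.output_len / t.decode_s,
    }


def sweep_configs(batch_sizes, input_lens, output_lens):
    """Enumerate every (batch_size, input_len, output_len) combination."""
    return list(product(batch_sizes, input_lens, output_lens))


if __name__ == "__main__":
    # Placeholder timing, not a measured result.
    example = RunTiming(batch_size=32, input_len=2048, output_len=128,
                        prefill_s=4.0, decode_s=8.0)
    for name, value in summarize(example).items():
        print(f"{name}: {value}")
```

In practice the two timing fields would come from instrumenting the model's prefill and decode loops, for example with `time.perf_counter()`, once per configuration returned by `sweep_configs`.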

uaydonat commented 1 month ago

@skhorasganiTT this could be a ramp-up task for you to get started with Llama.