Open cglagovich opened 6 months ago
single layer latency with 8x8 mlp matmuls: 29469281 ns
with block-sharded eltwise mul: 29176066 ns
with 8x8 projection matmuls: 28052277 ns
with 8x8 rms @ 256 chunk size: 27283850 ns
with 8x8 rms @ 512 chunk size: 27160413 ns
8x8 rms @ 1024 chunk size: OOM
8x8 sdpa @ chunk size 256: 25084698 ns
8x8 sdpa @ chunk size 512: OOM
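To make the per-step gains easier to read, here is a minimal sketch that turns the numbers above into per-step savings and a cumulative reduction against the first measurement. It assumes each measurement was taken with all earlier optimizations still enabled, which the log does not state explicitly. The `summarize` helper and the `MEASUREMENTS` list are illustrative names, not part of any codebase, and the OOM configurations are left out because they have no latency.

```python
# Latencies in nanoseconds, copied from the measurements in this issue.
# Assumption: every entry includes all optimizations listed before it.
MEASUREMENTS = [
    ("8x8 mlp matmuls", 29469281),
    ("block-sharded eltwise mul", 29176066),
    ("8x8 projection matmuls", 28052277),
    ("8x8 rms @ 256 chunk size", 27283850),
    ("8x8 rms @ 512 chunk size", 27160413),
    ("8x8 sdpa @ chunk size 256", 25084698),
]


def summarize(measurements):
    """Return (label, latency_ns, step_saving_ns, cumulative_pct) per entry.

    step_saving_ns is the drop relative to the previous entry;
    cumulative_pct is the drop relative to the first entry, in percent.
    """
    baseline = measurements[0][1]
    previous = baseline
    rows = []
    for label, latency in measurements:
        step_saving = previous - latency
        cumulative_pct = 100.0 * (baseline - latency) / baseline
        rows.append((label, latency, step_saving, cumulative_pct))
        previous = latency
    return rows


if __name__ == "__main__":
    for label, latency, step_saving, cumulative_pct in summarize(MEASUREMENTS):
        print(f"{label:<28} {latency / 1e6:8.3f} ms  "
              f"step: {step_saving / 1e6:6.3f} ms  "
              f"total: {cumulative_pct:5.2f} %")
```

Under that assumption, the total saving from the first to the last measurement is 4384583 ns, with the sdpa change accounting for the largest single step.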
fyi @johanna-rock-tt
High level issue for prefill performance improvements.