tenstorrent / tt-metal

:metal: TT-NN operator library, and TT-Metalium low level kernel programming model.
Apache License 2.0

#14826: Reimplement wzerorange #15340

Closed nathan-TT closed 3 hours ago

nathan-TT commented 20 hours ago

Ticket

https://github.com/tenstorrent/tt-metal/issues/14826

Problem description

The compiler recognizes that wzerorange is equivalent to memset and substitutes memset for it, leading to code bloat. The original patch caused a performance regression, presumably because memset is faster than wzerorange.

What's changed

1) Use the same ASM trick to hide wzerorange's equivalence to memset.
2) Unroll the loop 4-fold, going from 1 write per 3 instructions to 1 write per 1.5 instructions.
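
For readers unfamiliar with the technique, here is a minimal sketch of what the two changes could look like. It is not the code in this PR: the function name `zero_words`, its signature, and the particular empty-asm barrier are assumptions made for illustration, and it assumes a GCC or Clang toolchain. The empty `asm volatile` with a `"+r"` constraint makes the pointer opaque to the optimizer, which is one common way to stop the loop being recognized as a memset. The quoted ratios presumably count a store, a pointer increment and a branch per iteration (3 instructions per write), versus four stores plus the same increment and branch when unrolled (6 instructions per 4 writes, i.e. 1.5 each); the sketch below does not claim to hit those exact counts.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for wzerorange: zero the 32-bit words in [begin, end).
inline void zero_words(std::uint32_t* begin, std::uint32_t* end) {
    assert(begin <= end);
    std::uint32_t* p = begin;

    // Main loop, unrolled 4-fold: four stores share one increment and one branch.
    while (end - p >= 4) {
        p[0] = 0;
        p[1] = 0;
        p[2] = 0;
        p[3] = 0;
        p += 4;
        // Empty asm that claims to read and modify p. It emits no instructions,
        // but the compiler can no longer prove the loop is a plain fill.
        asm volatile("" : "+r"(p));
    }

    // Tail loop for the remaining 0 to 3 words, hidden the same way.
    while (p != end) {
        *p++ = 0;
        asm volatile("" : "+r"(p));
    }
}
```

Whether the barrier is needed in both loops, and whether a given compiler version would otherwise substitute memset, depends on the toolchain and flags, so the generated assembly is worth checking.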

Checklist