tenstorrent / tt-metal

:metal: TT-NN operator library, and TT-Metalium low level kernel programming model.

Apache License 2.0

Support for non-tile multiple shard height and width in conv op so that we can run convs on increased # of cores #7770

Open tt-nshanker opened 6 months ago

tt-nshanker commented 6 months ago

Conv kernels will handle a non-tile-multiple input shard height by leaving garbage rows in the tilized CB and the output CB. If the conv output is in tiled layout, the output shard can be allocated with the padded (tile-aligned) size. If the conv output is row-major (RM), the output tensor's shard will be allocated with the padded (tile-multiple) size, but the shard shape will be set to the actual non-tile-multiple size, ignoring the garbage rows in the shard.
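To make the sizing rule above concrete, here is a minimal sketch of the arithmetic in plain Python. It is not tt-metal code: `ShardPlan`, `plan_output_shard`, and `round_up` are hypothetical names used only for illustration, and the tile height of 32 rows is an assumption.

```python
from dataclasses import dataclass

TILE_HEIGHT = 32  # assumption: tiles are 32 rows tall


def round_up(value: int, multiple: int) -> int:
    """Round value up to the nearest multiple (integer ceiling)."""
    return -(-value // multiple) * multiple


@dataclass(frozen=True)
class ShardPlan:
    logical_height: int    # rows of real data in the shard
    allocated_height: int  # rows actually allocated (tile aligned)
    garbage_rows: int      # padding rows left behind by the kernel
    reported_height: int   # height recorded in the shard shape


def plan_output_shard(logical_height: int, tiled_output: bool,
                      tile_height: int = TILE_HEIGHT) -> ShardPlan:
    """Hypothetical helper mirroring the rule described in this issue."""
    if logical_height <= 0:
        raise ValueError("logical_height must be positive")
    # The allocation is always padded up to a tile multiple.
    allocated = round_up(logical_height, tile_height)
    garbage = allocated - logical_height
    # Tiled output: the shard shape is the padded size.
    # RM output: the shard shape is the actual size, so garbage rows are ignored.
    reported = allocated if tiled_output else logical_height
    return ShardPlan(logical_height, allocated, garbage, reported)


if __name__ == "__main__":
    print(plan_output_shard(50, tiled_output=False))
    print(plan_output_shard(50, tiled_output=True))
```

Under these assumptions, a shard with 50 real rows is allocated as 64 rows in both cases. Only the RM case records 50 as the shard height, leaving the 14 garbage rows outside the logical shape.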

Two separate tasks:

[ ] #10109
[x] #10110

tt-nshanker commented 6 months ago

Perf optimization for SD model. @AleksKnezevic @davorchap

mywoodstock commented 5 months ago

Also needed for high-res resnet to fit 1kx1k on GS.

mywoodstock commented 5 months ago

Also needed for b=20 RN50 on WH (with the new 1x1s2 conv for downsample).