AleksKnezevic opened this issue 2 months ago
@jliangTT, can you please coordinate?
@AleksKnezevic, the team is pretty busy with a few other SD optimizations (reshard / untilize). What is the performance win on this?
It would eliminate some data movement. I would estimate approximately a 4-5% speedup.
This would be useful for HighRes Resnet too.
@tarafdarTT does this fall under your umbrella? Can you take it on?
@yan-zaretskiy can also take it on
Naif is currently on embedding backwards
FYI, I will start embedding backwards early next week. I'm wrapping up reshard row-major with differing shard/page widths, which @AleksKnezevic needs ASAP for stable diffusion and has been waiting on for a while. cc @davorchap. So it's up to @yan-zaretskiy whether he'd rather take on embedding backwards or block-sharded tilize; I'm good to do either if Yan has a preference.
In stable diffusion, group-norm produces block-sharded, row-major tensors that we feed into a matmul, so they need to be tilized. The current tilize op does not support block-sharded inputs.
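For anyone unfamiliar with the layout, here is a minimal sketch of the rearrangement tilize performs on a single unsharded, row-major 2D buffer. It assumes 32x32 tiles and omits the 16x16 face sub-ordering used inside each tile; `tilize_rm` is illustrative only, not the op's implementation. For a block-sharded input, the same rearrangement would have to run per shard on each core.

```python
import numpy as np

TILE_H, TILE_W = 32, 32  # tile dimensions used throughout tt-metal

def tilize_rm(rm: np.ndarray) -> np.ndarray:
    """Illustrative only: flatten a row-major 2D array into contiguous
    32x32 tiles, tiles ordered row-major across the tile grid (the real
    op also splits each tile into 16x16 faces, omitted here)."""
    h, w = rm.shape
    assert h % TILE_H == 0 and w % TILE_W == 0, "dims must be tile-aligned"
    # (h, w) -> (h/32, 32, w/32, 32) -> (h/32, w/32, 32, 32) -> flat
    tiles = rm.reshape(h // TILE_H, TILE_H, w // TILE_W, TILE_W)
    return tiles.transpose(0, 2, 1, 3).reshape(-1)

x = np.arange(64 * 64, dtype=np.float32).reshape(64, 64)
print(tilize_rm(x)[:4])    # start of tile (0, 0): [0. 1. 2. 3.]
print(tilize_rm(x)[1024])  # start of tile (0, 1): x[0, 32] == 32.0
```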
Can we add support for tilize to handle block-sharded inputs? We need to be able to support the following shapes and memory_configs: