AleksKnezevic opened this issue 2 months ago
@jliangTT, can you please coordinate?
@AleksKnezevic, the team is pretty busy with a few other SD optimizations (reshard / untilize). What is the performance win on this?
It would eliminate some data movement. I would estimate approximately a 4-5% speedup.
This would be useful for HighRes Resnet too.
@tarafdarTT does this fall under your umbrella? Can you take it on?
@yan-zaretskiy can also take it on
Naif is currently on embedding backwards
FYI, I will start embedding backwards early next week. I'm wrapping up reshard row-major with differing shard/page widths, which @AleksKnezevic needs ASAP for stable diffusion and has been waiting on for a while. cc @davorchap. So it's up to @yan-zaretskiy whether he'd rather take on embedding backwards or block-sharded tilize; I'm good to do either if Yan has a preference.
In stable diffusion, group-norm produces block-sharded, row-major tensors that we feed into a matmul, so they need to be tilized. The current tilize op does not support block-sharded inputs.
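For anyone unfamiliar with the layout, here is a minimal sketch of the rearrangement tilize performs on a single unsharded, row-major 2D buffer. It assumes 32x32 tiles and omits the 16x16 face sub-ordering used inside each tile; `tilize_rm` is illustrative only, not the op's implementation. For a block-sharded input, the same rearrangement would have to run per shard on each core.

```python
import numpy as np

TILE_H, TILE_W = 32, 32  # tile dimensions used throughout tt-metal

def tilize_rm(rm: np.ndarray) -> np.ndarray:
    """Illustrative only: flatten a row-major 2D array into contiguous
    32x32 tiles, tiles ordered row-major across the tile grid (the real
    op also splits each tile into 16x16 faces, omitted here)."""
    h, w = rm.shape
    assert h % TILE_H == 0 and w % TILE_W == 0, "dims must be tile-aligned"
    # (h, w) -> (h/32, 32, w/32, 32) -> (h/32, w/32, 32, 32) -> flat
    tiles = rm.reshape(h // TILE_H, TILE_H, w // TILE_W, TILE_W)
    return tiles.transpose(0, 2, 1, 3).reshape(-1)

x = np.arange(64 * 64, dtype=np.float32).reshape(64, 64)
print(tilize_rm(x)[:4])    # start of tile (0, 0): [0. 1. 2. 3.]
print(tilize_rm(x)[1024])  # start of tile (0, 1): x[0, 32] == 32.0
```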
Can we add support for tilize to handle block-sharded inputs? We need to be able to support the following shapes and memory_configs: