Closed dvartaniansTT closed 1 day ago
@dvartaniansTT can you post the specific concat config that doesn't support sharding ?
There is concat in Unet and SD and I believe those are sharded.
@davorchap sure we will create unit tests and comment the branch and pytest command to run them here. cc: @punithsekar
@dvartaniansTT following up . do you have unit tests for this?
@tarafdarTT , I was not able to create a unit test for those Since, I was facing issue creating sharded input. I will check and share the info of the concat soon.
@tarafdarTT , I have created a branch related to pipeline where concat is used in yolov4. Checkout to branch, punith/yolov4_concat
I am not sure if i am using correct shard_shape and core_grid to the ttnn.create_sharded_memory_config
for output_sharded_memory_config. Can you explain how we should give the shard_shape.
It will be helpful if you help to make the pipeline pass with sharded concat so, I can understand and use it in other sub_modules where concat is used in this model.
To run the downsample1.py test file use command pytest /tests/ttnn/integration_tests/yolov4/test_ttnn_downsample1.py
.
Device: WH-n150
thanks @punithsekar I'll have a look at this today
https://github.com/tenstorrent/tt-metal/pull/12182
@punithsekar Height sharding is splitting the collapsed outer 3 dimensions across a grid of cores that you put in (the grid doesn't have to be the device grid). Since it's concatenating along width, the shard shape along your height is unchanged from the input. The width dimension changes, so it increases the shard width there. Core grid isn't the physical device core grid in that API, it's just the cores that store the shards of the output tensors, which should be the same as the input tensor shard grids.
For user friendliness we probably want to allow an memory_config=None option since all of this info can be derived from the input tensors @tarafdarTT . Especially because the actual coordinates are hard to derive as a user after you've put the input tensors through a lot of ops.
Thanks @sjameelTT , I will try for remaining sub_modules and let you know if i encounter any issues.
fyi @saichandax
concat with shared data is currently implemented in Yolov4. Thanks!
Describe the bug concat op is currently limiting the performance of all CNNs as it only takes interleaved input data. This mandates
sharded to interleaved
andinterleaved to sharded
before and after concats. and it's very costly.see attached example perf sheet from yolov4 model: (a very high priority customer model) GS_yolov4_19_07_2024.xlsx
please note, althoguh this is not conv op, it is limiting CNNs performance. Therefore, I'm adding it to CNN board.
Expected behavior concat op takes sharded data as input; elimanating the need for
sharded to interleaved
andinterleaved to sharded
before and after concat op.Please complete the following environment information: