Closed MaheshRavishankar closed 4 months ago
cc: @RattataKing
Cumulative measured speedup: 1.9 ms (55.5 ms - 53.6 ms) ==> 3.4%.
No. | Done? | Shape | Time [ms] | Improvement [ms] | Commit |
---|---|---|---|---|---|
1 | Yes | conv_2d_nhwc_hwcf_2x32x32x1280x3x3x1280_i8xi8xi32 | 1.023439 | 0.2 | https://github.com/nod-ai/sdxl-scripts/commit/037c5c12018eee3925ac8a925aacccf94220357d |
2 | Yes | conv_2d_nhwc_hwcf_2x32x32x1280x3x3x2560_i8xi8xi32 | 0.982422 | 0.3 | https://github.com/nod-ai/sdxl-scripts/commit/2de30d1cbdf10520f0d28bb60ddd8b8d201ba992 |
3 | Yes | conv_2d_nhwc_hwcf_2x64x64x1280x3x3x1280_i8xi8xi32 | 0.923829 | 0.2 | https://github.com/nod-ai/sdxl-scripts/commit/332aeffc66fb1cf52a3626ea00a3dac159e22363 |
4 | Unsuccessful | conv_2d_nhwc_hwcf_2x64x64x640x3x3x640_i8xi8xi32 | 0.781249 | | |
5 | Dupe | | 0.771484 | | |
6 | Yes | conv_2d_nhwc_hwcf_2x128x128x320x3x3x640_i8xi8xi32 | 0.740235 | 0.1 | https://github.com/nod-ai/sdxl-scripts/commit/be1fa8793f80c754eb8e42a202cb0087908f8962 |
7 | Wrong pipeline | conv_2d_nhwc_hwcf_2x32x32x640x3x3x640_i8xi8xi32 | 0.699219 | | |
8 | Yes | conv_2d_nhwc_hwcf_2x128x128x640x3x3x640_i8xi8xi32 | 0.6875 | 0.3 | https://github.com/nod-ai/sdxl-scripts/commit/cd0fc0c78518543b901ea01dc98fe298abf3e165 |
9 | Yes | conv_2d_nhwc_hwcf_2x128x128x320x3x3x960_i8xi8xi32 | 0.570312 | 0.1 | https://github.com/nod-ai/sdxl-scripts/commit/b25a124d6139b1af8129d482efdf1d5f86f5508e |
10 | Yes | conv_2d_nhwc_hwcf_2x128x128x320x3x3x320_i8xi8xi32 | 0.5625 | 0.1 | https://github.com/nod-ai/sdxl-scripts/commit/b9623aa8030058eaab13d1cb746c1ae4b67073e7 |
11 | Unsuccessful | conv_2d_nhwc_hwcf_2x64x64x640x3x3x1920_i8xi8xi32 | 0.515625 | | |
12 | Dupe | | 0.501953 | | |
13 | | conv_2d_nhwc_hwcf_2x64x64x320x3x3x320_i8xi8xi32 | 0.382812 | | |
14 | | conv_2d_nhwc_hwcf_2x32x32x1280x3x3x1920_i8xi8xi32 | 0.378906 | | |
15 | Dupe | | 0.375 | | |
16 | | conv_2d_nhwc_hwcf_2x64x64x640x3x3x1280_i8xi8xi32 | 0.335937 | | |
17 | Dupe | | 0.259766 | | |
18 | | conv_2d_nhwc_hwcf_2x64x64x640x3x3x960_i8xi8xi32 | 0.25 | | |
Cumulative measured speedup: 1.2 ms (51.8 ms - 50.6 ms) ==> 2.3%.
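For anyone re-checking the cumulative numbers, here is a minimal sketch of the arithmetic, assuming the percentage is taken relative to the "before" end-to-end time. The helper name `speedup` is made up for illustration and is not part of sdxl-scripts.

```python
def speedup(before_ms, after_ms):
    """Return (ms saved, percent saved relative to the 'before' time)."""
    saved = before_ms - after_ms
    return saved, 100.0 * saved / before_ms


# The two cumulative measurements quoted in this issue.
for before, after in [(55.5, 53.6), (51.8, 50.6)]:
    saved, pct = speedup(before, after)
    print(f"{before} ms -> {after} ms: {saved:.1f} ms saved ({pct:.1f}%)")
```

Under that assumption the two measurements come out to 3.4% and 2.3%, which matches the figures quoted above.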
Newest trace
Newest trace as of https://github.com/nod-ai/sdxl-scripts/commit/0eb7ef0880285958ba8b29f8f886449932ec2190 (no horizontal fusion)
No. | Done? | Shape | Time [ms] | Improvement [ms] | Commit |
---|---|---|---|---|---|
1 | Yes | matmul_like_2x20x1024x64x1280_i8xi8xi32 | 6.551762 | 0.7 | https://github.com/nod-ai/sdxl-scripts/commit/5fd90152552e457e0ae0dd68e19d80e33b08d41f |
5 | Yes | matmul_like_2x20x64x64x2048_i8xi8xi32 | 2.011717 | 0.1 | https://github.com/nod-ai/sdxl-scripts/commit/dfe5c6e11a468337136c510044cf7027faeff2ce |
7 | Unsuccessful | matmul_like_2x10x4096x64x640_i8xi8xi32 | 0.870121 | | |
9 | Yes | matmul_like_2x10x64x64x2048_i8xi8xi32 | 0.344728 | 0.1 | https://github.com/nod-ai/sdxl-scripts/commit/811dcecbb59cbf1a9835a4a13bfa41812bef3c26 |
1: [m, n, m, n, k] --lhs-dims=bmk --rhs-dims=nnk --tile-dims='**mnk'
5: [m, n, m, n, k] --lhs-dims=bmk --rhs-dims=nnk --tile-dims='**mnk'
7: [m, n, m, n, k] --lhs-dims=bmk --rhs-dims=nnk --tile-dims='**mnk'
9: [m, n, m, n, k] --lhs-dims=bmk --rhs-dims=nnk --tile-dims='**mnk'
No. | Done? | Shape | Time [ms] | Improvement [ms] | Commit |
---|---|---|---|---|---|
3 | Yes | matmul_like_3x2x20x1024x64x1280_i8xi8xi32 | 3.302736 | 0.7 | https://github.com/nod-ai/sdxl-scripts/commit/676a9b93d4e304de577b00fbfd42d012d096ed69 |
5 | Same as no fusions | | 1.580081 | | |
7 | Yes | matmul_like_2x2x20x64x64x2048_i8xi8xi32 | 0.958984 | 0.2 | https://github.com/nod-ai/sdxl-scripts/commit/963af2d6c72df956315f5512ed7cdcec7e353058 |
9 | Unsuccessful | matmul_like_3x2x10x4096x64x640_i8xi8xi32 | 0.458984 | | |
3: [n, m, n, m, n, k] --lhs-dims=bmk --rhs-dims=nnnk --tile-dims='**mnk'
5: [m, n, m, n, k] --lhs-dims=bmk --rhs-dims=nnk --tile-dims='**mnk'
7: [n, m, n, m, n, k] --lhs-dims=bmk --rhs-dims=nnnk --tile-dims='**mnk'
9: [n, m, n, m, n, k] --lhs-dims=bmk --rhs-dims=nnnk --tile-dims='**mnk'
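To make the per-row dimension annotations easier to read against the shape names, here is a rough sketch that splits a dispatch name into its sizes and element types and pairs each size with the label from the annotation line. The helpers `parse_dispatch_name` and `label_dims` are hypothetical, and the `<op>_<AxBx...>_<type>x<type>...` naming pattern is only an assumption drawn from the names in the tables above. It does not interpret the `--lhs-dims`, `--rhs-dims`, or `--tile-dims` flags.

```python
import re

# Assumed pattern: <op>_<AxBxC...>_<type>x<type>..., e.g.
# "matmul_like_2x20x1024x64x1280_i8xi8xi32".
_NAME_RE = re.compile(
    r"^(?P<op>.+?)_(?P<dims>\d+(?:x\d+)*)_(?P<types>[a-z]+\d+(?:x[a-z]+\d+)*)$"
)


def parse_dispatch_name(name):
    """Split a dispatch name into (op, list of sizes, list of element types)."""
    match = _NAME_RE.match(name)
    if match is None:
        raise ValueError(f"unrecognized dispatch name: {name}")
    dims = [int(d) for d in match["dims"].split("x")]
    types = match["types"].split("x")
    return match["op"], dims, types


def label_dims(name, labels):
    """Pair each size in the name with the label given in the annotation."""
    _, dims, _ = parse_dispatch_name(name)
    if len(labels) != len(dims):
        raise ValueError(f"{len(labels)} labels for {len(dims)} dims in {name}")
    return list(zip(labels, dims))


# Row 1 of the no-fusion table, annotated as [m, n, m, n, k].
print(label_dims("matmul_like_2x20x1024x64x1280_i8xi8xi32",
                 ["m", "n", "m", "n", "k"]))
# Row 3 of the second matmul table, annotated as [n, m, n, m, n, k].
print(label_dims("matmul_like_3x2x20x1024x64x1280_i8xi8xi32",
                 ["n", "m", "n", "m", "n", "k"]))
```

The length check is there because the second set of shapes carries one extra leading dimension and one extra `n` label, so a mismatched annotation fails loudly instead of silently mislabeling sizes.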
I just double-checked the total gain from tuning; it should be around 3.8 ms ==> 7.5%.
Update 7/8: need to extend to support int8/int32 and batch mmt.