Open johanna-rock-tt opened 2 weeks ago
Describe the bug
bcast op is not optimized.

The bcast variant sharded bcast_h was optimized to reuse in0. New timings:
- block sharded (8x8) cores, H=256 W=9182: 0.0225 ms
- block sharded (8x8) cores, H=2048 W=9182: 0.0359 ms
- width sharded (8x8) cores, H=32 W=9182: 0.005 ms
- width sharded (8x8) cores, H=2048 W=9182: 0.025 ms
Other variants (sharded h / hw and interleaved variants) need to be revisited for similar optimizations.
bcast_h interleaved timings:
- interleaved (8x8) cores, H=256 W=9182: 0.255 ms
This is very slow for a bcast. Interleaved bcast_h is also not re-using in1.
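To put "very slow" in numbers, here is a minimal sketch that compares the timings quoted in this issue. The dictionary layout and the `slowdown` helper are hypothetical, introduced only for illustration; the values are copied from the measurements above, and nothing here calls into the actual op.

```python
# Timings in ms, copied from this issue (8x8 core grid, W=9182).
# Keys are (memory layout, H); the structure itself is illustrative only.
timings_ms = {
    ("block sharded", 256): 0.0225,
    ("block sharded", 2048): 0.0359,
    ("width sharded", 32): 0.005,
    ("width sharded", 2048): 0.025,
    ("interleaved", 256): 0.255,
}


def slowdown(slow, fast):
    """Return how many times slower the `slow` config is than the `fast` one."""
    return timings_ms[slow] / timings_ms[fast]


# Same H and W, so the two configurations are directly comparable.
ratio = slowdown(("interleaved", 256), ("block sharded", 256))
print(f"interleaved vs block sharded at H=256: {ratio:.1f}x slower")
```

Only the H=256 pair shares the same shape across layouts, so that is the only ratio the numbers in this issue support directly.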
Other possible optimizations to consider (e.g. for sharded bcast_h, but they might also be applicable to other variants):
FYI: @TT-BrianLiu @shwetankTT @tt-aho @davorchap @uaydonat