Replay buffers are currently under-utilized in the kernels. In unpacker kernels, they can be used to update the L1 tile addresses without MMIO accesses.
The following unpacker kernels do not use replay buffers:
- [ ] llk_unpack_A.h
- [ ] llk_unpack_AB.h
- [ ] llk_unpack_reduce.h
- [ ] llk_unpack_tilize.h
These kernels should be profiled first. For operations that turn out to be unpack-bound, implementing the address updates with replay buffers may improve performance. Eltwise binary/unary operations, for example, reach only about 15% math utilization in Buda performance measurements.
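To make the idea concrete, here is a toy software model of the address-update pattern. It is only a sketch under stated assumptions: `ToyUnpacker`, `ToyReplayBuffer`, `run_direct`, `run_replay` and the 2048-byte tile size are all hypothetical names and values for illustration, not the LLK or Tensix API. It counts how many commands the core has to issue per tile when it writes each address itself versus when it triggers a pre-recorded sequence.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <utility>
#include <vector>

// Toy software model only. It does not use the real LLK or Tensix API.
struct ToyUnpacker {
    std::uint32_t addr_a = 0;         // hypothetical L1 address of operand A
    std::uint32_t addr_b = 0;         // hypothetical L1 address of operand B
    std::uint32_t core_commands = 0;  // commands the core had to issue itself
};

using ToyInstr = std::function<void(ToyUnpacker&)>;

// Baseline: the core performs every address write itself
// (each write stands in for one MMIO access).
inline void advance_direct(ToyUnpacker& u, std::uint32_t tile_bytes) {
    u.addr_a += tile_bytes;
    ++u.core_commands;
    u.addr_b += tile_bytes;
    ++u.core_commands;
}

// Replay model: a sequence is recorded once and re-executed on request.
// The one-time cost of recording is ignored in this sketch.
class ToyReplayBuffer {
public:
    void record(std::vector<ToyInstr> seq) { seq_ = std::move(seq); }

    void replay(ToyUnpacker& u) const {
        ++u.core_commands;  // the core issues a single "replay" command
        for (const ToyInstr& instr : seq_) {
            instr(u);       // recorded updates run without further core work
        }
    }

private:
    std::vector<ToyInstr> seq_;
};

inline ToyUnpacker run_direct(std::uint32_t tiles, std::uint32_t tile_bytes) {
    ToyUnpacker u;
    for (std::uint32_t t = 0; t < tiles; ++t) {
        advance_direct(u, tile_bytes);
    }
    return u;
}

inline ToyUnpacker run_replay(std::uint32_t tiles, std::uint32_t tile_bytes) {
    ToyUnpacker u;
    ToyReplayBuffer rb;
    rb.record({
        [tile_bytes](ToyUnpacker& s) { s.addr_a += tile_bytes; },
        [tile_bytes](ToyUnpacker& s) { s.addr_b += tile_bytes; },
    });
    for (std::uint32_t t = 0; t < tiles; ++t) {
        rb.replay(u);
    }
    return u;
}
```

Calling `run_direct(4, 2048)` and `run_replay(4, 2048)` leaves both models at the same final addresses, but the replay variant issues one command per tile instead of two. Whether a similar saving shows up on hardware is exactly what the profiling above would need to confirm.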
The L1 tile address updates could alternatively use the CFGSHIFTMASK method described in #4.
@ttmtrajkovic @rdjogoTT fyi