tenstorrent / tt-llk-bh

Tenstorrent low-level tensix kernels for Blackhole
Apache License 2.0

Review kernels for replay buffer usage #7

Open rtawfik01 opened 5 months ago

rtawfik01 commented 5 months ago

Replay buffers are currently under-utilized in the kernels. In unpacker kernels, replay buffers can be used to update the L1 tile addresses without MMIO accesses:

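  TTI_RDCFG(p_gpr_unpack::TMP0, THCON_SEC0_REG3_Base_address_ADDR32);                    // read the current base address into TMP0
  TTI_ADDDMAREG(0, p_gpr_unpack::TMP0, p_gpr_unpack::TMP0, p_gpr_unpack::TILE_SIZE_A);   // TMP0 += TILE_SIZE_A
  TTI_STALLWAIT(p_stall::STALL_CFG, p_stall::THCON);                                     // hold the config write until THCON is done
  TTI_WRCFG(p_gpr_unpack::TMP0, 0, THCON_SEC0_REG3_Base_address_ADDR32);                 // write the updated base address back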

Alternatively, the CFGSHIFTMASK method described in #4 can be used.
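To make the idea concrete, here is a toy host-side model of the sequence above, written as plain C++ so it can run anywhere. It is only a sketch: `UnpackerModel`, `ReplayBufferModel`, `make_address_update_replay` and `base_after_tiles` are hypothetical names invented for illustration and are not part of the LLK API, and the stall is modelled as a no-op. The point it illustrates is that the four instructions are recorded once and then replayed once per tile, so advancing the base address needs no per-tile MMIO write.

```cpp
#include <cstdint>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical stand-ins for the hardware state touched by the sequence.
struct UnpackerModel {
    uint32_t base_address_cfg = 0;  // models THCON_SEC0_REG3_Base_address
    uint32_t gpr_tmp0 = 0;          // models p_gpr_unpack::TMP0
    uint32_t gpr_tile_size_a = 0;   // models p_gpr_unpack::TILE_SIZE_A
};

using Instr = std::function<void(UnpackerModel&)>;

// Hypothetical model of a replay buffer: a recorded instruction sequence
// that can be re-issued as a unit.
struct ReplayBufferModel {
    std::vector<Instr> slots;

    void record(std::vector<Instr> seq) { slots = std::move(seq); }

    void replay(UnpackerModel& u) const {
        for (const auto& instr : slots) instr(u);
    }
};

// Records the four-step address update from the issue text.
ReplayBufferModel make_address_update_replay() {
    ReplayBufferModel rb;
    rb.record({
        [](UnpackerModel& u) { u.gpr_tmp0 = u.base_address_cfg; },   // RDCFG
        [](UnpackerModel& u) { u.gpr_tmp0 += u.gpr_tile_size_a; },   // ADDDMAREG
        [](UnpackerModel&) { /* STALLWAIT: no-op in this model */ },
        [](UnpackerModel& u) { u.base_address_cfg = u.gpr_tmp0; },   // WRCFG
    });
    return rb;
}

// Replays the recorded sequence once per tile and returns the final base address.
uint32_t base_after_tiles(uint32_t start, uint32_t tile_size, int num_tiles) {
    UnpackerModel u;
    u.base_address_cfg = start;
    u.gpr_tile_size_a = tile_size;
    const ReplayBufferModel rb = make_address_update_replay();
    for (int t = 0; t < num_tiles; ++t) rb.replay(u);
    return u.base_address_cfg;
}
```

In this model, `base_after_tiles(start, tile_size, n)` returns `start + n * tile_size`. On hardware the recorded sequence would live in the Tensix replay buffer and be triggered by a replay instruction; that part is deliberately left out here because it depends on the kernel's own helpers.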

The following unpacker kernels do not use replay buffers:

The performance of the kernels above should be measured, and operations that are unpack-bound can try implementing these address updates to improve performance. For example, eltwise binary/unary operations are at around 15% math utilization in Buda performance measurements.

@ttmtrajkovic @rdjogoTT fyi