tenstorrent / tt-llk-bh

Tenstorrent low-level tensix kernels for Blackhole
Apache License 2.0

Use CFGSHIFTMASK instruction #4

Open · rtawfik01 opened this issue 5 months ago

rtawfik01 commented 5 months ago

Blackhole has a new CFGSHIFTMASK instruction that can update the addresses used by unpacker instructions from inside the MOP/replay buffers.

Instead of updating addresses with a read-modify-write sequence like this:

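  // Read the current unpacker base address from the config register into GPR TMP0
  TTI_RDCFG(p_gpr_unpack::TMP0, THCON_SEC0_REG3_Base_address_ADDR32);
  // TMP0 = TMP0 + TILE_SIZE_A
  TTI_ADDDMAREG(0, p_gpr_unpack::TMP0, p_gpr_unpack::TMP0, p_gpr_unpack::TILE_SIZE_A);
  // Wait for THCON to finish the add before the config write is issued
  TTI_STALLWAIT(p_stall::STALL_CFG, p_stall::THCON);
  // Write the updated address back to the config register
  TTI_WRCFG(p_gpr_unpack::TMP0, 0, THCON_SEC0_REG3_Base_address_ADDR32);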

the same update can be done with a single CFGSHIFTMASK instruction:

  // THCON_SEC0_REG3_Base_address_ADDR32 = THCON_SEC0_REG3_Base_address_ADDR32 + SCRATCH_SEC0_val_ADDR32
  TTI_CFGSHIFTMASK(1, 0b011, 32 - 1, 0, 0b11, THCON_SEC0_REG3_Base_address_ADDR32);

This works as long as the scratch register has first been populated with the tile size:

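  // Load the tile size into the scratch config register that CFGSHIFTMASK adds from
  TTI_WRCFG(p_gpr_unpack::TILE_SIZE_A, 0, SCRATCH_SEC0_val_ADDR32);
  TTI_NOP;  // pad after the WRCFG before the new scratch value is relied on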

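Because a single CFGSHIFTMASK replaces the four-instruction RDCFG/ADDDMAREG/STALLWAIT/WRCFG sequence (including its stall) and can sit inside the MOP/replay buffer, unpacker-bound operations should see a performance improvement.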

@ttmtrajkovic @rdjogoTT fyi.