Open rtawfik01 opened 5 months ago
Blackhole has a new CFGSHIFTMASK instruction that can update addresses for the unpacker instructions inside the mop/replay buffers.
Instead of updating addresses with a read-modify-write sequence through a GPR, like this:
TTI_RDCFG(p_gpr_unpack::TMP0, THCON_SEC0_REG3_Base_address_ADDR32);
TTI_ADDDMAREG(0, p_gpr_unpack::TMP0, p_gpr_unpack::TMP0, p_gpr_unpack::TILE_SIZE_A);
TTI_STALLWAIT(p_stall::STALL_CFG, p_stall::THCON);
TTI_WRCFG(p_gpr_unpack::TMP0, 0, THCON_SEC0_REG3_Base_address_ADDR32);
Using the CFGSHIFTMASK instruction, it could be done like this:
TTI_CFGSHIFTMASK(1, 0b011, 32 - 1, 0, 0b11, THCON_SEC0_REG3_Base_address_ADDR32); // THCON_SEC0_REG3_Base_address_ADDR32 = THCON_SEC0_REG3_Base_address_ADDR32 + SCRATCH_SEC0_val_ADDR32
as long as the scratch buffer is correctly populated:
TTI_WRCFG(p_gpr_unpack::TILE_SIZE_A, 0, SCRATCH_SEC0_val_ADDR32);
TTI_NOP;
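If an operation is unpacker-bound, then using CFGSHIFTMASK should increase performance, since it replaces the four-instruction sequence above (including the STALLWAIT) with a single instruction.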
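To make the intended equivalence concrete, here is a minimal host-side sketch of the two update paths. It is only a model under stated assumptions: `CfgModel`, the index constants, and the three functions are hypothetical names invented for illustration, not part of the LLK headers, and the CFGSHIFTMASK step is modelled purely as "base address += scratch value", as described in the comment on the instruction above. It does not model instruction operands, stalls, or timing.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Hypothetical host-side model. None of these names exist in the LLK code.
struct CfgModel {
    std::array<std::uint32_t, 2> cfg{};  // stand-in for two config registers
    std::array<std::uint32_t, 2> gpr{};  // stand-in for two unpack GPRs
};

// Illustrative indices into the model arrays (assumed, not real addresses).
constexpr std::size_t BASE_ADDR   = 0;  // plays the role of THCON_SEC0_REG3_Base_address_ADDR32
constexpr std::size_t SCRATCH     = 1;  // plays the role of SCRATCH_SEC0_val_ADDR32
constexpr std::size_t TMP0        = 0;  // plays the role of p_gpr_unpack::TMP0
constexpr std::size_t TILE_SIZE_A = 1;  // plays the role of p_gpr_unpack::TILE_SIZE_A

// Original path: read config into a GPR, add the tile size, write it back.
void bump_via_gpr(CfgModel& m) {
    m.gpr[TMP0] = m.cfg[BASE_ADDR];     // RDCFG
    m.gpr[TMP0] += m.gpr[TILE_SIZE_A];  // ADDDMAREG
    // STALLWAIT has no counterpart in this functional model.
    m.cfg[BASE_ADDR] = m.gpr[TMP0];     // WRCFG
}

// One-time setup: copy the tile size into the scratch config register.
void populate_scratch(CfgModel& m) {
    m.cfg[SCRATCH] = m.gpr[TILE_SIZE_A];
}

// Assumed CFGSHIFTMASK effect: base address += scratch value, no GPR involved.
void bump_via_scratch(CfgModel& m) {
    m.cfg[BASE_ADDR] += m.cfg[SCRATCH];
}
```

In this model both paths leave the same value in the base address register, provided `populate_scratch` ran once beforehand. On hardware the scratch write would sit outside the mop/replay loop, so only the single update remains inside it.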
@ttmtrajkovic @rdjogoTT fyi.