pulp-platform / ara

The PULP Ara is a 64-bit Vector Unit, compatible with the RISC-V Vector Extension Version 1.0, working as a coprocessor to CORE-V's CVA6 core

Hazard with different LMUL #111

Open bonewp opened 2 years ago

bonewp commented 2 years ago

I found that only a single vd is marked in write_list in ara_sequencer. However, multiple vd registers are written when LMUL > 1. Consider the following case:

    vsetvli a0, x0, e32, m8
    vadd v8, v16, v24
    vsetvli x0, x0, e32, m1
    vmul v4, v12, v16

v8~v15 are written by vadd, so all eight vector registers should be marked in write_list/global_hazard_list, and vmul should stall until v12 is written back. However, ara_sequencer does not seem to work this way.
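To make the overlap concrete, here is a minimal Python sketch of the register-group arithmetic behind the question. It is not Ara's RTL: the helpers `reg_group` and `raw_hazard` are hypothetical names introduced for illustration, and only integer LMUL values are modelled.

```python
def reg_group(base, lmul):
    """Return the set of vector registers in the group starting at `base`.

    Assumes integer LMUL (1, 2, 4, 8); fractional LMUL is not modelled.
    """
    if base % lmul != 0:
        raise ValueError("register group base must be a multiple of LMUL")
    return set(range(base, base + lmul))


def raw_hazard(vd, vd_lmul, sources, src_lmul):
    """Registers written by an older instruction and read by a younger one."""
    written = reg_group(vd, vd_lmul)
    read = set()
    for vs in sources:
        read |= reg_group(vs, src_lmul)
    return written & read


# vadd writes the group at v8 under m8; vmul reads v12 and v16 under m1.
conflict = raw_hazard(vd=8, vd_lmul=8, sources=[12, 16], src_lmul=1)
print(sorted(conflict))  # [12]
```

Marking only v8 in a write list would miss this conflict, since v12 belongs to the written group but is not its base register.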

Could you help clarify this issue? Thanks in advance.

mp-17 commented 2 years ago

Hello @bonewp,

Thanks for the question! Thus far, Ara deals with this by stalling when LMUL is changed (WAIT_STATE in the ara_dispatcher) until all the previous operations are over. In this way, all the operations access a VRF that is always partitioned in the same way, without the risk of hazards.
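As a rough illustration of the stall-on-LMUL-change policy described above, the following Python sketch models a dispatcher that holds back an LMUL-changing vsetvli until every in-flight vector operation has retired. The class and method names are hypothetical and are not taken from ara_dispatcher; only the idea of waiting for the backend to drain comes from the reply.

```python
class DispatcherModel:
    """Toy model: an LMUL change waits until no vector op is in flight."""

    def __init__(self, lmul=1):
        self.lmul = lmul
        self.in_flight = 0  # vector operations issued but not yet retired

    def issue_vector_op(self):
        self.in_flight += 1

    def retire_vector_op(self):
        self.in_flight -= 1

    def try_vsetvli(self, new_lmul):
        """Return True if the vsetvli is accepted, False if it must stall."""
        if new_lmul != self.lmul and self.in_flight > 0:
            return False  # analogous to waiting in WAIT_STATE
        self.lmul = new_lmul
        return True


d = DispatcherModel(lmul=8)
d.issue_vector_op()             # e.g. the vadd under m8
stalled = not d.try_vsetvli(1)  # switching to m1 must wait
d.retire_vector_op()
accepted = d.try_vsetvli(1)     # the backend is drained, so the change goes through
print(stalled, accepted)        # True True
```

Under this policy, every instruction in flight at a given time sees the same register-file partitioning, which is why tracking a single vd per instruction can be sufficient across vsetvli boundaries.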

Best, Matteo

AD738560581 commented 5 months ago

Hello @mp-17, I found another problem in ara_sequencer. Suppose a vmv8r.v instruction writes v8~v15 and is followed by a vmul instruction that reads v12. All eight vector registers should be marked in write_list/global_hazard_list, and vmul should stall until v12 is written back. However, ara_sequencer does not seem to work this way:

    vsetvli a0, x0, e32, m1
    vmv8r.v v8, v16
    vmul v4, v12, v16
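The difference from the first case is that a whole-register move writes a fixed number of registers regardless of the current LMUL, so stalling on LMUL changes does not cover it. A small Python sketch of that rule follows; `dest_regs` is a hypothetical helper, not part of Ara, and it only handles integer LMUL.

```python
import re


def dest_regs(mnemonic, vd, lmul):
    """Registers written by an instruction with destination `vd`.

    For vmv<nr>r.v the register count comes from the mnemonic (1, 2, 4 or 8);
    for any other instruction this sketch assumes it equals the integer LMUL.
    """
    match = re.fullmatch(r"vmv([1248])r\.v", mnemonic)
    nregs = int(match.group(1)) if match else lmul
    return set(range(vd, vd + nregs))


# vtype says m1, but vmv8r.v still writes v8..v15.
written = dest_regs("vmv8r.v", 8, lmul=1)
reads = {12, 16}  # sources of the following vmul under m1
print(sorted(written & reads))  # [12]
```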

mp-17 commented 5 months ago

Hey @AD738560581,

Thanks a lot for reporting, I will include this in the next round of bug fixes.

Best, Matteo

AD738560581 commented 4 months ago

Hello @mp-17, I found yet another problem in ara_sequencer, this time with a sequence of instructions that take different numbers of cycles, such as vadd and vrem. In the sequence below, vrem and vsaddu both read v8 while vaadd writes v8, so there is a WAR hazard. However, vaadd only registers a hazard with vsaddu; once vsaddu retires, the hazard on vaadd is cleared. Because vrem has a longer latency than vsaddu, it has not yet read all of v8 to v15 at that point, and vaadd writes v8 to v15 before vrem has read them, which leads to a bug.

config: e16, m8, vl=32

    vrem.vx v24,v8,gp
    vsaddu.vx v16,v8,s11, v0.t
    sll a4, a1, 0xb
    add a4, a4, -1
    vaadd.vx v8, v16, s3
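The reported behaviour can be illustrated with a toy Python model that compares two bookkeeping policies for a WAR hazard: remembering only the most recent reader of the destination versus remembering every in-flight reader. This is only a sketch of the scenario as described in the comment; `writer_may_start` and its arguments are hypothetical and do not reflect how ara_sequencer is actually implemented.

```python
def writer_may_start(readers, retired, track_all):
    """Decide whether a writer may begin overwriting its destination.

    readers:   older instructions that read the destination, in program order
    retired:   set of instructions that have already retired
    track_all: if False, only the most recent reader is remembered
    """
    pending = set(readers) if track_all else {readers[-1]}
    return not (pending - retired)


readers = ["vrem", "vsaddu"]  # both read the v8 group before vaadd writes it
retired = {"vsaddu"}          # the short-latency reader finishes first

# Remembering only the last reader releases vaadd while vrem is still reading.
print(writer_may_start(readers, retired, track_all=False))  # True

# Remembering every reader keeps vaadd stalled until vrem has retired too.
print(writer_may_start(readers, retired, track_all=True))   # False
```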