Closed ksco closed 2 months ago
I'm not sure what "reset" means here, or when we need to do one.
Okay, I think you're referring to insts[reset_n]: if there is a "reset" instruction for the current insts[ninst], we need to stop being smart and always generate a vsetvli instruction, right?
Not really. I mean, if reset_n == -2 then yes, but for other values it's different.
It's handled in void fpu_reset_cache(dynarec_rv64_t* dyn, int ninst, int reset_n) from the rv64_dynarec_helper.c file, and it's used in the block at lines 85-105 of dynarec_native_pass.c.
This is to handle non-linear execution flow...
I understand we need this for fpu cache management, but for sew (which is simpler), why do we need to care about reset? Can you give an example?
It's a non-linear execution flow. That means the instruction's predecessor is NOT the previous opcode. See something like this:
xxxx JZ 1f
xxxx something
xxxx JMP 2b
1f: something else
At 1f, the flow is not linear: execution arrives from the JZ 1f, so all current state needs to be reset from there, not from the previous opcode.
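To make the predecessor rule above concrete, here is a minimal sketch in C. It is a toy model, not box64 code: flow_inst_t, FLOW_NO_RESET and flow_incoming_state are hypothetical names invented for illustration, and the tracked state is reduced to a single int.

```c
#include <assert.h>

/* Toy model only: none of these names exist in box64. */
enum { FLOW_NO_RESET = -1 };

typedef struct {
    int reset_n;   /* index of the opcode execution really arrives from, or FLOW_NO_RESET */
    int state_out; /* tracked state once this opcode has been processed */
} flow_inst_t;

/* Return the state an opcode starts with. */
static int flow_incoming_state(const flow_inst_t *insts, int ninst)
{
    assert(ninst >= 0);
    int reset_n = insts[ninst].reset_n;
    if (reset_n >= 0)
        return insts[reset_n].state_out;   /* non-linear: inherit from the jump source */
    if (ninst == 0)
        return 0;                          /* block entry: nothing is known yet */
    return insts[ninst - 1].state_out;     /* linear: inherit from the previous opcode */
}
```

With the four opcodes from the listing (JZ, something, JMP, then the 1f target), setting reset_n of the last one to the index of the JZ makes it inherit the state left by the JZ rather than the state left by the JMP.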
But we did the sew change at JZ 1f, before the jump, in CacheTransform. So when the execution flow reached 1f, the SEW state was good.
Not really. You need to think of it in terms of the multiple passes. This is a look into the future: because 1f is a jump forward, you need a previous pass to set the correct values.
You can just reset to nothing, no matter what, but that is a missed optimisation opportunity, which the reset_n scheme is there to solve.
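The "look into the future" point can be illustrated with a two-pass toy in C. This is only a sketch under assumptions: lookahead_op_t, lookahead_record_pass and lookahead_needs_transform are hypothetical names, and the real dynarec tracks far more than one SEW value per opcode.

```c
#include <assert.h>

/* Toy model only: none of these names exist in box64. */
typedef struct {
    int jump_to;    /* target index if this opcode is a jump, otherwise -1 */
    int sew_wanted; /* SEW this opcode needs, 0 if it does not care */
    int sew_in;     /* filled by the recording pass: SEW expected on entry */
} lookahead_op_t;

/* Earlier pass: record what every opcode expects on entry. */
static void lookahead_record_pass(lookahead_op_t *ops, int n)
{
    for (int i = 0; i < n; ++i)
        ops[i].sew_in = ops[i].sew_wanted;
}

/* Later pass: at a jump, the target's expectation is already recorded,
 * even when the target lies further ahead in the block. */
static int lookahead_needs_transform(const lookahead_op_t *ops, int ninst, int current_sew)
{
    int target = ops[ninst].jump_to;
    assert(target >= 0);
    return ops[target].sew_in != 0 && ops[target].sew_in != current_sew;
}
```

Without the recording pass, the jump at index ninst would have no information about an opcode that has not been visited yet, which is why a single linear walk is not enough for forward jumps.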
OK, so:
- reset_n == -2: just give up and reset it no matter what (I guess -2 is mapped to BARRIER_FULL, where the predecessors are too complex to optimize?)
- reset_n >= 0: do sew = insts[reset_n].sew in pass0 and do sew = insts[ninst].sew in the other passes.
reset_n == -2 is also used for the CALLRET optimisation, to be sure the return opcode doesn't have any optimisation left.
For the other case, copy what the ARM64 backend is doing:
#if STEP > 1
// for STEP 2 & 3, just need to refresh with current, and undo the changes (push & swap)
dyn->n = dyn->insts[ninst].n;
dyn->ymm_zero = dyn->insts[ninst].ymm0_in;
neoncacheUnwind(&dyn->n);
#else
dyn->n = dyn->insts[reset_n].n;
dyn->ymm_zero = dyn->insts[reset_n].ymm0_out;
#endif
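Applied to SEW, the same pattern might look like the sketch below. This is an assumption about how it could be adapted, not the actual RV64 backend code: sew_dyn_t, sew_inst_t, SEW_UNKNOWN and sew_reset are hypothetical names, and the compile-time STEP macro is replaced by a runtime step argument so the example stands alone.

```c
#include <assert.h>

/* Toy model only: none of these names exist in box64. */
#define SEW_UNKNOWN 0xFF /* hypothetical marker forcing a fresh vsetvli */

typedef struct { int sew; } sew_inst_t;
typedef struct { int sew; sew_inst_t *insts; } sew_dyn_t;

static void sew_reset(sew_dyn_t *dyn, int ninst, int reset_n, int step)
{
    assert(ninst >= 0);
    if (reset_n == -2) {
        dyn->sew = SEW_UNKNOWN;                 /* give up: nothing can be assumed */
    } else if (reset_n >= 0) {
        if (step > 1)
            dyn->sew = dyn->insts[ninst].sew;   /* later steps: refresh from this opcode */
        else
            dyn->sew = dyn->insts[reset_n].sew; /* early steps: inherit from the predecessor */
    }
    /* any other reset_n value leaves dyn->sew untouched in this sketch */
}
```

The split on step mirrors the #if STEP > 1 branch of the ARM64 snippet: early steps copy from insts[reset_n], later steps copy from insts[ninst].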
I'm not sure "reset" is handled correctly (shouldn't dyn->sew be taken from dyn->insts[reset].sew?). Apart from that, it looks good.