Why don't FENCE.I instructions order the CBO management instructions?

henry-hsieh commented 1 year ago

Suppose a system has no coherency between I and D, and its D-cache uses a write-back policy. Consider the following code:

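li a0, 1            # a0 = value to store
li a1, 0x1000       # a1 = address that is later fetched as an instruction
sw x0, 0(a1)        # MEM[0x1000] = 0
fence.i
sw a0, 0(a1)        # MEM[0x1000] = 1, held in the write-back D-cache
cbo.* 0(a1)         # cbo.* stands for cbo.inval, cbo.clean, or cbo.flush
fence.i             # instruction fetch from 0x1000 follows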

If fence.i does not order the cbo.* instruction, cbo.* may execute after fence.i. This leads to the following cases (assuming no other harts access the address):

1. cbo.inval before fence.i: the instruction fetch sees MEM[0x1000] = 0, because the value 1 exists only in the D-cache and is discarded by the invalidation.
2. cbo.inval after fence.i: the instruction fetch sees MEM[0x1000] = 0, because the value 1 exists only in the D-cache.
3. cbo.clean before fence.i: the instruction fetch sees MEM[0x1000] = 1, because the value 1 is written back to shared memory.
4. cbo.clean after fence.i: the instruction fetch sees MEM[0x1000] = 0 or 1, depending on whether cbo.clean completes before the instruction fetch.
5. cbo.flush before fence.i: the instruction fetch sees MEM[0x1000] = 1, because the value 1 is written back to shared memory.
6. cbo.flush after fence.i: the instruction fetch sees MEM[0x1000] = 0 or 1, depending on whether cbo.flush completes before the instruction fetch.

In cases 4 and 6, the instruction fetch may not observe the value the hart itself wrote.
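The six cases can be reproduced with a small Python toy model. It is a sketch of the questioner's initial assumption only: fence.i performs no D-cache maintenance, instruction fetch reads shared memory directly, and MEM[0x1000] is assumed to hold 0 before the second store. When cbo.* follows fence.i, the model explores both the cbo-before-fetch and fetch-before-cbo interleavings. The names `ADDR`, `apply_cbo`, and `run` are hypothetical and do not describe any real implementation.

```python
# Hypothetical toy model of the questioner's initial assumption (not spec
# behavior): a write-back D-cache that is not coherent with instruction fetch,
# and a fence.i that does no D-cache maintenance, so fetches read memory only.

ADDR = 0x1000


def apply_cbo(op, mem, dcache):
    """Apply cbo.inval, cbo.clean, or cbo.flush to the block at ADDR."""
    value, dirty = dcache.get(ADDR, (None, False))
    if op in ("clean", "flush") and dirty:
        mem[ADDR] = value               # write the dirty line back to memory
        dcache[ADDR] = (value, False)
    if op in ("inval", "flush"):
        dcache.pop(ADDR, None)          # discard the line (dirty data is lost)


def run(op, cbo_after_fence):
    """Return the set of values an instruction fetch of ADDR may observe."""
    # When cbo.* comes after fence.i in program order, nothing orders it
    # against the fetch, so both interleavings are explored.
    orders = (True, False) if cbo_after_fence else (True,)
    seen = set()
    for cbo_before_fetch in orders:
        mem = {ADDR: 0}                 # assumed memory contents before the store
        dcache = {ADDR: (1, True)}      # sw a0, 0(a1) leaves a dirty line
        # fence.i is assumed to do nothing to the D-cache, so its position
        # relative to cbo.* does not matter here; only the fetch order does.
        if cbo_before_fetch:
            apply_cbo(op, mem, dcache)
        seen.add(mem[ADDR])             # instruction fetch reads memory
        if not cbo_before_fetch:
            apply_cbo(op, mem, dcache)
    return seen


for op in ("inval", "clean", "flush"):
    for after in (False, True):
        where = "after" if after else "before"
        print(f"cbo.{op} {where} fence.i -> {sorted(run(op, after))}")
```

Under these assumptions only the clean and flush cases that follow fence.i (cases 4 and 6) can yield either value.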

gfavor commented 1 year ago

The definition of fence.i in the Unpriv spec addresses your question (specifically the last sentence below):

The FENCE.I instruction is used to synchronize the instruction and data streams. RISC-V does not guarantee that stores to instruction memory will be made visible to instruction fetches on a RISC-V hart until that hart executes a FENCE.I instruction. A FENCE.I instruction ensures that a subsequent instruction fetch on a RISC-V hart will see any previous data stores already visible to the same RISC-V hart.

In a RISC-V hart that implements FENCE.I as specified, a CBO is never necessary and FENCE.I alone is always sufficient.

Conversely, what you are positing is a RISC-V hart whose fence.i implementation is non-compliant, in which case all bets are off.

Note that the new (still in development) I/D Consistency architecture extension will provide instructions to do what you want in a system without coherency between I and D (and hence presumably doesn't implement fence.i properly).

henry-hsieh commented 1 year ago

Thanks! I missed that, in a system without coherency between I and D, the second fence.i in my example must also write back the D-cache. The cases should be updated as follows:

1. cbo.inval before fence.i: the instruction fetch sees MEM[0x1000] = 0, because the value 1 exists only in the D-cache and is discarded by the invalidation.
2. cbo.inval after fence.i: the instruction fetch sees MEM[0x1000] = 1, because fence.i writes the value 1 back from the D-cache.
3. cbo.clean before fence.i: the instruction fetch sees MEM[0x1000] = 1, because cbo.clean writes the value 1 back to shared memory.
4. cbo.clean after fence.i: the instruction fetch sees MEM[0x1000] = 1, because fence.i writes the value 1 back from the D-cache.
5. cbo.flush before fence.i: the instruction fetch sees MEM[0x1000] = 1, because cbo.flush writes the value 1 back to shared memory.
6. cbo.flush after fence.i: the instruction fetch sees MEM[0x1000] = 1, because fence.i writes the value 1 back from the D-cache.

The order between cbo.clean or cbo.flush and fence.i is irrelevant. However, the order between cbo.inval and fence.i does affect what instruction fetch observes. I'm aware that the original purpose of cbo.inval is to observe data written by non-coherent agents, and intentionally writing a value and then invalidating it is somewhat suspicious in software. Or perhaps the CMO task group simply doesn't want to support CMO management instructions for instruction fetch (fence.i) and page-table entry updates (sfence.vma), even though the data may be altered by the CMO management instructions.
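The updated cases can be checked with a small Python toy model in which fence.i writes back every dirty D-cache line before later instruction fetches, which is what a fence.i on a system without I/D coherency would need to do. This is a hypothetical sketch: the names `ADDR`, `apply_cbo`, `fence_i`, and `run` are invented, the model has no I-cache, and fetches read shared memory directly. It replays the full listing, including the first `sw`/`fence.i` pair, and explores both interleavings when cbo.* follows fence.i.

```python
# Hypothetical toy model of the corrected assumption: on a system without I/D
# coherency, fence.i writes back every dirty D-cache line so that later
# instruction fetches (which read memory) see the hart's earlier stores.

ADDR = 0x1000


def apply_cbo(op, mem, dcache):
    """Apply cbo.inval, cbo.clean, or cbo.flush to the block at ADDR."""
    value, dirty = dcache.get(ADDR, (None, False))
    if op in ("clean", "flush") and dirty:
        mem[ADDR] = value               # write the dirty line back to memory
        dcache[ADDR] = (value, False)
    if op in ("inval", "flush"):
        dcache.pop(ADDR, None)          # discard the line (dirty data is lost)


def fence_i(mem, dcache):
    """Write back all dirty D-cache lines (this model has no I-cache)."""
    for addr, (value, dirty) in list(dcache.items()):
        if dirty:
            mem[addr] = value
            dcache[addr] = (value, False)


def run(op, cbo_after_fence):
    """Replay the listing and return the values a fetch of ADDR may observe."""
    orders = (True, False) if cbo_after_fence else (True,)
    seen = set()
    for cbo_before_fetch in orders:
        mem, dcache = {}, {}
        dcache[ADDR] = (0, True)        # sw x0, 0(a1)
        fence_i(mem, dcache)            # MEM[0x1000] = 0
        dcache[ADDR] = (1, True)        # sw a0, 0(a1)
        if not cbo_after_fence:
            apply_cbo(op, mem, dcache)  # cbo.* before the second fence.i
        fence_i(mem, dcache)
        if cbo_after_fence and cbo_before_fetch:
            apply_cbo(op, mem, dcache)  # cbo.* after fence.i, before the fetch
        seen.add(mem[ADDR])             # instruction fetch reads memory
        if cbo_after_fence and not cbo_before_fetch:
            apply_cbo(op, mem, dcache)  # cbo.* after fence.i and the fetch
    return seen


for op in ("inval", "clean", "flush"):
    print(f"cbo.{op}: before fence.i {sorted(run(op, False))}, "
          f"after fence.i {sorted(run(op, True))}")
```

In this model only cbo.inval placed before fence.i loses the stored value, because it discards the dirty line before fence.i can write it back.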

gfavor commented 1 year ago

A separate architecture extension (under the J TG) is being developed to address I/D consistency in both I/D-coherent and non-coherent systems, which is why the CMO TG focused on the three extensions it developed.

As for your other questions, note again that a compliant implementation of fence.i ensures that a subsequent instruction fetch on a RISC-V hart will see any previous data stores already visible to the same RISC-V hart. This implies performing cache operations on any data and instruction caches, as necessary, to satisfy this required property.
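A minimal sketch of those cache operations, assuming a hypothetical hart whose write-back D-cache and I-cache are not coherent with each other: fence.i writes back dirty D-cache lines and then invalidates the I-cache. The class and method names are invented for illustration, and a real implementation may achieve the same guarantee differently. No CBO appears anywhere, yet the fetch after fence.i sees the hart's own store.

```python
class NonCoherentHart:
    """Hypothetical hart whose write-back D-cache and I-cache are not coherent."""

    def __init__(self, mem):
        self.mem = mem      # shared memory: addr -> 32-bit word
        self.dcache = {}    # addr -> (word, dirty)
        self.icache = {}    # addr -> word

    def store(self, addr, word):
        self.dcache[addr] = (word, True)  # write-back: memory is not updated yet

    def fetch(self, addr):
        if addr not in self.icache:       # I-cache miss: refill from memory
            self.icache[addr] = self.mem[addr]
        return self.icache[addr]

    def fence_i(self):
        # Write back dirty D-cache lines so memory holds this hart's stores...
        for addr, (word, dirty) in list(self.dcache.items()):
            if dirty:
                self.mem[addr] = word
                self.dcache[addr] = (word, False)
        # ...then drop the I-cache so the next fetch refills from memory.
        self.icache.clear()


NOP, EBREAK = 0x00000013, 0x00100073    # RV32I encodings of nop and ebreak

hart = NonCoherentHart({0x1000: NOP})
hart.fetch(0x1000)                      # the old instruction is now in the I-cache
hart.store(0x1000, EBREAK)              # overwrite it through the data path
stale = hart.fetch(0x1000)              # still NOP: no fence.i yet
hart.fence_i()                          # no CBO needed
fresh = hart.fetch(0x1000)              # EBREAK
```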

And as noted above, fence.i is always sufficient and no form of CBO is ever needed.

But if you are asking what the ordering is, if any, between fence.i and maintenance CBOs, the answer is as specified: there is no ordering. This is not a problem for managing I/D consistency given all of the above, since fence.i already does whatever is needed to ensure that subsequent instruction fetches see all preceding writes.

henry-hsieh commented 1 year ago

Thank you, I get your point! There is no need to use CBOs for I/D consistency; fence.i takes care of everything.

riscv/riscv-CMOs, issue #61