Will core 0 fence and then core 1 do fence.i make instruction fetch get the same instruction memory in a multicore system

fanghuaqi commented 1 month ago

Hello @aswaterman ,

I have a question about these lines of the Zifencei spec:

https://github.com/riscv/riscv-isa-manual/blob/3c1d60298f16523aba30d45ba4d7c9381a4e2d4c/src/zifencei.adoc?plain=1#L69-L79

The spec says:

To make a store to instruction memory visible to all RISC-V harts, the writing hart also
has to execute a data FENCE before requesting that all remote RISC-V
harts execute a FENCE.I.

Updated question:

In a multicore system, e.g. with 4 cores, suppose core 0 loads instructions from external memory and writes them into its data cache, and then executes a FENCE on core 0 only. Will the other cores (cores 1-3), after they each execute a FENCE.I, see the same instructions at the same memory address as core 0 sees? In other words, will instruction fetches on cores 1-3 be able to see the data in core 0's data cache?

There is a discussion about this in the Linux implementation, see

fanghuaqi commented 1 month ago

Here is an updated version:

In a multicore system, e.g. with 4 cores, suppose core 0 loads instructions from external memory and writes them into its data cache, and then executes a FENCE on core 0 only. Will the other cores (cores 1-3), after they each execute a FENCE.I, see the same instructions at the same memory address as core 0 sees? In other words, will instruction fetches on cores 1-3 be able to see the data in core 0's data cache?

Per our understanding of the ISA spec, we believe the answer is NO, i.e., instruction fetches on cores 1-3 cannot see the data in core 0's data cache. We think the right sequence is:

1, core 0 loads the instructions from external SD card memory and writes them into its DCache
2, core 0 then executes a FENCE.I to make sure its own ICache is synced with its DCache, and core 0's DCache is flushed to main memory
3, core 0 then executes a FENCE as a barrier to make sure the preceding operations are visible to memory
4, core 0 then asks cores 1-3 to execute FENCE.I
5, cores 1-3 (after executing FENCE.I) then re-fetch the instructions from main memory to get the latest instructions

Can you help to confirm that our understanding is correct?

Many Thanks

gfavor commented 1 month ago

On Wed, Jul 17, 2024 at 8:30 PM Huaqi Fang @.***> wrote:

Here is an updated version:

In a multicore system, e.g. with 4 cores, suppose core 0 loads instructions from external memory and writes them into its data cache, and then executes a FENCE on core 0 only. Will the other cores (cores 1-3), after they each execute a FENCE.I, see the same instructions at the same memory address as core 0 sees? In other words, will instruction fetches on cores 1-3 be able to see the data in core 0's data cache?

The missing piece in the preceding description (which corresponds to "the writing hart also has to execute a data FENCE before requesting that all remote RISC-V harts execute a FENCE.I") is that, after the FENCE, core 0 must perform the memory access(es) that cause a request to be sent to all the remote harts.

All this of course presumes data cache coherency, and hence no need for CBOs to explicitly push the written instructions out of core 0's data cache.

Per our understanding of the ISA spec, we believe the answer is NO, i.e., instruction fetches on cores 1-3 cannot see the data in core 0's data cache. We think the right sequence is:

1, core 0 loads the instructions from external SD card memory and writes them into its DCache

2, core 0 then executes a FENCE.I to make sure its own ICache is synced with its DCache, and core 0's DCache is flushed to main memory

The FENCE.I doesn't cause core 0's DCache to be flushed. It just causes its ICache and instruction fetch/etc. to become synchronized or consistent with the instruction written into its DCache.

3, core 0 then executes a FENCE as a barrier to make sure the preceding operations are visible to memory

As noted above the FENCE is to order the write of the instruction into core 0's DCache with the sending of a request (e.g. an IPI) to all remote harts. That ensures that the write is globally visible before any of the remote harts receive the request to do a FENCE.I (assuming all DCaches are coherent with each other as noted above).

4, core 0 then asks cores 1-3 to execute FENCE.I

5, cores 1-3 (after executing FENCE.I) then re-fetch the instructions from main memory to get the latest instructions

Yes. The FENCE.I in a typical implementation flushes the fetch/decode pipeline of the core and, if the ICache is not coherent with all the DCaches in the system, also flushes the ICache. (Details will vary depending on the details of an implementation.)

Greg
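
The corrected sequence can be sketched in C as follows. This is a minimal illustration under stated assumptions, not kernel code: `code_region`, `fence_i_request`, `writer_install` and `remote_poll` are hypothetical names, the per-hart flag is a polled stand-in for a real IPI, and coherent data caches are assumed as noted above. On non-RISC-V hosts the fences fall back to a compiler barrier and a C11 fence so that the sketch still compiles.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define NUM_HARTS  4
#define CODE_WORDS 16

/* Hypothetical patch target and a polled stand-in for an IPI mailbox. */
static uint32_t   code_region[CODE_WORDS];
static atomic_int fence_i_request[NUM_HARTS];

/* FENCE.I: sync THIS hart's instruction fetch with stores visible to it. */
static inline void local_fence_i(void)
{
#if defined(__riscv)
    __asm__ volatile ("fence.i" ::: "memory");
#else
    __asm__ volatile ("" ::: "memory"); /* stand-in: compiler barrier only */
#endif
}

/* Data FENCE: orders earlier stores before later ones; it flushes nothing. */
static inline void data_fence(void)
{
#if defined(__riscv)
    __asm__ volatile ("fence rw, rw" ::: "memory");
#else
    atomic_thread_fence(memory_order_seq_cst); /* stand-in on other hosts */
#endif
}

/* Writing hart: steps 1 to 4 of the sequence. */
void writer_install(const uint32_t *insns, size_t n, int self)
{
    assert(n <= CODE_WORDS);
    memcpy(code_region, insns, n * sizeof insns[0]); /* 1. ordinary data stores */
    local_fence_i();  /* 2. own fetch sees the new code; no D-cache flush implied */
    data_fence();     /* 3. order the code stores before the request stores */
    for (int h = 0; h < NUM_HARTS; h++)              /* 4. ask the remote harts */
        if (h != self)
            atomic_store_explicit(&fence_i_request[h], 1, memory_order_relaxed);
}

/* Remote hart: step 5. Returns 1 if a request was pending and handled. */
int remote_poll(int self)
{
    if (atomic_exchange_explicit(&fence_i_request[self], 0,
                                 memory_order_acquire)) {
        local_fence_i();
        return 1;
    }
    return 0;
}
```

The point of the ordering is that a remote hart can only observe its request flag after the code stores are visible to it, so its own FENCE.I is enough; nothing in the sketch flushes core 0's DCache.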



fanghuaqi commented 1 month ago

Hi @gfavor , thanks for your reply.

As described in the mailing list thread https://lore.kernel.org/lkml/032536BCDC0EB6C4+dc9fc383-d69c-4cb0-b66d-f4e32c29ab67@nucleisys.com/T/#md476f6dadc6bb3184699c24d936f94fa1c7a9722, I have copied the latest part of the discussion below:

Finally,

The RISC-V spec describes the FENCE.I instruction as follows:

The FENCE.I instruction is used to synchronize the instruction and data streams.

RISC-V does not guarantee that stores to instruction memory will be made visible to instruction fetches on a RISC-V hart until that hart executes a FENCE.I instruction. A FENCE.I instruction ensures that a subsequent instruction fetch on a RISC-V hart will see any previous data stores already visible to the same RISC-V hart. FENCE.I does not ensure that other RISC-V harts' instruction fetches will observe the local hart’s stores in a multiprocessor system.

From this description, the fence.i instruction only applies to the local core, so that its instruction fetches can see any previous data stores on the same core.

Not on the same core, it is said: "A FENCE.I instruction ensures that a subsequent instruction fetch on a RISC-V hart will see any previous data stores already visible to the same RISC-V hart".

In other words, any store that is in the dcache of core0 should be seen by the instruction fetcher of any other core right? Since any core should be able to see what is in the other core's dcache right (ie the dcaches are coherent)? If your instruction fetcher on the other cores does not see the data, a simple memory barrier on core0 should make it visible, no need to flush the core0 dcache.

The author of commit [1] (Alexandre Ghiti) thinks that if the instruction fetcher on the other cores does not see the data, a simple memory barrier on core 0 should make it visible, with no need to flush core 0's dcache. We, however, think that core 0 needs to execute a fence.i to flush its dcache, not just a simple memory barrier, before the instruction fetchers on the other cores can see the data. Could you help us confirm which understanding is correct?

cc @palmer-dabbelt the commit co-author

[1] https://github.com/torvalds/linux/commit/01261e24cfab69c65043e1e61168348ae23a64c2

Thanks Huaqi

gfavor commented 1 month ago

The author of commit [1] (Alexandre Ghiti) thinks that if the instruction fetcher on the other cores does not see the data, a simple memory barrier on core 0 should make it visible, with no need to flush core 0's dcache.

A memory barrier (aka FENCE) does not make specified prior memory accesses globally visible. It only ensures an order in which preceding and following memory accesses eventually become globally visible. And FENCE instructions do not cause flushing actions on caches.
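
The ordering-only nature of a fence can be illustrated with a small message-passing sketch using C11 atomics. The names `payload`, `ready`, `producer` and `consumer` are hypothetical, and the C11 fences are used here as a portable stand-in for the FENCE instruction (on RISC-V a compiler typically lowers them to FENCE instructions). Nothing in this code pushes data anywhere; the fences only guarantee that a consumer which observes `ready` also observes `payload`.

```c
#include <stdatomic.h>

static int        payload; /* ordinary data written before the flag */
static atomic_int ready;   /* flag written after the fence */

/* Producer: the release fence orders the payload store before the flag
 * store. It does not force either store to become visible immediately. */
void producer(int value)
{
    payload = value;
    atomic_thread_fence(memory_order_release);
    atomic_store_explicit(&ready, 1, memory_order_relaxed);
}

/* Consumer: returns 0 if the flag is not visible yet (which is allowed at
 * any time), otherwise reads the payload, which the fences guarantee to be
 * the value stored before the flag. */
int consumer(int *out)
{
    if (!atomic_load_explicit(&ready, memory_order_relaxed))
        return 0;
    atomic_thread_fence(memory_order_acquire);
    *out = payload;
    return 1;
}
```

A consumer that returns 0 has not caught the producer doing anything wrong: the fence constrains the order in which the two stores become visible, not the time at which they do.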

but we thought we need to do a fence.i to flush the core0 dcache not just a simple memory barrier,

FENCE.I does not cause flushing actions on DCaches. It only establishes consistency between the I side of a core and its DCache.

Greg
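
A toy model may help show why FENCE.I is a per-hart operation. The sketch below is not how hardware works internally: it assumes coherent DCaches (modelled as one shared array, with stores treated as immediately globally visible) and ICaches that are not coherent with them (modelled as per-hart copies). All names (`coherent_mem`, `icache`, `store_word`, `fetch_word`, `model_fence_i`) are hypothetical.

```c
#include <stdint.h>
#include <string.h>

#define NUM_HARTS 2
#define WORDS     4

/* What all (coherent) DCaches agree on; a simplifying assumption. */
static uint32_t coherent_mem[WORDS];
/* Each hart's instruction-fetch view, which may be stale. */
static uint32_t icache[NUM_HARTS][WORDS];

/* A data store by any hart: visible on the data side, not to any fetcher. */
void store_word(int idx, uint32_t value)
{
    coherent_mem[idx] = value;
}

/* An instruction fetch only ever sees the hart's own fetch view. */
uint32_t fetch_word(int hart, int idx)
{
    return icache[hart][idx];
}

/* FENCE.I in this model: resynchronize ONE hart's fetch view with the
 * data side. No DCache is flushed and no other hart is affected. */
void model_fence_i(int hart)
{
    memcpy(icache[hart], coherent_mem, sizeof coherent_mem);
}
```

In this model, a store followed by `model_fence_i(0)` updates hart 0's fetch view only; hart 1 keeps fetching the old word until it runs `model_fence_i(1)` itself, which is why the remote harts must be asked to execute their own FENCE.I.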


fanghuaqi commented 1 month ago

Hi Greg, thanks for your reply. We will continue the discussion with the patch author.

Will core 0 fence and then core 1 do fence.i make instruction fetch get the same instruction memory in a multicore system #1544
