riscv / riscv-CMOs

https://jira.riscv.org/browse/RVG-59
Creative Commons Attribution 4.0 International

Feedback on CMO Spec #22

Closed: SanjayPatelRVI closed this issue 2 years ago

SanjayPatelRVI commented 3 years ago

Hi, I'm providing some feedback on the markdown-based specification maintained by David Kruckemyer. This concerns compatibility with our specification for CACHE and PREFETCH instructions, as we'd eventually like to align our implementation with the 2022 Platform, when software should be available.

  1. Our specification provides Index Invalidate and Index Writeback Invalidate operations, which could be useful for flushing entire caches. However, if there is no need for this in planned RISC-V software, then we can abandon it.
  2. DEMOTE.EA: good that it is optional, as MIPS processors did not implement it; it did not make sense to manipulate the LRU state at such fine granularity.
  3. Perhaps if PREFETCH can be prioritized over CMOs, then we can implement RISC-V PREFETCH in lieu of our specification. PREFETCH is user-level, so its use is likely to be more pervasive. I would suggest distinguishing PREFETCH from CACHE.
  4. A CACHE FetchNLock will be very useful on an L2 where the L2 is the shared cache for all the cores in the cluster.
  5. CBO.ZERO: would this be limited to a particular cache level, maybe the first cache level shared among all cores of a cluster?

Sanjay

gfavor commented 3 years ago

A few quick answers ...

On Thu, Mar 11, 2021 at 12:27 PM SanjayPatelRVI @.***> wrote:

  1. Our specification provides Index Invalidate and Index Writeback Invalidate operations which could be useful for flushing entire caches. However, if there is no need for this in planned RISC-V Software, then we can abandon this.

The TG does plan on addressing and standardizing indexed cache maintenance operations (just not in phase 1).

  2. DEMOTE.EA. Comment - Good that it is optional as in MIPS processors this was not implemented. Did not make sense to manipulate the LRU at such fine-grain.

I'd have to check the latest draft, but I think DEMOTE is not actually "optional"; rather, it is required, but an implementation is free to implement it as a NOP, since "demote" is a cache hint in any case.

  3. Perhaps if PREFETCH can be prioritized over CMOs, then we can implement RISC-V PREFETCH in lieu of our specification. PREFETCH is user-level so use is likely to be more pervasive. I would suggest distinguishing PREFETCH from CACHE.

I don't quite see what you're getting at; the phase 1 draft spec includes cache PREFETCH instructions.

  4. A CACHE FetchNLock will be very useful on L2 where the L2 is the shared cache for all the cores in the cluster.

Something to be added to the list of things for post-phase 1 consideration.

  5. CBO.ZERO - would this be limited to a particular cache level, maybe the first cache level shared among all cores of a cluster?

Semantically, think of this as a 64B write of all 0's, no different from an 8B write using an SD instruction with all 0's for the data (other than the larger size of the write). All the memory ordering and coherency semantics apply equally to both. Hence this instruction is independent of the nature and organization of the cache hierarchy.

Greg

brucehoult commented 3 years ago

Welcome to GitHub and the RISC-V world :-) I'm guessing you won't have to rejig your cores much at all -- even just a shrunk R16k would be very competitive right now.

SanjayPatelRVI commented 3 years ago

My comment regarding PREFETCH, i.e. the Cache Block Hints, is that their definition could converge sooner than that of CACHE, i.e. the Cache Block Management instructions, since they are simpler to define. If so, perhaps PREFETCH can be prioritized and closed sooner, in time for us to adopt it in our ongoing implementation. If you think both will close at the same pace, then my point is moot. Either way, we can work with what we have for now.

CBO.ZERO – wrong interpretation on my part. Obviously a cache level will be specifiable in the opcode, so that takes care of my concern.

Thanks for the prompt response, Sanjay


gfavor commented 3 years ago

On Thu, Mar 11, 2021 at 4:02 PM SanjayPatelRVI @.***> wrote:

CBO.ZERO – wrong interpretation on my part. Obviously a cache level will be specifiable in the opcode, so that takes care of my concern.

First, I should correct myself: the size of the write is not necessarily 64B. It is the "block size" (aka cache line size) that applies to all the CBO instructions, which is often 64B but could be anything.

Then, to the above comment about CBO.ZERO: actually, no. Just like normal store instructions don't specify a cache level, this special "store X bytes of all 0's" (where X = block size) doesn't specify a cache level either. In both cases it is a matter of implementation choice whether the write, for example, allocates into a given level or levels of the cache hierarchy.

Although this write case is interesting in that its size will typically match the cache line size, which opens the door to allocation-policy options (including applying a different policy to these writes than to normal writes coming from stores).

Greg

SanjayPatelRVI commented 3 years ago

That works for us if the cache level to which CBO.ZERO allocates is implementation-dependent. In our context it makes sense to allocate into the last cache level before memory, to avoid the memory latency.

Regarding specifying levels in the opcode, I only see them specified for the Cache Block Management instructions. I would expect them to also be available for PREFETCH. Work in progress, I assume.

Sanjay


dkruckemyer-ventana commented 3 years ago

That works for us if the cache level to which CBO.ZERO allocates is implementation-dependent. It makes sense in our context to make it the cache prior to memory to avoid the memory latency. In regards to specifying levels in opcode, I only see it specified for the Cache Block Management instructions. I would expect it to also be given for Prefetch. Work in progress I assume. Sanjay

The specification does allow levels to be specified for both CBO.ZERO and the various hint instructions. These are structural levels, e.g. L1, etc., as opposed to logical points of coherence (PoCs).

dkruckemyer-ventana commented 2 years ago

Closing since I think these issues have been addressed.