riscv / riscv-CMOs

https://jira.riscv.org/browse/RVG-59
Creative Commons Attribution 4.0 International

SECOND QUESTION: cache index CMOs, e.g. (set,way) vs "microarchitecture index range" #10

Open AndyGlew opened 4 years ago

AndyGlew commented 4 years ago

Just like an earlier issue discusses address range CMOs vs per-cache-line CMOs... but this time for operations that are typically used for things like "flush the entire I$ or D$".

Such "cache microarchitecture dependent CMOs" have been done in some earlier processors a cache line at a time, but this is less well established than the per-cache-line-address-at-a-time approach. Quite a few RISC processors have "full cache flushes", etc.

First, if operating a cache line at a time, there must be a way of indicating which cache line is involved. Typically this is (set,way), but not all caches have sets and ways - indeed, it is not really clear what the set and ways are for something like a skewed associative cache.

But that's okay, we can abstract that as a "cache entry index number", which might be Set*Nways+Way for a traditional set associative cache, or whatever is appropriate.
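As a minimal sketch of that abstraction for the traditional set-associative case, the mapping between (set,way) and a flat entry index could look like the following. The geometry (`NSETS`, `NWAYS`) and the helper names are made up for illustration; a skewed associative or otherwise unusual cache would need a different mapping.

```c
#include <assert.h>

/* Hypothetical geometry, chosen only for illustration. */
enum { NSETS = 64, NWAYS = 4 };

/* Flatten (set, way) into one cache entry index: Set*Nways+Way. */
static unsigned entry_index(unsigned set, unsigned way) {
    assert(set < NSETS && way < NWAYS);
    return set * NWAYS + way;
}

/* Recover the set from a flat entry index. */
static unsigned index_to_set(unsigned idx) {
    assert(idx < NSETS * NWAYS);
    return idx / NWAYS;
}

/* Recover the way from a flat entry index. */
static unsigned index_to_way(unsigned idx) {
    assert(idx < NSETS * NWAYS);
    return idx % NWAYS;
}
```

With this geometry, set 5 and way 3 map to entry index 23, and the two inverse helpers recover 5 and 3 again.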

Then, a per-cache-index loop typically looks like

FOR i from 0 to  #cache_entries-1 DO
     CMO.cache_index  i

or

FOR s from 0 to  Nsets-1 DO
     FOR w from 0 to Nways-1 DO
          CMO.by_set_way  s,w

That's the traditional approach.
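To make the two traditional loops concrete, here is a small software model in C. Everything in it is an assumption for illustration: the `valid` array stands in for per-entry cache state, and `cmo_cache_index` / `cmo_by_set_way` are hypothetical stand-ins for the per-entry instructions in the pseudocode, not real intrinsics.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical geometry, chosen only for illustration. */
enum { NSETS = 8, NWAYS = 2, NENTRIES = NSETS * NWAYS };

static bool valid[NENTRIES];  /* software stand-in for per-entry valid bits */

/* Stand-in for CMO.cache_index: invalidate one entry by flat index. */
static void cmo_cache_index(unsigned i) {
    assert(i < NENTRIES);
    valid[i] = false;
}

/* Stand-in for CMO.by_set_way: same operation, addressed by (set, way). */
static void cmo_by_set_way(unsigned s, unsigned w) {
    assert(s < NSETS && w < NWAYS);
    valid[s * NWAYS + w] = false;
}

/* Mark every entry valid, as if the cache were full. */
static void fill_cache(void) {
    for (unsigned i = 0; i < NENTRIES; i++) valid[i] = true;
}

/* Count entries that are still valid. */
static unsigned count_valid(void) {
    unsigned n = 0;
    for (unsigned i = 0; i < NENTRIES; i++) n += valid[i] ? 1u : 0u;
    return n;
}

/* FOR i from 0 to #cache_entries-1 DO CMO.cache_index i */
static void flush_by_index(void) {
    for (unsigned i = 0; i < NENTRIES; i++) cmo_cache_index(i);
}

/* FOR s ... FOR w ... CMO.by_set_way s,w */
static void flush_by_set_way(void) {
    for (unsigned s = 0; s < NSETS; s++)
        for (unsigned w = 0; w < NWAYS; w++)
            cmo_by_set_way(s, w);
}
```

Both loops touch every entry exactly once; the difference is only whether software has to know the set/way geometry.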

The draft proposal (by me, Andy Glew, TBD link here) defines "microarchitecture range CMOs" that look like

        x1 := 0
loop:
        x1 := CMO.UR x1
        BNEZ x1, loop

which looks remarkably like the per-cache-index loop, except that, as in the CMO.AR proposal, the next cache index is returned by the CMO.UR instruction.

This allows several implementations:

(1) per (set,way) cache line at a time - traditional

(2) trap to M-mode efficiently, less overhead

(3) state machines that iterate over the entire cache, e.g. for EVICT, to write out dirty data

also (3.1) non-state machine implementations, as in bulk invalidations that set all valid bits to 0 as a single operation.
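The point of the list above is that one software loop works unchanged across all of these implementations. A rough C model, under stated assumptions: the instruction takes a start index and returns the next index to process, with 0 meaning "done" (as the BNEZ loop implies); indexes count up from 0; and the names `cmo_ur_per_entry`, `cmo_ur_chunked`, `cmo_ur_bulk`, and `CHUNK` are invented for this sketch. The actual CMO.UR draft may define the details differently.

```c
#include <assert.h>
#include <stdbool.h>

enum { NENTRIES = 16, CHUNK = 4 };  /* hypothetical sizes */

static bool valid[NENTRIES];  /* software stand-in for per-entry valid bits */
static unsigned calls;        /* how many times the modeled instruction ran */

/* A modeled CMO.UR: takes a start index, returns the next index, 0 when done. */
typedef unsigned (*cmo_ur_fn)(unsigned start);

/* (1) Traditional: one cache entry per instruction. */
static unsigned cmo_ur_per_entry(unsigned start) {
    assert(start < NENTRIES);
    calls++;
    valid[start] = false;
    return (start + 1 < NENTRIES) ? start + 1 : 0;
}

/* (2)/(3) Several entries per invocation, as an M-mode trap handler
 * or a hardware state machine might do before returning. */
static unsigned cmo_ur_chunked(unsigned start) {
    assert(start < NENTRIES);
    calls++;
    unsigned end = start + CHUNK;
    if (end > NENTRIES) end = NENTRIES;
    for (unsigned i = start; i < end; i++) valid[i] = false;
    return (end < NENTRIES) ? end : 0;
}

/* (3.1) Bulk invalidate: clear every valid bit in one operation. */
static unsigned cmo_ur_bulk(unsigned start) {
    (void)start;  /* the whole cache is handled regardless of start */
    calls++;
    for (unsigned i = 0; i < NENTRIES; i++) valid[i] = false;
    return 0;
}

static unsigned count_valid(void) {
    unsigned n = 0;
    for (unsigned i = 0; i < NENTRIES; i++) n += valid[i] ? 1u : 0u;
    return n;
}

/* The loop from the issue: x1 := 0; loop: x1 := CMO.UR x1; BNEZ x1, loop.
 * Returns how many times the modeled instruction executed. */
static unsigned run_loop(cmo_ur_fn cmo_ur) {
    for (unsigned i = 0; i < NENTRIES; i++) valid[i] = true;
    calls = 0;
    unsigned x1 = 0;
    do {
        x1 = cmo_ur(x1);
    } while (x1 != 0);
    return calls;
}
```

In this model the same `run_loop` empties the cache in 16 iterations with the per-entry version, 4 with the chunked version, and 1 with the bulk version, which is the flexibility being argued for.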


I mark this as a SECONDARY QUESTION:

in the title, because I want it to be blaringly obvious

also because I am in a hurry, and will apply this issue tracker's priority scheme later

but mainly because I think there will be less discussion about this CMO.UR cache index range than there will be for the CMO.AR address range instruction.

since there are already quite a few implementations that are "full cache invalidations", and we want RISC-V to support such hardware when it is available.

--

Again, this issue is not for the details of the CMO.UR. It is mostly for the idea of a microarchitecture or cache index range.

brucehoult commented 4 years ago

Agreed. Iterating over the cache can sometimes be better than iterating over an address range. And this form provides flexibility in implementation.

Manufacturers of cores could if they wish document the encoding scheme from sets and ways or whatever they have into abstract indexes, thus allowing non-portable code to operate on a single way (or whatever).

ingallsj commented 4 years ago

I'm not a fan of including micro-architecture specific encodings or manufacturer-specific abstractions in the general-purpose ISA.

What is the use case, and what value would make it worthwhile for a manufacturer to make their micro-architecture-specific cache ops (set+way, if that's what they built) fit into an architecture-level abstraction?

ingallsj commented 4 years ago

Twist: I would be a fan of an "ALL" variant, instead of set/way/uarch-range.

billhuffman commented 4 years ago

If we're going to approach this, I see two issues that are at a conceptual level above instruction definition.