riscv / riscv-CMOs

https://jira.riscv.org/browse/RVG-59
Creative Commons Attribution 4.0 International
77 stars 12 forks source link

Does CBO.ZERO.EA need to be atomic? #20

Closed dkruckemyer-ventana closed 3 years ago

dkruckemyer-ventana commented 3 years ago

I know that ZERO is not strictly a CBO, but....

Does CBO.ZERO.EA need to be atomic? The most practical concern is whether a T&E handler needs a lock or not, but another concern may the suitability of this instruction for RAS (e.g. a non-atomic version cannot suitably reset check bits on large codewords).

I think the atomicity requirement is a good thing, but reasonable people may disagree.

ingallsj commented 3 years ago

Even with a lock, could CBO.ZERO.EA be trap-and-emulated atomically by executing the existing store instructions? I don't think so, unless the trap handler also interrupts all of the other threads which could possibly be reading that address, which seems impractical.

If RAS needs this write to be full-ECC-granule-atomic, then I suggest waiting until the RAS extension to require that, and not us here and now in the CMO extension, even if most native implementations will do it atomically.

gfavor commented 3 years ago

I would like to say that CBO.ZERO.EA is "block atomic" just like the other CBO's. This (at least for non-T&E implementations) would be natural (again just like the other CBO's).

Although an alternative would be to say that it has absolutely no atomicity (other than byte atomicity). And of course an implementation can choose to implement it as block-atomic (say, for its own custom RAS purposes). And a standard RAS arch extension may come along and add that additional functional requirement onto the CBO.ZERO.EA instruction as part of standardizing that as a RAS tool in the toolbox.

But I'd encourage making ZERO block-atomic to begin with. And, for what I would like to think is the uncommon if not rare (and non-performant!) T&E implementation, let that M-mode do whatever it needs to do.

Greg

strikerdw commented 3 years ago

Opinions (probably even ones I'll agree with later on):

1) DCBZ can't reasonably be emulated in a single-copy-atomic fashion (i.e. in ways other than heroics like pulling all the threads in the "machine" into M-mode code to be sure no one is looking while the loo[p does the store of zeros).

2) DCBZs to cache inhibited space are another place where you have to do something (a loop of stores) and it's going to be bloody hard to make it single-copy-atomic (NB: "single-copy-atomic" means that the operation is completed in an atomic fashion in such a way that no other thread can see/feel the operation being fragmented).

3) What does "block atomic" mean? Is that just your words for my "single-copy-atomic" words?

4) Are we overthinking this a bit? DCBZ -- again -- is just a convenient peg some day to hang new ECC clearing semantics on if we want to. Not something that DCBZ has to do. Any circumstance where you can reasonable add this extra work onto DCBZ would see to already have the block atomicity or whatever it is we're after? If you're doing trap and emulate, seems like you have to find some other way to handle the atomic whacking of the cache line to clean up ECC? Says the guy who's never even seen or contemplated a machine that doesn't directly support DCBZ.

You're going to have a hard time in hte end making DCBZ atomic (whatever that means) in cache-inhibitied space. Though thankfully there shouldn't be caches with ECC errors to fix in that case either?

I know we never placed any restriction on the ISA nor did we mandate that DCBZ was the way to do this in ISA documents. In RAS documents we say DCBZ in our systems has the extra effect (since it as a convenient place to put it. That's all. Not sure why we need to force "block atomicity" on DCBZ?).

Derek

gfavor commented 3 years ago

On Mon, Dec 14, 2020 at 1:55 PM strikerdw notifications@github.com wrote:

1.

What does "block atomic" mean? Is that just your words for my "single-copy-atomic" words?

Yes. I was using informal terminology instead of those highfalutin fancy-pants words. :)

I know we never placed any restriction on the ISA nor did we mandate that DCBZ was the way to do this in ISA documents. In RAS documents we say DCBZ in our systems has the extra effect (since it as a convenient place to put it. That's all. Not sure why we need to force "block atomicity" on DCBZ?).

I'll switch over to the non-atomic camp. (All in the vein of the middle part of my last post.)

As a side note, ARMv8 says "no memory accesses from a DC ZVA have single-copy atomicity of any quantity greater than individual bytes."

Greg

dkruckemyer-ventana commented 3 years ago

Again, I'm not suggesting we architect ZERO around RAS requirements. I only intended to highlight one possible implication between certain ZERO behaviors and any future RAS requirements. Can we all agree that no one was suggesting that RAS requirements be injected into the CBO proposal? :)

And I'll disagree with myself (usually this doesn't happen in public....) and advocate for non-atomicity as well. Cleans up the T&E requirements and simplifies the memory ordering model.

I suppose there's still an argument to be had whether the single-copy-atomic requirement extends beyond bytes or not. For example, one might argue that single-copy atomicity is dependent on the base architecture, e.g. 8B for RV64, and 4B for RV32. Or something else. But individual bytes is certainly the least restrictive way to specify that behavior.

strikerdw commented 3 years ago

I would vote for least restrictive way to specify the behavior and go for bytes. We did that so the emulation handler for cache inhibited could use whatever size stores it wanted.

I'm happy we around putting the RAS directly into the CBO proposal, but I was confused by insisting on single-copy-atomicity seemingly in service of using this for RAS.

Re: single-copy-atomicity... sorry too many years arguing terms and reading hte old papers. It's a habit for me now :-).

Derek

dkruckemyer-ventana commented 3 years ago

RESOLVED: We will define CBO.ZERO.EA to be byte-atomic only. As guidance to software writers, we will add some non-normative usage notes on where this behavior may or may not be appropriate.