riscv / riscv-cheri

This repository contains the CHERI extension specification, adding hardware capabilities to RISC-V ISA to enable fine-grained memory protection and scalable compartmentalization.
https://jira.riscv.org/browse/RVG-148
Creative Commons Attribution 4.0 International
43 stars 26 forks source link

CMO exceptions don't look quite right #213

Closed Timmmm closed 3 months ago

Timmmm commented 4 months ago

The CMO/prefetch exceptions are:

ifdef::cbo_clean_flush[]
| Permission violation  | It does not grant <<w_perm>> and <<r_perm>>
| Length violation      | At least one byte accessed is within the bounds
endif::cbo_clean_flush[]

ifdef::cbo_inval[]
| Permission violation  | It does not grant <<w_perm>>, <<r_perm>> or <<asr_perm>>
| Length violation      | At least one byte accessed is outside the bounds
endif::[]

ifdef::prefetch_i[]
| Permission violation  | It does not grant <<x_perm>>
| Length violation      | At least one byte accessed is within the bounds
endif::[]

ifdef::prefetch_r[]
| Permission violation  | It does not grant <<r_perm>>
| Length violation      | At least one byte accessed is within the bounds
endif::[]

ifdef::prefetch_w[]
| Permission violation  | It does not grant <<w_perm>>
| Length violation      | At least one byte accessed is within the bounds
endif::[]
  1. The prefetch instructions shouldn't ever raise exceptions because they're just hints. The (non-cheri) spec says:

A cache-block prefetch instruction is permitted to access the specified cache block whenever a load instruction, store instruction, or instruction fetch is permitted to access the corresponding physical addresses. If access to the cache block is not permitted, a cache-block prefetch instruction does not raise any exceptions and shall not access any caches or memory

This makes sense because they're just or.i instructions semantically. In fact do they even need to be mentioned in this spec at all?

  1. Shouldn't cbo.clean and cbo.inval have the same Length violation? Pretty sure they should both be At least one byte accessed is outside the bounds.

  2. The CBO spec actually says you only need read OR write permission, not both:

A cache-block management instruction is permitted to access the specified cache block whenever a load instruction or store instruction is permitted to access the corresponding physical addresses. If neither a load instruction nor store instruction is permitted to access the physical addresses, but an instruction fetch is permitted to access the physical addresses, whether a cache-block management instruction is permitted to access the cache block is UNSPECIFIED. (yeay)

jrtc27 commented 4 months ago
  1. Yes because it matters for microarchitectural side channels
  2. No. Clean is safe, inval is not, because the latter throws away data in the cache line, acting sort of like a write to the whole line with whatever possibly stale data is in DRAM.
  3. Yes, and this is a bad idea. More sensible architectures like AArch64 require write permission, because if it’s not writable the data shouldn’t be changing.
Timmmm commented 4 months ago

Yes because it matters for microarchitectural side channels

Yes it should be mentioned, or yes it should raise an exception? Can you expand a bit? I get that this is a spectre danger zone, but surely it's wrong for ori x0, x0, 0 to raise an exception?

Clean is safe

Yeah but it's saying raise a length exception if at least one byte is within bounds. At the very least it should be None of the bytes accessed are in bounds shouldn't it?

Also I agree clean is safe but the CBO extension requires read or write access to the entire block. That does seem wrong but it also seems weird for CHERI to do something different.

Yes, and this is a bad idea.

Yeah though as you say only clean (and hence flush) are really "writes". So should cbo.inval require write permission? I can definitely think of situations where you are reading from read-only memory and you want to use cbo.inval no?

PeterRugg commented 4 months ago

@Timmmm I think this was already fixed? The current spec pages for the prefetch instructions say "the instruction does not perform a prefetch if...." rather than listing exception conditions. Is there a separate list of exceptions that's stale somewhere?

The decision to add the capability info to the instruction is to allow microarchitecture to protect itself from side channels. You're right though, that they mustn't trap, especially as a result of bounds.

You said that "clean, and hence flush" are writes: I see what you mean, but that's from the memory subsystem's point of view. From the program's point of view, an invalidate is absolutely a write: it writes the address to some value that address previously held. Clean and flush are not writes from that point of view: they don't change what the program is allowed to observe, only non-coherent things.

When reading from read-only memory, surely invalidate and flush are indistinguishable? The only difference is how dirty lines in the cache are treated, but if it's read only, then there can't be any?

Timmmm commented 4 months ago

I think this was already fixed?

Ah you're right. I only noticed them in cbo_exceptions.adoc - it looks like they were accidentally left there when the link to that file from the prefetch instructions was removed. I'll make a PR to fix it.

When reading from read-only memory

Sorry I wasn't clear - I mean suppose you have memory that is read-only from the current program's point of view (because it hasn't been given write access). But that memory is also being written to by a different non-coherent agent. You might still want to do cbo.invalidate on it right?

PeterRugg commented 4 months ago

If you haven't made a write to it, then you shouldn't care whether you invalidate or flush. It seems like software that requires an invalidate rather than a flush is always broken, because the cache can write out the line whenever it likes anyway.

Timmmm commented 4 months ago

Isn't the point of invalidating that some other agent may have written to it, so then their writes become visible to us? Or am I totally misunderstanding this?

An invalidate operation makes data from store operations performed by a set of non-coherent agents visible to the set of coherent agents at a point common to both sets by deallocating all copies of a cache block from the set of coherent caches up to that point

jrtc27 commented 4 months ago

Yes, but if you’re not writing to it, flush and invalidate are the same

Timmmm commented 4 months ago

I agree... Sorry maybe we're talking about different things or I'm just being really stupid. Here's what I mean:

image

  1. Hart 0 does a normal load of some block.
  2. Some other agent does an incoherent write to that block. It's incoherent so it doesn't automatically clear caches etc.
  3. The hart's cache is now stale but it doesn't know it.
  4. The hart does a cmo.inval on the block, clearing its cache.
  5. So when it does a read next it gets the value from the other agent instead of the stale 7.

The CMO spec isn't the clearest but I think that's the sort of thing it's supposed to enable right?

In this case Hart 0 only does reads. It might only have read permission from its CHERI capabilities. But it's still useful to be able to execute cbo.inval.

PeterRugg commented 4 months ago

For that example though, flush would do exactly the same as invalidate.

Timmmm commented 4 months ago

Yes I agree.... My point was that CHERI requires read and write permission for cbo.inval, but the CBO spec only requires read or write. And I think this is an example of where it would make sense to be able to execute cbo.inval with only read permission.

Timmmm commented 4 months ago

Unless I am reading this differently:

It does not grant W-permission, R-permission or ASR-permission

Is that saying if not (grants(read) or grants(write) or grants(asr)) or if (not grants(read)) or (not grants(write)) or (not grants(asr))?

Probably should be reworded to be not ambiguous anyway!

PeterRugg commented 4 months ago

So cheri needs read and write for inval, because of something like the following:

k = secret key;

// do some work

k = 0; // clear the key

untrused_func((const int *) &k); //pass read only pointer

If untrusted_func is allowed to do an invalidate, not only can it read the old value of secret key, which is bad enough (and why I think we said this needed ASR), but it can also violate the read-only because it's changed the value of k.

Your example points to where the code needs to run either cbo.flush or cbo.inval, but can't tell the difference. Since cbo.inval is so dangerous, why wouldn't we insist such code just has to do cbo.flush instead of cbo.inval?

jrtc27 commented 4 months ago

To make a broader statement:

It is always correct to use flush rather than invalidate. If your code relies on using invalidate, it is broken, because the line could be written back by the hardware at any point anyway. Moreover, flush is safe, because it’s an operation that the hardware can do at any point. Invalidate, however, is not. It exists solely as an optimisation for when you know you don’t need to write back. It is therefore completely reasonable to heavily restrict when you can use it, and to require potentially-dangerous scenarios to use flush instead.

Timmmm commented 4 months ago

Ah great example Peter. That makes sense, thank you both! Could be a great thing to add to an non-normative aside in the spec (I really like those in the main spec).

I think the final thing is that this is definitely incorrect:

| Length violation      | At least one byte accessed is within the bounds

Presumably it is meant to be the opposite?

| Length violation      | None of the bytes accessed are within the bounds
Timmmm commented 4 months ago

(for cbo.flush and cbo.clean)

PeterRugg commented 4 months ago

Yeah, it's one thing I noticed when I first ran through the spec: the conditions are a random mix of saying "exception when..." and "exception unless...". Could be good to do a pass making it consistent to avoid issues like this.

Timmmm commented 4 months ago

Cool I'll make a PR for that. I think most of them are "exception when" so I would view any that are "unless" as a bug.

arichardson commented 4 months ago

See also https://github.com/CTSRD-CHERI/cheri-specification/issues/65. My view is that INVAL is so dangerous it should require almighty permissions and bounds since it can time travel and bring back capability patterns that are outside the bounds of any reachable capability.

arichardson commented 4 months ago

However, I do agree that prefetch should not raise exceptions and instead just ignore the operation if the preconditions are not met. That was my understanding of https://github.com/CTSRD-CHERI/cheri-specification/issues/65 and I thought also specified here, but I must have missed that.

Timmmm commented 4 months ago

I thought also specified here, but I must have missed that.

Yeah it was but there was some dead text in a file that said otherwise that confused me. Removed in #214

andresag01 commented 4 months ago

@arichardson : What still needs to be resolved in this ticket? From your comment, I am guessing you re-opened it due to this?

See also https://github.com/CTSRD-CHERI/cheri-specification/issues/65. My view is that INVAL is so dangerous it should require almighty permissions and bounds since it can time travel and bring back capability patterns that are outside the bounds of any reachable capability.

I agree that invalidations are dangerous, but they have legitimate uses and are part of RISC-V. Therefore, I do not think removing them entirely from the ISA will be well-received. Requiring almighty is a bit strange too, for example, why would I need a capability with bounds over the entire address space if I would like to perform an operation over 1 cache line only?

My recollection of previous discussions is that we require a capability that has "sufficient" permissions to do the side-effect operations that an invalidate would incur. For example, an invalidate could "bring back" stale data that was not written back from the cache. If you squint, this could be reproduced using regular stores given that your authorising cap has W permissions and appropriate bounds.

We ended up requiring W, R and ASR permissions along with bounds covering the entire cache line -- ASR is because invalidate needs to read a CSR to work out how it behaves. IMHO, these need to be updated because invalidates could (e.g.) "bring back" (i.e. write) a capability, so my proposal is that invalidates require the following:

tariqkurd-repo commented 4 months ago

https://github.com/riscv/riscv-CMOs/issues/56

We've already asked for FLUSH to be a valid implementation of INVAL which is almost is in the CMO spec already but it's better to clarify it. Clearly the result of FLUSH is always correct, so doing this is a valid implementation.

We can add a recommendation that FLUSH is always used instead of INVAL for CHERI-RISC-V, or even strengthen it and say must be used instead and we won't violate the CMO spec as a result.

jrtc27 commented 4 months ago

C is potentially awkward because you may well not have that for device driver mappings.

PeterRugg commented 4 months ago

@andresag01

ASR is because invalidate needs to read a CSR to work out how it behaves.

Hmm, that wasn't quite my understanding: it was more because it's so dangerous. I was viewing this as a theoretical problem, but it probably is actually very practical, and could be a good experiment to do. If we didn't need ASR to do an invalidate, we could have some "compartment" that we model as being compromised just run an invalidate on any capability it ever gets given and see what capabilities it can find in the recovered memory: I wonder how often there will be an almighty cap lying around.

It's not hard to imagine a post-CHERI world where the goto thing for exploits to do when they compromise a compartment is to just run invalidates on every address they can find. At the very least, invalidates aren't giving code with ASR any more privilege than they already have since they can already access any address they like by setting up an evil translation mapping. However, even with ASR, invalidates may actually be an easier escalation route than setting up translation windows to the memory you want to read from.

tariqkurd-repo commented 4 months ago

presumably this can happen because a revocation sweep hasn't happened? If the revocation has happened then stale caps won't be around. Or... when closing down a compartment it should be cleaned. So it all sounds very avoidable if the stale caps are cleaned up correctly.

PeterRugg commented 4 months ago

It's true that revocation might get rid of them as a side effect, just from churning through the caches. It's not guaranteed though: the revocation sweep won't deliberately flush the caches.

jrtc27 commented 4 months ago

(and you wouldn't want revocation to flush the caches either, that sounds expensive...)

tariqkurd-repo commented 3 months ago

@Timmmm can we close this one now?

Timmmm commented 3 months ago

Yep. I think it might be nice to mention menvcfg[CBIE] but people can probably figure that out.