Open vmurali opened 9 months ago
This has been a reason for long discussions both. You will always need the bounds check in instruction fetch. Every instruction changes the PCC and thus can fall out of bounds, e.g., a conventional add
instruction executes and goes to the next PC. Thus we need the bounds check in instruction fetch.
In the base ISA, branches can raise instructions if the are misaligned. I have copied in the relevant paragraph of the unprivileged base ISA: "The conditional branch instructions will generate an instruction-address-misaligned exception if the target address is not aligned to a four-byte boundary and the branch condition evaluates to true. If the branch condition evaluates to false, the instruction-address-misaligned exception will not be raised."
RISC-V chose to build an architecture without some way to identify the previous failing instruction. If we now added some way to identify the previously failing instruction, we would also need to change that in the base ISA and provide that mechanism.
@francislaus is correct - this is an issue with the underlying architecture not with CHERI
This has been a reason for long discussions both. You will always need the bounds check in instruction fetch. Every instruction changes the PCC and thus can fall out of bounds, e.g., a conventional
add
instruction executes and goes to the next PC. Thus we need the bounds check in instruction fetch.I am not suggesting that we remove the check during instruction fetch. I am suggesting we remove it during instruction execute.
In the base ISA, branches can raise instructions if the are misaligned. I have copied in the relevant paragraph of the unprivileged base ISA: "The conditional branch instructions will generate an instruction-address-misaligned exception if the target address is not aligned to a four-byte boundary and the branch condition evaluates to true. If the branch condition evaluates to false, the instruction-address-misaligned exception will not be raised."
I am not suggesting we change the handling of misaligned exceptions. (And even the above paragraph is not true if we have compressed instructions. Quoting: "The alignment constraint for base ISA instructions is relaxed to a two-byte boundary when instruction extensions with 16-bit lengths or other odd multiples of 16-bit lengths are added (i.e., IALIGN=16).", but this point about misaligned exceptions is irrelevant to what I am suggesting).
RISC-V chose to build an architecture without some way to identify the previous failing instruction. If we now added some way to identify the previously failing instruction, we would also need to change that in the base ISA and provide that mechanism.
Base RISC-V does not have a capabilities bound check. It has a bounds check for page fault detection (which could be improved) but that again has nothing to do with capabilities bound check. With CHERI, we have an opportunity to do the correct thing, so I suggest we either add a new CSR or change the behavior of the new MEPCC
CSR (which is already different from the base MEPC
CSR).
Can you please reopen the issue till we have had some discussion before it is resolved? Or should I open a new issue with the same content, since I don't see a button to reopen this issue.
@tariqkurd-repo
If there was a way to determine the invoking instruction (e.g. via another CSR) would allow removing the check on jumps. However, without a way to determine this, you could create an OOB capability to another compartment and when invoked it would appear that the target compartment did somethign illegal rather than the caller.
I'll re-open this to continue discussion if we could use some other mechanism other than at-source bounds checks.
Thanks @arichardson.
I skimmed through base RISC-V unprivileged and privileged specs. Bounds check is brand-new behavior in CHERI. In RISC-V, one could potentially have a page fault on a 4-byte instruction that is the target of a taken branch/jump and which is straddling between pages; but the remedy for that would be to map the new virtual page to a physical page. So it's sufficient to check for page faults just during instruction fetch in base RISC-V. Whereas in CHERI, bounds check failure means it's illegal.
NB: The misaligned exception is triggered by the caller/invoker during its execution rather than the callee/invokee during instruction fetch.
However, without a way to determine this, you could create an OOB capability to another compartment and when invoked it would appear that the target compartment did somethign illegal rather than the caller.
Exactly! We need the source error handler to handle this.
@vmurali To make sure I understand. For this option:
Or, we can do this micro-architecturally, such that architecturally, in the case of a jump or a taken branch, MEPCC contains the previous instruction's address (and MEPCC contains the current excepting instruction's address otherwise). Either solution would eliminate the check for topBound being at least 2 bytes more than the address of a taken branch or jump and provides better debugging info.
This would leave the architecture unchanged, but eliminate the duplicate bounds check?
This would leave the architecture unchanged, but eliminate the duplicate bounds check?
It doesn't need a new CSR, but the semantics of MEPCC
changes. It does avoid the duplicate bounds check at execution time.
A concrete example when the behavior will be different with the proposed change is when there's a trampoline containing a 4-byte instruction, and the caller assumes it's a 2-byte instruction trampoline. The fault occurs during instruction fetch of the trampoline, but MEPCC
would point to the caller in the proposed change, whereas it would have pointed to the trampoline previously.
@vmurali Thanks for the clarification!
A microarchitectural aspect I forgot to mention; we don't usually require an "extra" bounds check in the pipeline, since capability execution already requires a bounds check (for CSetBounds, for example). Each capability instruction uses this bounds check at most once. The jumps and branches each have a chance to use this bounds check if they like, and with the current architecture, they like.
Yeah, you don't need another functional unit, but the inputs of that functional unit now has to deal with JALR/JAL also; and the output has to go into the priority chain for exceptions for JALR/JAL. There's still some cost to this routing and muxing. It would have been fine if this bought us something.
(I agree adding a new register (architecturally or microarchitecturally) will have some cost too.)
We need a representability check in J/B to avoid non-monotonicity when jumping outside of the representable region of a capability. There isn't a lot of difference between a representability check and a bounds check, so as long as both the fetch unit and jumps need to be capable of generating exceptions I don't feel strongly about this. The only way this could turn into a significant benefit is if exceptions were completely removed from jumps and branches; this would worsen debugging for jumps to invalid capabilities, but programmer expectations for being able to see the source of a jump to NULL without additional mechanisms like CTR are not high.
This is a valid point! We now have to do representability checks in Jumps/taken branches.
However, the higher level issue remains: we cannot determine if the target address contains a compressed instruction or not, and it is not okay to check if just 2 bytes are within bounds and fail during instruction fetch if the target instruction ends up being a 4-byte instruction. It's not very difficult to hit a problem with that approach: consider a 2-byte trampoline which is JIT-ed to inline the code the trampoline jumps to. Before JIT-ing, the jump would be pointing to the trampoline, but after JIT-ing, the jump's capability bounds must be updated to reflect the inlined code's bounds. If the exception occurs only in fetch of the target, we wouldn't be able to fix the bounds at the caller.
There are two ways to fix it:
MPrevPCC
or some such register) and raise a bounds violation. This eliminates exceptions in branches and JAL (but not in JALR - that requires other checks, viz. tag, seal, Exec permission).it is not okay to check if just 2 bytes are within bounds and fail during instruction fetch if the target instruction ends up being a 4-byte instruction
Sure it is. Don't give out those capabilities if you don't want people to use them.
It's not very difficult to hit a problem with that approach: consider a 2-byte trampoline which is JIT-ed to inline the code the trampoline jumps to. Before JIT-ing, the jump would be pointing to the trampoline, but after JIT-ing, the jump's capability bounds must be updated to reflect the inlined code's bounds.
That's a temporal safety issue. Revoke the capability before you reuse it for something else.
Sure it is. Don't give out those capabilities if you don't want people to use them.
The scenario I mentioned shows why it's not always the case that one has the right capabilities. Also, if one doesn't give out those bad capabilities, why bother doing a bounds check during fetch? We just have to ensure that we don't fall off the bounds when incrementing the PC and all jump/branch targets are to good capabilities.
That's a temporal safety issue. Revoke the capability before you reuse it for something else.
What I showed is the algorithm for revocation - the caller, through its exception handler, gets rid of the bad capability and constructs a new capability for the changed code.
MPrevPCC
I'm strongly opposed to adding new architectural state to debug control transfers when we have perfectly good E-Trace and CTR extensions that should compose with CHERI.
I'm strongly opposed to adding new architectural state to debug control transfers when we have perfectly good E-Trace and CTR extensions that should compose with CHERI.
We don't need an architectural state. It could be micro-architectural and MEPCC
will point to the prev PC in case of bounds violation on a J/B target instruction fetch.
MPrevPCC
I'm strongly opposed to adding new architectural state to debug control transfers when we have perfectly good E-Trace and CTR extensions that should compose with CHERI.
There is a need in a compartmentalised world for fault handlers to know what PCC is truly responsible for the fault (i.e. if you fabricate an untagged capability into another compartment, the OS needs to be able to determine it was you, not the compartment at that address, who should be blamed and killed), so in a world where jumps don't check their targets prior to changing PCC mprevpcc or whatever would provide such a mechanism, but that doesn't mean it's the right design.
so in a world where jumps don't check their targets prior to changing PCC
Just to re-iterate, the current spec does exactly that - a bad capability that passes the bounds violation (for + 2 bytes) before changing PCC but fails at target instruction fetch.
so in a world where jumps don't check their targets prior to changing PCC
Just to re-iterate, the current spec does exactly that - a bad capability that passes the bounds violation (for + 2 bytes) before changing PCC but fails at target instruction fetch.
But it's tagged, so the caller didn't fabricate it from nowhere.
But it's tagged, so the caller didn't fabricate it from nowhere.
It got it prior to JIT-ing in my example. (And the revocation mechanism is reliant on this exception being flagged at the caller.)
But it's tagged, so the caller didn't fabricate it from nowhere.
It got it prior to JIT-ing in my example. (And the revocation mechanism is reliant on this exception being flagged at the caller.)
Well don't do that then?
Well don't do that then?
If you are relying on never executing bad/buggy code, and/or not using straightforward algorithms for revocation, one might as well get rid of most of the checks in CHERI.
Well don't do that then?
If you are relying on never executing bad/buggy code, and/or not using straightforward algorithms for revocation, one might as well get rid of most of the checks in CHERI.
That's an extremely unhelpful reply, and clearly that's not what I'm saying. What I'm saying is that the algorithm you dreamed up to use in place of the one that the architecture has been designed to support does not work.
That's an extremely unhelpful reply, and clearly that's not what I'm saying
Sorry I didn't mean to be unhelpful or dismissive. I feel this conversation is going in circles.
My point is that performing the +2 byte bounds check is not sufficient in all cases (as I have shown), and not necessary in some cases (for instance, when the capability is provided from a trusted source, as you have stated). So that check has to be fixed in this spec.
in place of the one that the architecture has been designed to support does not work.
And what is the current spec supported revocation mechanism for JIT-ed code whose caps are no longer valid?
We can't "fix" this because there's nothing we can do that will satisfy everyone. No matter what we do, some sequence of operations will lead to a crash that does not point directly to a root cause. The first instruction isn't special; a sequence of instructions that fails due to a bounds issue after several instructions isn't meaningfully different from a bounds problem at +2 bytes. A crash stops the current sequence of instructions, which will generally be identified using sscratch or mhartid, not the program counter (which could be part of a shared library). If you're depending on what is fundamentally a debugging feature for memory allocation or process management, you probably have a deeper problem, do you want to elaborate on it?
And what is the current spec supported revocation mechanism for JIT-ed code whose caps are no longer valid?
Assuming that by "revocation" you mean "temporal safety" I don't see how any of the above helps with removing capabilities in memory.
If you're depending on what is fundamentally a debugging feature
It could be useful even during a gdb session to know the calling instruction in case of a crash on a jump target. HartID, cause etc are extremely useful, but so is knowing the sequence of instructions that led to the crash. I agree that a second jump loses all information about a prior jump, but at least having some history of instructions is better than nothing.
Assuming that by "revocation" you mean "temporal safety" I don't see how any of the above helps with removing capabilities in memory. memory allocation or process management, you probably have a deeper problem, do you want to elaborate on it?
The system I am interested is a "para-virtualized" collection of compartments i.e. each compartment has its own exception handler - exceptions don't usually result in crashes, and can even be used to implement certain features. Inter-compartment calls are usually via trampolines but dynamically, one could inline the actual code in place of the trampoline. The system-level trap handler calls the calling compartment's exception handler to fix an inter-compartment call. The calling compartment will then query the callee compartment for the new caps. Hope this gives you a sense of what I am talking about. Most of my issues (PTE, purecaps, no protection rings) are with such a system in my head.
The complexity in +2 byte check is because the check misses the fact that the trampoline has been changed, so the compartment's error handler can no longer execute.
Cool ideas! Maybe I'm naive in thinking that waiting for a trap and trying to fix up the faulting capability is not very general purpose, since capabilities are usually copied, so in an idiot's implementation, you would end up fixing up every time. To avoid this, you're going to need a structure where you store your capability to your trampoline in exactly one place and always load it from there. If that is the case, then cataloging all the places that need an update when the trampoline is changed isn't too onerous.
While we're tilting at straw men, here's another thought. If you are changing the size of a "trampoline" to in-line the function, it seems like it would be quite a special case that you could leave it in the same spot. It seems more likely that you would allocate the new in-lined function somewhere with a bit more space. In that case, you can put an "ecall" or something in the trampoline, and look at the link register to figure out where you've come from? On the other hand, if you're doing a protection-domain crossing, you don't have a reliable link register, but you're likely to have some kind of mechanism to know who to return to that should hopefully be trustworthy.
While we're tilting at straw men
I agree! The main point seems to get lost: target+2 bytes check is not sufficient to catch exceptions at the caller.
There are several ways to address it, whose pros and cons is what we should be discussing instead:
I am clearly not a fan of 4 (or 3, for that matter).
@vmurali unfortunately, you don't appear to have convinced anyone yet of:
target+2 bytes check is not sufficient to catch exceptions at the caller
In the case of distrusting compartments, the contents of the sealed entry point capability are the responsibility of the callee. The architecture is only meant to ensure that the sealed entry point doesn't change while in the "adversarial" domain. If a compartment changes use of its memory without somehow taking account of entry points it has previously distributed, it truly deserves any exceptions that it gets. If there is some system for lazily updating references to functions, it's going to need cooperation between compartments, and one can play around with who tells who about the update when, but I guess there isn't a fundamental error in the architecture for not accounting for changing the use of memory after having handed out pointers.
~As to whether it should be Target+2, Target+4, or just Target, I can't think of why we would prefer one over the other. It's true that an instruction should be at least Target+2, but I can't think of what you would lose if you just allowed the jump and faulted on instruction fetch if the bounds of the capability were insufficient for the fetch, even for a 16-bit instruction. For the sealed entry case, the compartment would have explicitly created and sealed a malformed entry point and deserves the trap. Maybe the "Target+2" check is a bit too presumptuous of the architecture...~
Edit: @gameboo has reminded me that the justification for Target+2 is that the unit of instruction fetch for RISC-V is explicitly 2-bytes. The instruction fetch state machine fetches 2 bytes at a time, checking their legality as it goes. This means, for example, that a page fault in the middle of a 4-byte instruction happens on the second 2-bytes. This is the justification of Target+2. Just like "clh" defines a 2-byte load, and should check that both bytes are in-bounds, "cjalr" defines a 2-byte load (which then may lead to another 2-byte load, according to the state machine of instruction fetch).
@jonwoodruff : then explain to me the justification of any bounds check during a JALR instruction? Your reasoning is that the sealed entry point has already been attested to be within bounds, so why check it again?
And what about JALs and Branches? Their targets are not attested by sealing. I would rather get the error in the caller when I debug wrong code than after the jump/branch has been taken.
Edit: @gameboo has reminded me that the justification for Target+2 is that the unit of instruction fetch for RISC-V is explicitly 2-bytes. The instruction fetch state machine fetches 2 bytes at a time, checking their legality as it goes. This means, for example, that a page fault in the middle of a 4-byte instruction happens on the second 2-bytes. This is the justification of Target+2. Just like "clh" defines a 2-byte load, and should check that both bytes are in-bounds, "cjalr" defines a 2-byte load (which then may lead to another 2-byte load, according to the state machine of instruction fetch).
To add to my earlier comment, this doesn't justify Target+2 check. As I point out earlier in this thread, a page fault is an actionable fault, not fatal, and it requires the faulting address precisely. Bounds check in CHERI can be fatal; trying to have a mechanism similar to page fault is not a justification for this check.
then explain to me the justification of any bounds check during a JALR instruction? Your reasoning is that the sealed entry point has already been attested to be within bounds, so why check it again?
I suppose you're asking about CJALR; in the current architecture (as I understand it - I need to figure out how to build it!), the bounds are not actually checked when a capability is sealed. This means that I could actually seal an out-of-bounds entry point (potentially pointing into another domain) and cause trouble later. I imagine that this case could also be resolved by clearing the tag on sealing an out-of-bounds capability, though I'm unsure what implications that might have for use with data capabilities. We do need to check the tag and executability and such before jumping to avoid blaming the caller for misuse of sealed capabilities, so we cannot completely avoid exceptions on jump-to-capability.
And what about JALs and Branches? Their targets are not attested by sealing. I would rather get the error in the caller when I debug wrong code than after the jump/branch has been taken.
If JALs and JALRs (64-bit address from register -> PCC.address) and Bs need a bounds check so that we get the error in the caller, even if we think we might get away without bounds-checking CJALR for sealed capabilities, I guess it's not a very consistent design.
To add to my earlier comment, this doesn't justify Target+2 check. As I point out earlier in this thread, a page fault is an actionable fault, not fatal, and it requires the faulting address precisely. Bounds check in CHERI can be fatal; trying to have a mechanism similar to page fault is not a justification for this check.
The goal isn't to be similar to a page fault, but to be consistent with the idea of instruction fetch in the ISA. If you are going to bounds-check a branch, do we choose to check the targeted byte? Or half-word? Or full-word? As I understand it, the RISC-V ISA defines instruction fetch as occurring in units of half-words (at least if you have "c"...), so a branch instruction is telling the CPU to fetch the half-word at the address; nothing more. What the CPU does from there depends on what it finds in that half-word. So a branch need only be responsible for ensuring that the half-word that it points at is accessible. Beyond that is the responsibility of the code that exists there.
I suppose you're asking about CJALR; in the current architecture (as I understand it - I need to figure out how to build it!), the bounds are not actually checked when a capability is sealed. This means that I could actually seal an out-of-bounds entry point (potentially pointing into another domain) and cause trouble later. I imagine that this case could also be resolved by clearing the tag on sealing an out-of-bounds capability, though I'm unsure what implications that might have for use with data capabilities. We do need to check the tag and executability and such before jumping to avoid blaming the caller for misuse of sealed capabilities, so we cannot completely avoid exceptions on jump-to-capability.
Yes, I meant CJALR (also CJAL and CBranch; none of the non-capability-enhanced instructions).
For CJALR, we need to check tag, executability and immediate = 0 independent of bounds violation; I don't think anyone's disputing those. It's only the bounds check that is being questioned.
If the bounds are violated when the capabilities are sealed, it means the callee made a mistake (which contradicts what you and jrtc27@ said earlier about sealed caps used in CJALR being correct by fiat). The mistake the callee made could be that Target+2 is within bounds, but the target is a 4-byte instruction. I suppose one could argue that in this case, the error handler at the callee should be handling it. But I would also like to know who the callee sent the wrong cap to, which is the caller.
If JALs and JALRs (64-bit address from register -> PCC.address) and Bs need a bounds check so that we get the error in the caller, even if we think we might get away without bounds-checking CJALR for sealed capabilities, I guess it's not a very consistent design.
For CJALs and CBranches, you won't get the error in the caller if Target+2 is within bounds, Target+4 is not within bounds, and the target is a 4-byte instruction. I am trying to fix exactly this scenario.
The goal isn't to be similar to a page fault, but to be consistent with the idea of instruction fetch in the ISA. If you are going to bounds-check a branch, do we choose to check the targeted byte? Or half-word? Or full-word? As I understand it, the RISC-V ISA defines instruction fetch as occurring in units of half-words (at least if you have "c"...), so a branch instruction is telling the CPU to fetch the half-word at the address; nothing more. What the CPU does from there depends on what it finds in that half-word. So a branch need only be responsible for ensuring that the half-word that it points at is accessible. Beyond that is the responsibility of the code that exists there.
This is a spec, so there's no "correct answer". In my opinion, it is very useful (at least during debugging) to make some CSR (MEPCC?) point to the jump or branch if a bounds exception occurs at the target. This is because, we can see what the target is from the jump or branch, but not the other way around (when the link register is not filled (branches never fill the link register); we may also not know which register is used as a link register since the ISA allows us to use a non-standard link register).
Checking bounds of capabilities at CJALR time instead of fetch time is fundamentally a low-cost, best-effort debugging feature, and it is adequate to that purpose.
Designing a revocation system based on bounds checks and self-modifying code is not an appropriate activity to carry out on a bug tracker, especially not on an issue about a specific instruction. Please take that somewhere else (I will read an email to the sig-cheri mailing list, as well as the CHERI Slack and #riscv IRC), and come back here if a consensus is reached that a new architectural mechanism is needed.
Can this be closed?
Designing a revocation system based on bounds checks and self-modifying code is not an appropriate activity to carry out on a bug tracker, especially not on an issue about a specific instruction. Please take that somewhere else (I will read an email to the sig-cheri mailing list, as well as the CHERI Slack and #riscv IRC), and come back here if a consensus is reached that a new architectural mechanism is needed.
Sorry, I am not asking for designing a revocation system based on bounds checks and self-modifying code (all that discussion is irrelevant to the issue I raised).
I am talking about enhancing the debugging by making sure that if (a) Target+2 is within bounds, (b) Target+4 is not within bounds, and (c) the target is a 4-byte instruction, the caller's PC, not the target's PC, is inserted into MEPCC.
Thank you for clearly stating the proposed change.
Please convince at least one person, somewhere other than here, that your proposed change is useful for any purpose, then return here with support and an edited motivation.
GitHub issues get unwieldy beyond about 10 comments and we've gone far past that just trying to establish the motivation.
The current scheme requires that the jumps (
CJALR
, etc) or taken branches do not throw an exception only if the capability'stopBound
is at leastredirected address + 2
. This not only repeats the bounds check performed during instruction fetch again during execution, but also does not provide with the caller address in case of an exception where thetopBound = redirected address + 2
, but the taken instruction is not compressed (because such an exception will happen only in the instruction fetch where the caller info is lost).It would be easy to add another CSR,
MEPrevPCC
that always stores the value of the previous instruction so that we know the address of the caller in case of atopBound
exception on a jump or taken branch. Or, we can do this micro-architecturally, such that architecturally, in the case of a jump or a taken branch,MEPCC
contains the previous instruction's address (andMEPCC
contains the current excepting instruction's address otherwise). Either solution would eliminate the check fortopBound
being at least 2 bytes more than the address of a taken branch or jump and provides better debugging info.