microsoft / cheriot-ibex

cheriot-ibex is a RTL implementation of CHERIoT ISA based on LowRISC's Ibex core.
Apache License 2.0
73 stars 14 forks source link

Exception priorities #11

Closed mndstrmr closed 4 months ago

mndstrmr commented 11 months ago

As pointed out to us before the exception priorities of CJAL, CJALR, memory instructions and instruction fetch errors are incorrect. I don't know if it is worth me going into too much detail here about each small issue here as there are many of them, most of which are best explained by simply comparing the SV and the Sail.

That said I am more happy to check any fixes as our verification setup does now check CSR writes (including for exceptions).

kliuMsft commented 11 months ago

Acknowledged - this is a known weakness in RTL and I semi-intentionally deferred it till we have a good way to verify. Since we now have the ability to check I will try to fix that in the next week and we can revisit.

kliuMsft commented 8 months ago

I have finally got the chance to clean things up on the exception priority side. Please see the latest commit (f3570e5). If we can start verifying RTL against sail it would be great, thanks.

mndstrmr commented 7 months ago

Started verifying the fixes for memory instructions. Looks largely OK, but I've run into a couple of issues:

  1. In vio_cause_enc the bound_vio check should be last instead of first.
  2. Misaligned capability loads should throw E_Fetch_Addr_Align exceptions, instead of EXC_CHERI, with priority lower even than bounds checks.
kliuMsft commented 7 months ago

Thanks - will check in the fix soon.

From: Louis-Emile Ploix @.> Sent: Saturday, January 20, 2024 9:20 AM To: microsoft/cheriot-ibex @.> Cc: Comment @.>; Subscribed @.> Subject: Re: [microsoft/cheriot-ibex] Exception priorities (Issue #11)

Started verifying the fixes for memory instructions. Looks largely OK, but I've run into a couple of issues:

  1. In vio_cause_enc the bound_vio check should be last instead of first.
  2. Misaligned capability loads should throw E_Fetch_Addr_Align exceptions, instead of EXC_CHERI, with priority lower even than bounds checks.

- Reply to this email directly, view it on GitHubhttps://github.com/microsoft/cheriot-ibex/issues/11#issuecomment-1902181245 or unsubscribehttps://github.com/notifications/unsubscribe-auth/A3V7IMBBSRTB6JLQNZYNVFTYPP4FBBFKMF2HI4TJMJ2XIZLTSOBKK5TBNR2WLJDUOJ2WLJDOMFWWLO3UNBZGKYLEL5YGC4TUNFRWS4DBNZ2F6YLDORUXM2LUPGBKK5TBNR2WLJDUOJ2WLJDOMFWWLLTXMF2GG2C7MFRXI2LWNF2HTAVFOZQWY5LFUVUXG43VMWSG4YLNMWVXI2DSMVQWIX3UPFYGLLDTOVRGUZLDORPXI6LQMWWES43TOVSUG33NNVSW45FGORXXA2LDOOJIFJDUPFYGLKTSMVYG643JORXXE6NFOZQWY5LFVE2TSMBRGM3DKOBZQKSHI6LQMWSWS43TOVS2K5TBNR2WLKRRHEYTGMRYGU3TSMFHORZGSZ3HMVZKMY3SMVQXIZI. You are receiving this email because you commented on the thread.

Triage notifications on the go with GitHub Mobile for iOShttps://apps.apple.com/app/apple-store/id1477376905?ct=notification-email&mt=8&pt=524675 or Androidhttps://play.google.com/store/apps/details?id=com.github.android&referrer=utm_campaign%3Dnotification-email%26utm_medium%3Demail%26utm_source%3Dgithub.

kliuMsft commented 7 months ago

Checked in the fix (commit [5c37f9a]). Please verify, thanks.

mndstrmr commented 7 months ago

Fix looks good, there's just one more problem, then I can am happy to close this issue:

In #12 I suggested that the is_cap case needed top_equal during bounds checking. This does lead to correct code in the sense that illegal memory loads are prevented and legal ones aren't. What I had not realised however is that it does lead to incorrect exception priorities (my apologies about that - I think you alluded to this in your response). In particular, in the Sail a bounds check failing has higher priority than a misalignment error, i.e. bounds checking should happen on the potentially misaligned address, not the aligned address, so that in the event that both errors occur simultaneously we get the bounds check error instead of the alignment error.

From line 838 in cheri_ex.sv, the below fix would close this issue:

    if (cheri_operator_i[CIS_SUBSET])
      top_chkaddr = rf_fullcap_b.top33;
    else if (cheri_operator_i[CJAL] | cheri_operator_i[CJALR])
      top_chkaddr = {base_chkaddr[31:1], 1'b0};
    else if (is_cap)  // CLC/CSC
      top_chkaddr = base_chkaddr + 33'd8; // Don't assume alignment
    else 
      top_chkaddr = base_chkaddr;

    if (cheri_operator_i[CSEAL] | cheri_operator_i[CUNSEAL]) begin
      top_bound  = rf_fullcap_b.top33;
      base_bound = rf_fullcap_b.base32;
    end else if (cheri_operator_i[CJAL]) begin
      top_bound  = {pcc_cap_i.top33[32:1], 1'b0};
      base_bound = pcc_cap_i.base32;
    end else if (cheri_operator_i[CJALR]) begin
      top_bound  = {rf_fullcap_a.top33[32:1], 1'b0};
      base_bound = rf_fullcap_a.base32;
    end else begin // Includes CLC/CSC
      top_bound  = rf_fullcap_a.top33;
      base_bound = rf_fullcap_a.base32;
    end

    top_vio   = (top_chkaddr  > top_bound);
    base_vio  = (base_chkaddr < base_bound);
    top_equal = (top_chkaddr == top_bound);

    if (debug_mode_i)
      addr_bound_vio = 1'b0;
    else if (is_cap | cheri_operator_i[CIS_SUBSET])
      addr_bound_vio = top_vio | base_vio;
    else if (cheri_operator_i[CJAL] | cheri_operator_i[CJALR])
      addr_bound_vio = top_vio | base_vio | top_equal;
    else if (cheri_operator_i[CSEAL] | cheri_operator_i[CUNSEAL])
      addr_bound_vio = top_vio | base_vio | top_equal;
kliuMsft commented 7 months ago

@rmn30 , is there a particular reason the address bound violation has to take priority over alignment errors? This check is in the critical timing path for the load/store interface (since it has to gate the outgoing memory request), and I'd like to avoid additional adders..

davidchisnall commented 7 months ago

I would expect alignment errors to be higher priority because you may wish to trap and emulate on alignment failures.

kliuMsft commented 7 months ago

@mndstrmr , the new RTL commit (ad90fc7) added separate logic (addr_bound_vio_ext) to address mcause/mtval encoding for this specific case. Please let me know if it verifies. thanks.

mndstrmr commented 5 months ago

This looks good to me. There are a few more issues with exceptions that I have since found:

  1. When CSpecialRW receives an illegal register index it correctly throws an illegal instruction exception, but MTVAL is not set to the instruction which failed as in the Sail.
  2. If MTCC has base[8] = 1 and exponent 24 (a combination which is reachable by doing a CSetHigh, for instance) then when an exception is thrown MTCC is decompressed to a pcc_cap_t, and the top bit of the base is lost. This is fine, but if the PCC is later stored then the compressed bounds will have base[8] = 0 which is incorrect according to the Sail which always stores the compressed form. My fix is to add an additional 1 bit field to pcc_cap_t which stores what base[8] would be. This is written when doing a full2pcap and retrieved when doing a pcc2fullcap.
  3. is_cap_sentry (used by PVIO_SEAL for CJALR) operates on the compressed otype, but should operate on the decompressed otype.
  4. When a StoreCapImm runs, ls_addr_misaligned_only is set when perm_vio_vec[PVIO_ALIGN] && (cheri_err_cause == 0). If perm_vio[PVIO_SLC] is set (whether in debug mode or not) then the cheri_err_cause will be nonzero, and the exception will reported as a store local error instead of an alignment error, even in debug mode.

Thanks, and apologies for the issue dump.

kliuMsft commented 5 months ago

Thanks, all good findings.

  1. In my test case with a illegal scr address, I am actually seeing both sail and RTL reports mcause=0x2 and mtval=0x0. Do you see different behavior?
  2. This has to do with the RTL decision to use a 32-bit representation (base32) for base in the expanded format (full_cap_t or pcc_cap_t), which seemed reasonable since bases are always inclusive. I agree that base[8] = 1, exp=24 is possible per encoding, however I don't think we could ever get such a cap with tag=1. E.g., csethigh always clears the tag. @rmn30, please share your thoughts.
  3. This is definitely a bug - will fix in RTL
  4. Not quite sure I understand what's being described here. PVIO_SLC is intended to clear the tag of the cap written to memory, but does NOT cause an exception. Thus cheri_err_cause really is a don't care.
mndstrmr commented 5 months ago
  1. I see: There's a setting in the Sail (plat_mtval_has_illegal_inst_bits) which I was assuming was 1 because in your regular handling of illegal instructions (ibex_controller.sv line 735) you do set MTVAL to the instruction that was illegal. So I guess it's up to you which one you want, but they should be consistent with one another. It would be easier to set csr_mtval_o = 0 in ibex_controller.sv line 737.
  2. While yes the tag bit may or may not be set, a CGetHigh instruction will still see a difference between the Sail and the RTL.
  3. Sounds good
  4. An exception is raised anyway because perm_vio_vec[PVIO_ALIGN] is set, but then the exception type is not reported as an alignment exception because perm_vio[PVIO_SLC] is set.

I also have one more issue with CSpecialRW: If an illegal scr address is accessed while the PCC also doesn't have access_system_regs, currently the ASR exception is thrown instead of the illegal instruction exception. The Sail says it should be the other way around. The following fixes cheri_ex.sv line 1015:

if (cheri_operator_i[CCSR_RW] & cheri_wb_err_raw & illegal_scr_addr & cheri_exec_id_i)
    // cspecialrw trap, illegal addr, treated as illegal_insn
    cheri_wb_err_info_d = {3'h0, 1'b1, 12'h0};
else if (cheri_operator_i[CCSR_RW] & cheri_wb_err_raw & perm_vio & cheri_exec_id_i)
    // cspecialrw traps, PERM_SR
    cheri_wb_err_info_d = {5'h0, 1'b1, cheri_cs2_dec_i, cheri_err_cause};
kliuMsft commented 5 months ago

Regarding #1, I think in our case we actually have bool rv_mtval_has_illegal_inst_bits = false; In RTL (ibex_controller.sv), it's actually handled at line 810 (basically I chose to decode illegal_scr adddress in cheri_ex rather than decoder, in an attempt to reduce fan-in to illegal_insn which goes to too many places). cheri_wb_err_prio: begin if (cheri_pmode_i) begin if (cheri_wb_err_info_i[12]) begin // illegal SCR addr exc_cause_o = EXC_CAUSE_ILLEGAL_INSN; csr_mtval_o = {21'h0, cheri_wb_err_info_i[10:0]}; end else begin exc_cause_o = EXC_CAUSE_CHERI_FAULT; csr_mtval_o = {21'h0, cheri_wb_err_info_i[10:0]}; end end end So I think RTL is actually matching sail..

I will to take a look at #4 and #5 later, it make take a little while since I'll be out of office for a few days.

mndstrmr commented 5 months ago

mtval_has_illegal_inst_bits = 0 is fine, but in that case these lines also need to be updated to reflect that:

https://github.com/microsoft/cheriot-ibex/blob/2911d85a858500a28cfbdeaf5b17ec1926f8ee65/rtl/ibex_controller.sv#L735-L738

Both that and the code at line 810 refer to the same Sail handle_illegal function, meaning they both need to do the same thing.

If the code at line 810 is to remain the same, the code at line 735 should look like the below:

illegal_insn_prio: begin
    exc_cause_o = EXC_CAUSE_ILLEGAL_INSN;
    csr_mtval_o = 0;
end
kliuMsft commented 5 months ago

I see what you mean - yes we should make those 2 cases consistent. Likely will make line 737 to set mtval = 0 if pmode = 1. @rmn30, can you confirm we indeed what to go with mtval_has_illegal_inst_bits = 0?

kliuMsft commented 5 months ago

@mndstrmr I have checked in RTL changes (30cac29) which hopefully fixes issue #1, #3, #4 and #5. As for #2, I have confirmed with @rmn30 that the base[8]=1, exp=24 is a don't care case (not reachable). Can we change the constraints to reflect that? Thanks.

rmn30 commented 5 months ago

Yes, it should not be possible to create a tagged capability with base[8]=1 and exp=24 (i.e. base=2**32, which is greater than the 32-bit base used in csetbounds). If an untagged capability with such base is installed in MTCC when an exception / interrupt is taken this will result in an unrecoverable exception loop, so we don't really care about this case.

mndstrmr commented 4 months ago

This is all great, thanks. Easiest update I've made in a while.