SM trap handler gives "illegal instruction" on CVA6

jarkkojs commented 1 year ago

I've mentioning about the CVA6 freeze before but could not properly debug them because the driver was also acting sometimes weird. Now that I rewrote the Keystone driver I got further with this.

What is happening is that there is a storm of time interrupts and SDK ends up making resume ioctl's indefinitely. It just stays in the loop and returns all the time with SBI_ERR_SM_ENCLAVE_INTERRUPTED.

If I run the same fw_payload.bin in QEMU this does not occur and ./tests.ke runs perfectly to the end.

Any similar experiences with hardware and ideas what I should look at? It is Genesys 2 FPGA with CVA6.

Also I checked that it never gets to do edge call so nothing is dispatched and this happens early on when the enclave starts running.

jarkkojs commented 1 year ago

I'll try settingCONFIG_HZ_100=y (lowering the frequency from 250 Hz) and also CONFIG_SMP=n.

Being worked mostly with x86 and ACPI last few years devicetree stuff is a bit out of my territory so have a question to help with debugging. Where is the dts sourced for the CVA6 build? Which would be the value to except from /sys/firmware/devicetree/base/cpus/timebase-frequency?

@andreaskuster: Do you have any information to help out with this? Just in case you've ever encountered any similar issues while working on this.

CVA6 SDK does use 100 Hz frequency in its kernel config: https://github.com/openhwgroup/cva6-sdk/blob/master/configs/linux64_defconfig

jarkkojs commented 1 year ago

OK I added logging to the SM's trap handler. The enclave gets stuck because of illegal instruction, not because of timer interrupt. Any ideas how this could happen? I mean same fw_payload.bin works with RISC-V emulation of QEMU.

From initial dump of OpenSBI could spot that the 'h' extension is lacking from CVA6 but that is only obvious difference I could see.

jarkkojs commented 1 year ago

Could be also misconfigured PMP entry.

andreaskuster commented 1 year ago

Hi @jarkkojs

To answer your question: I cannot remember running into this issue, but it is also quite a while ago.

I looked into the linux64-cva6-defconfig parameters you mentioned and couldn't find the option CONFIG_HZ_100.

Can you try to follow these instructions and see if this solves the problem? Furthermore, make sure to delete the old cmake build directory, and re-run it using the flag -Dcva6=y, which should enable the usage of linux64-cva6-defconfig and riscv64_cva6_defconfig

Furthermore, for debugging PMPs, the best debugging experience I had so far was using QuestaSim. I used to run the CVA6 core in simulation, which can be a bit slow, but on the other hand, it gives you access to the internal signal state, which greatly helped me to identify problems for which I was unsure if they originated from software or hardware. Instructions for that can be found here: https://github.com/pulp-platform/cva6#getting-started

I hope that helps to resolve the problem. Otherwise, you might want to open up an issue on either cva6-sdk or cva6.

jarkkojs commented 1 year ago

Hi @jarkkojs

To answer your question: I cannot remember running into this issue, but it is also quite a while ago.

I looked into the linux64-cva6-defconfig parameters you mentioned and couldn't find the option CONFIG_HZ_100.

Can you try to follow these instructions and see if this solves the problem? Furthermore, make sure to delete the old cmake build directory, and re-run it using the flag -Dcva6=y, which should enable the usage of linux64-cva6-defconfig and riscv64_cva6_defconfig

Furthermore, for debugging PMPs, the best debugging experience I had so far was using QuestaSim. I used to run the CVA6 core in simulation, which can be a bit slow, but on the other hand, it gives you access to the internal signal state, which greatly helped me to identify problems for which I was unsure if they originated from software or hardware. Instructions for that can be found here: https://github.com/pulp-platform/cva6#getting-started

I hope that helps to resolve the problem. Otherwise, you might want to open up an issue on either cva6-sdk or cva6.

So.. the timer frequency is not the reason, I could verify that for some reason I get illegal instruction exception by injecting sbi_printf() to the SM's trap handler. I did not know about the simulator so that is what I'm going forward. Thank you for the pointers!

jarkkojs commented 1 year ago

I see in mideleg and medeleg settings very different values but do not have expertise to interpret them.

CVA6:

Boot HART ID              : 0
Boot HART Domain          : root
Boot HART Priv Version    : v1.10
Boot HART Base ISA        : rv64imafdc
Boot HART ISA Extensions  : none
Boot HART PMP Count       : 8
Boot HART PMP Granularity : 8
Boot HART PMP Address Bits: 54
Boot HART MHPM Count      : 0
Boot HART MIDELEG         : 0x0000000000000222
Boot HART MEDELEG         : 0x000000000000b109

QEMU:

Boot HART ID              : 0
Boot HART Domain          : root
Boot HART Priv Version    : v1.10
Boot HART Base ISA        : rv64imafdch
Boot HART ISA Extensions  : time
Boot HART PMP Count       : 16
Boot HART PMP Granularity : 4
Boot HART PMP Address Bits: 54
Boot HART MHPM Count      : 0
Boot HART MIDELEG         : 0x0000000000001666
Boot HART MEDELEG         : 0x0000000000f0b509

I'll pick the reference specs and see if I can interpret this. I neither have idea what "time" extensions is...

jarkkojs commented 1 year ago

Right so looking at sm/src, mideleg gets explicitly set by the SM but medeleg stays in its initial boot value. So I'm wondering does the value for medeleg look appropriate?

jarkkojs commented 1 year ago

@andreaskuster I see a difference in your original version that enclave.c did not have csr_write() for mideleg so I'll try to use enclave.[ch] from that version and see what happens.

jarkkojs commented 1 year ago

Actually I just copy the whole sm/src and switch OpenSBI to the exact same version as was used in the first working version with CVA6.

andreaskuster commented 1 year ago

This is specific to Keystone internals, which I haven't touched so far. @dayeol can you jump in?

jarkkojs commented 1 year ago

@andreaskuster Actually I'll swap the sm in my tree to the sm that was known good i.e. bb9128c.

jarkkojs commented 1 year ago

If I find a combination that I successfully execute ./tests.ke then it is matter of bisecting the failing commit...

jarkkojs commented 1 year ago

@andreaskuster I tried your repository and with that I can now confirm that there is nothing wrong in my hardware. It does run ./tests.ke. I was also able to confirm that I get the same issue in my fork with out-of-tree driver, so the root cause is neither my driver implementation.

So that leaves some change in the OpenSBI or SM, which has broken CVA6 after you contributed your CVA6 work. There's some changes on how CSRs are manipulated etc. so I have some seeking to do.

Thanks for all the help and keeping that repository online for this whole time. I was suspicious whether the FPGA has something wrong or how CVA6 was flashed into it but it is now known to be good. This will help me to find out root cause because I have something that is "known good" now.

jarkkojs commented 1 year ago

Did also double check that it is not my driver causing the issue. When I leave that out I still get illegal instruction with CVA6 so it is definitely a bug in the SM.

jarkkojs commented 1 year ago

@dayeol, @andreaskuster apparently through some refactorization there is whole block of functionality removed for CVA6: init_iopmp(). It simply does not exist at all. That explains a lot. So I'll figure out how to get it back to the current codebase and make a PR once stuff is working. So it is not a bug but sort of feature removal...

jarkkojs commented 1 year ago

/*
 * Copyright 2022 ETH Zurich and University of Bologna.
 * Copyright and related rights are licensed under the Solderpad Hardware
 * License, Version 0.51 (the "License"); you may not use this file except in
 * compliance with the License.  You may obtain a copy of the License at
 * http://solderpad.org/licenses/SHL-0.51. Unless required by applicable law
 * or agreed to in writing, software, hardware and materials distributed under
 * this License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR
 * CONDITIONS OF ANY KIND, either express or implied. See the License for the
 * specific language governing permissions and limitations under the License.
 *
 * Author: Andreas Kuster <kustera@ethz.ch>
 * Description: IO-PMP setup and configuration library
 */

jarkkojs commented 1 year ago

@andreaskuster Possible to give some sort of overview what I'm looking at, as you're the author? :-)

jarkkojs commented 1 year ago

Also it is not obvious for me how to add new files to the SM build. E.g.

$ git grep "enclave\.c"
sm/tests/CMakeLists.txt:        test_enclave.c
sm/tests/test_enclave.c:#include "../src/enclave.c"

I see no build file...

jarkkojs commented 1 year ago

objects.mk OK let's try..

jarkkojs commented 1 year ago

My test diff:

diff --git a/sm/plat/generic/objects.mk b/sm/plat/generic/objects.mk
index 4739ea5..347a831 100644
--- a/sm/plat/generic/objects.mk
+++ b/sm/plat/generic/objects.mk
@@ -9,6 +9,7 @@ platform-objs-y += ../../src/attest.o
 platform-objs-y += ../../src/cpu.o
 platform-objs-y += ../../src/crypto.o
 platform-objs-y += ../../src/enclave.o
+platform-objs-y += ../../src/iopmp.o
 platform-objs-y += ../../src/pmp.o
 platform-objs-y += ../../src/sm.o
 platform-objs-y += ../../src/sm-sbi.o
diff --git a/sm/src/iopmp.c b/sm/src/iopmp.c
new file mode 100644
index 0000000..08cdd29
--- /dev/null
+++ b/sm/src/iopmp.c
@@ -0,0 +1,166 @@
+/*
+ * Copyright 2022 ETH Zurich and University of Bologna.
+ * Copyright and related rights are licensed under the Solderpad Hardware
+ * License, Version 0.51 (the "License"); you may not use this file except in
+ * compliance with the License.  You may obtain a copy of the License at
+ * http://solderpad.org/licenses/SHL-0.51. Unless required by applicable law
+ * or agreed to in writing, software, hardware and materials distributed under
+ * this License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR
+ * CONDITIONS OF ANY KIND, either express or implied. See the License for the
+ * specific language governing permissions and limitations under the License.
+ *
+ * Author: Andreas Kuster <kustera@ethz.ch>
+ * Description: IO-PMP setup and configuration library
+ */
+
+#include <stdint.h>
+
+#include <sbi/sbi_console.h>
+
+#include <sbi/riscv_encoding.h>
+#include "iopmp.h"
+
+#define IOPMP_BASE      0x50010000
+#define IOPMP_ADDR_BASE (IOPMP_BASE + 0x0)
+#define IOPMP_CFG_BASE  (IOPMP_BASE + 0x80)
+#define IOPMP_NUM_PMP   16
+
+int iopmp_granule = -1;
+
+void init_iopmp() {
+
+    // enforce granularity detection
+    if(iopmp_granule < 0){
+        detect_iopmp_granule();
+    }
+
+    /*
+     * Check IO-PMP address registers
+     */
+    for(int i = 0; i < IOPMP_NUM_PMP; i++){
+        // read/write base registers
+        volatile uintptr_t *pmpaddr = (volatile uintptr_t *) (IOPMP_ADDR_BASE + (i*0x8ULL));
+        // read old value: init -> set zero
+        uintptr_t old_pmpaddr = 0;
+        // set new value
+        *pmpaddr = 0xffffffffffffffffULL;
+        uintptr_t match = (0x3FFFFFFFFFFFFF & ~((iopmp_granule>>2)-1));
+
+        // check if read-back value matches
+        if (*pmpaddr != match) {
+            sbi_printf("IO-PMP%d: address read/write failed: %lx\n", i, *pmpaddr);
+        } else {
+            sbi_printf("IO-PMP%d: address read/write succeeded\n", i);
+        }
+        // restore previous value
+        *pmpaddr = old_pmpaddr;
+    }
+
+    /*
+     * Check IO-PMP config
+     */
+    // read/write base registers
+    volatile uintptr_t *pmpcfg0 = (volatile uintptr_t *) IOPMP_CFG_BASE;
+    volatile uintptr_t *pmpcfg1 = (volatile uintptr_t *) (IOPMP_CFG_BASE + 0x8ULL);
+
+    // read old value: init -> set zero
+    uintptr_t old_cfg0 = 0;
+    uintptr_t old_cfg1 = 0;
+    if(IOPMP_NUM_PMP > 8) {
+        old_cfg1 = 0;
+    }
+
+    // set new value
+    *pmpcfg0 = 0xffffffffffffffffULL;
+    if(IOPMP_NUM_PMP > 8) {
+        *pmpcfg1 = 0xffffffffffffffffULL;
+    }
+
+    // check if read-back value matches
+    if (*pmpcfg0 != 0xffffffffffffffffULL || (IOPMP_NUM_PMP > 8 && *pmpcfg1 != 0xffffffffffffffffULL)) {
+        sbi_printf("IO-PMP: cfg read/write failed\n");
+    } else {
+        sbi_printf("IO-PMP: cfg read/write succeeded\n");
+    }
+
+    // restore previous value
+    *pmpcfg0 = old_cfg0;
+    if(IOPMP_NUM_PMP > 8) {
+        *pmpcfg1 = old_cfg1;
+    }
+}
+
+void detect_iopmp_granule() {
+
+    // detect pmp granule size according to the RISC-V PMP specs
+    volatile uintptr_t *pmpcfg0 = (volatile uintptr_t *) IOPMP_CFG_BASE;
+    volatile uintptr_t *pmpaddr0 = (volatile uintptr_t *) IOPMP_ADDR_BASE;
+
+    //  write 0 to config & 0xff..ff to address
+    *pmpcfg0 = 0x0ULL;
+    *pmpaddr0 = 0xffffffffffffffffULL;
+
+    // read address and extract num_zeros on lsb
+    uintptr_t ret = *pmpaddr0;
+    int g = 2;
+    for (uintptr_t i = 1; i; i <<= 1) {
+        if ((ret & i) != 0)
+            break;
+        g++;
+    }
+    iopmp_granule = 1UL << g;
+
+    // print result
+    sbi_printf("IO-PMP granularity: %d\n", iopmp_granule);
+}
+
+iopmpcfg_t set_iopmp(iopmpcfg_t p) {
+
+    if(p.slot < 0 || p.slot >= IOPMP_NUM_PMP){
+        //sbi_printf("PMP invalid slot!\n");
+        return p;
+    }
+    // POST: 0 <= p.slot < 16
+
+    volatile uintptr_t *pmpcfg = (volatile uintptr_t *) (IOPMP_CFG_BASE + ((p.slot < 8) ? 0x0ULL : 0x8ULL));
+
+    uintptr_t mask, shift;
+    if(p.slot >= 8){
+        shift = p.slot-8;
+    } else {
+        shift = p.slot;
+    }
+    mask = (0xff << (8 * shift));
+
+    *pmpcfg = *pmpcfg & ~mask;
+
+    volatile uintptr_t *pmpaddr = (volatile uintptr_t *) (IOPMP_ADDR_BASE + (8 * p.slot));
+    *pmpaddr = p.a0;
+
+    *pmpcfg =  ((p.cfg << (8 * shift)) & mask) | (*pmpcfg & ~mask);
+
+    //sbi_printf("Set IO-PMP: slot: %lx, addr: %lx, cfg: %lx\n", p.slot, p.a0, p.cfg);
+
+    return p;
+}
+
+iopmpcfg_t set_iopmp_napot(uintptr_t base, uintptr_t range, uintptr_t slot) {
+    // set pmp with r/w/x access
+    return set_iopmp_napot_access(base, range, (PMP_W | PMP_R | PMP_X), slot);
+}
+
+iopmpcfg_t set_iopmp_napot_access(uintptr_t base, uintptr_t range, uintptr_t access, uintptr_t slot) {
+    iopmpcfg_t p;
+    p.cfg = access | (range > iopmp_granule ? PMP_A_NAPOT : PMP_A_NA4);
+    p.a0 = (base + (range / 2 - 1)) >> PMP_SHIFT;
+    p.slot = slot;
+    return set_iopmp(p);
+}
+
+iopmpcfg_t set_iopmp_allow_all(uintptr_t slot) {
+    iopmpcfg_t p;
+    p.cfg = (PMP_W | PMP_R | PMP_X) | PMP_A_NAPOT;
+    p.a0 = 0xFFFFFFFFFFFFFFFF;
+    p.slot = slot;
+    return set_iopmp(p);
+}
diff --git a/sm/src/iopmp.h b/sm/src/iopmp.h
new file mode 100644
index 0000000..47c392a
--- /dev/null
+++ b/sm/src/iopmp.h
@@ -0,0 +1,27 @@
+/*
+ * Copyright 2022 ETH Zurich and University of Bologna.
+ * Copyright and related rights are licensed under the Solderpad Hardware
+ * License, Version 0.51 (the "License"); you may not use this file except in
+ * compliance with the License.  You may obtain a copy of the License at
+ * http://solderpad.org/licenses/SHL-0.51. Unless required by applicable law
+ * or agreed to in writing, software, hardware and materials distributed under
+ * this License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR
+ * CONDITIONS OF ANY KIND, either express or implied. See the License for the
+ * specific language governing permissions and limitations under the License.
+ *
+ * Author: Andreas Kuster <kustera@ethz.ch>
+ * Description: IO-PMP setup and configuration library
+ */
+
+typedef struct {
+  uintptr_t cfg;
+  uintptr_t a0;
+  uintptr_t slot;
+} iopmpcfg_t;
+
+void init_iopmp();
+void detect_iopmp_granule();
+iopmpcfg_t set_iopmp(iopmpcfg_t p);
+iopmpcfg_t set_iopmp_napot(uintptr_t base, uintptr_t range, uintptr_t slot);
+iopmpcfg_t set_iopmp_napot_access(uintptr_t base, uintptr_t range, uintptr_t access, uintptr_t slot);
+iopmpcfg_t set_iopmp_allow_all(uintptr_t slot);
diff --git a/sm/src/pmp.c b/sm/src/pmp.c
index 8d9f195..33244b6 100644
--- a/sm/src/pmp.c
+++ b/sm/src/pmp.c
@@ -4,6 +4,7 @@
 //------------------------------------------------------------------------------
 #include "assert.h"
 #include "pmp.h"
+#include "iopmp.h"
 #include "cpu.h"
 #include "safe_math_util.h"
 #include "sm-sbi-opensbi.h"
@@ -216,6 +217,9 @@ int pmp_set_global(int region_idx, uint8_t perm)

 void pmp_init()
 {
+    /* initialize IO-PMP on CVA6 platform */
+    init_iopmp();
+
   uintptr_t pmpaddr = 0;
   uintptr_t pmpcfg = 0;
   int i;
diff --git a/sm/src/pmp.h b/sm/src/pmp.h
index f83aeb9..b681cfa 100644
--- a/sm/src/pmp.h
+++ b/sm/src/pmp.h
@@ -48,8 +48,14 @@ enum pmp_priority {
                 "sfence.vma\n\t"\
                 ".align 2\n\t" \
                 "1: csrw mtvec, t0 \n\t" \
-                : : "r" (addr), "r" (pmpc) : "t0"); \
-}
+                : : "r" (addr), "r" (pmpc) : "t0");                        \
+      /* Set IO-PMP */ \
+      iopmpcfg_t p; \
+      p.cfg = pmpc; \
+      p.a0 = addr; \
+      p.slot = n; \
+      set_iopmp(p); \
+      }

 #define PMP_UNSET(n, g) \
 { uintptr_t pmpc = csr_read(pmpcfg##g); \
@@ -61,7 +67,13 @@ enum pmp_priority {
                 "sfence.vma\n\t"\
                 ".align 2\n\t" \
                 "1: csrw mtvec, t0" \
-                : : "r" (0), "r" (pmpc) : "t0"); \
+                : : "r" (0), "r" (pmpc) : "t0");                \
+      /* Set IO-PMP */ \
+      iopmpcfg_t p; \
+      p.cfg = pmpc; \
+      p.a0 = 0; \
+      p.slot = n; \
+      set_iopmp(p);     \
 }

 #define PMP_ERROR(error, msg) {\
diff --git a/sm/tests/CMakeLists.txt b/sm/tests/CMakeLists.txt
index 8c237e4..f93e32e 100644
--- a/sm/tests/CMakeLists.txt
+++ b/sm/tests/CMakeLists.txt
@@ -102,6 +102,7 @@ add_executable(test_enclave
    ${SM_SRC}/crypto.c
    ${SM_SRC}/thread.c
    ${SM_SRC}/sm.c
+   ${SM_SRC}/iopmp.c
    ${MOCK_SOURCE_FILES}
    )
 target_link_libraries(test_enclave cmocka)

jarkkojs commented 1 year ago

So I got this now:

[SM] Initializing ... hart [0]
IO-PMP granularity: 8
IO-PMP0: address read/write failed: ca11ab1ebadcab1e
IO-PMP1: address read/write failed: ca11ab1ebadcab1e
IO-PMP2: address read/write failed: ca11ab1ebadcab1e
IO-PMP3: address read/write failed: ca11ab1ebadcab1e
IO-PMP4: address read/write failed: ca11ab1ebadcab1e
IO-PMP5: address read/write failed: ca11ab1ebadcab1e
IO-PMP6: address read/write failed: ca11ab1ebadcab1e
IO-PMP7: address read/write failed: ca11ab1ebadcab1e
IO-PMP8: address read/write failed: ca11ab1ebadcab1e
IO-PMP9: address read/write failed: ca11ab1ebadcab1e
IO-PMP10: address read/write failed: ca11ab1ebadcab1e
IO-PMP11: address read/write failed: ca11ab1ebadcab1e
IO-PMP12: address read/write failed: ca11ab1ebadcab1e
IO-PMP13: address read/write failed: ca11ab1ebadcab1e
IO-PMP14: address read/write failed: ca11ab1ebadcab1e
IO-PMP15: address read/write failed: ca11ab1ebadcab1e
IO-PMP: cfg read/write failed
[SM] Keystone security monitor has been initialized!

I try to downgrade OpenSBI to the same version as it was when CVA6 support was implemented (0d49c3b). I recall getting those errors also with the original version but tests still worked. I should have saved the transcript but perhaps cfg read/write succeeded. In QEMU I also get bunch of errors but the last line is success.

andreaskuster commented 1 year ago

Are you using code from my personal fork of the secure monitor? I strongly discourage doing so for your use case, as this version incorporates RISC-V AXI I/O-PMP modules (prototype, not standardized yet).

More details about the I/O-PMP extension can be found in the RISC-V mailing lists and my repo: https://github.com/andreaskuster/axi-io-pmp

I strongly suggest using CVA6 from openhwgroup/cva6 and Keystone from keystone-enclave/keystone. All the relevant components have been upstreamed and tested at the time of PR.

jarkkojs commented 1 year ago

@andreaskuster Not in our proj

Are you using code from my personal fork of the secure monitor? I strongly discourage doing so for your use case, as this version incorporates RISC-V AXI I/O-PMP modules (prototype, not standardized yet).

More details about the I/O-PMP extension can be found in the RISC-V mailing lists and my repo: https://github.com/andreaskuster/axi-io-pmp

I strongly suggest using CVA6 from openhwgroup/cva6 and Keystone from keystone-enclave/keystone. All the relevant components have been upstreamed and tested at the time of PR.

No I'm not using your personal fork, I'm using fairly recent version of the upstream Keystone but given the CVA6 issues, I tried a test if your original version boots. I found it from the very latest master: https://github.com/keystone-enclave/keystone/blob/master/CVA6_INSTALL.md

So, it still did work on my hardware and tests do pass.

And obviously it was a good idea to test that old version because I can legitimate proof that the hardware should be in a working condition :-) So I don't think my CVA6 needs reprogramming at this point based on this.

My intent is to fix more recent code, if the CVA6 support has degraded over time. Since your code is clearly working it is still legitimate comparison point (unfortunately, but it is how it is).

jarkkojs commented 1 year ago

@grg-haas For the latest upstream, with you instructions (BUILDROOT_TARGET=keystone-examples-dirclean make), I can make QEMU build but how in the new framework you do CVA6 build?

grg-haas commented 1 year ago

Hi @jarkkojs, since I don't have access to CVA6 hardware I was not able to port support for this to the new build system. There are configuration files in overlays/keystone/configs for CVA6 that should configure buildroot correctly (I've tried to port these across as well as possible), but I have never personally built the SM for CVA6.

andreaskuster commented 1 year ago

Thanks for pointing me to your instruction source. https://github.com/keystone-enclave/keystone/blob/master/CVA6_INSTALL.md was still referring to the old repo, of which the relevant changes had been unstreamed. I submitted PR #375 to rectify the instructions. By following them, you implicitly fetched my personal fork with the I/O-PMP included.

jarkkojs commented 1 year ago

Thanks for pointing me to your instruction source. https://github.com/keystone-enclave/keystone/blob/master/CVA6_INSTALL.md was still referring to the old repo, of which the relevant changes had been unstreamed. I submitted PR #375 to rectify the instructions. By following them, you implicitly fetched my personal fork with the I/O-PMP included.

OK, glad I could help! And thanks for still helping with me on this. I'm getting my range narrower in the SM code base.

jarkkojs commented 1 year ago

I traced the bug down to loadElf(). There is misaligned store over there. Apparently it is fixed already in https://github.com/keystone-enclave/elfloader/commit/0ab2e7034e023f60faef700e79e6a939e5f23b7f#

I still have to test that this will work out.

jarkkojs commented 1 year ago

@evgenyp67, @cathylu10: Hi, we are using a fork of dev-page-table and here is where the exception dump when running hello.ke on CVA6:

# ./hello.ke
Verifying archive integrity... All good.
Uncompressing Keystone Enclave Package
sbi_trap_error: hart0: misaligned store handler failed (error 0)
sbi_trap_error: hart0: mcause=0x0000000000000006 mtval=0x00000000a000081a
sbi_trap_error: hart0: mepc=0x0000000000000000 mstatus=0x0000000a00000900
sbi_trap_error: hart0: ra=0x0000000000000000 sp=0x00000000a000081a
sbi_trap_error: hart0: gp=0x0000000000000000 tp=0x0000000000000000
sbi_trap_error: hart0: s0=0x0000000000000000 s1=0x0000000000000000
sbi_trap_error: hart0: a0=0x0000000000000000 a1=0x00000000a0000000
sbi_trap_error: hart0: a2=0x0000000000100000 a3=0x00000000a0016000
sbi_trap_error: hart0: a4=0x00000000a003b000 a5=0x00000000a0055000
sbi_trap_error: hart0: a6=0x00000000a0100000 a7=0x0000000000040000
sbi_trap_error: hart0: s2=0x0000000000000000 s3=0x0000000000000000
sbi_trap_error: hart0: s4=0x0000000000000000 s5=0x0000000000000000
sbi_trap_error: hart0: s6=0x0000000000000000 s7=0x0000000000000000
sbi_trap_error: hart0: s8=0x0000000000000000 s9=0x0000000000000000
sbi_trap_error: hart0: s10=0x0000000000000000 s11=0x0000000000000000
sbi_trap_error: hart0: t0=0x0000000000000000 t1=0x0000000000000000
sbi_trap_error: hart0: t2=0x0000000000000000 t3=0x0000000000000000
sbi_trap_error: hart0: t4=0x0000000000000000 t5=0x0000000000000000
sbi_trap_error: hart0: t6=0x0000000000000000
[host] enclave returned: 0

Note that I injected the printout there (i.e. handler for this exception does work):

        case CAUSE_MISALIGNED_STORE:
                rc  = sbi_misaligned_store_handler(mtval, mtval2, mtinst, regs);
                msg = "misaligned store handler failed";
+               if (!rc)
+                       sbi_trap_error(msg, rc, mcause, mtval, mtval2, mtinst, regs);

So looking at mtval we can see that:

$ riscv64-unknown-linux-gnu-addr2line -s -e ./build/examples/hello/loader/src/loader-hello-loader/obj/loader.S.o 0x81a
loader.S:74

This points to:

73  csrw satp, a0 // switch to virtual addresssing 
74  sfence.vma

Any ideas what I should look into with this bug? Are we missing some essential fix? Loader is at 0ab2e7034e023f60faef700e79e6a939e5f23b7f. SM does not have much other differences with dev-evgeny-dll-clean than the attestation code. Maybe SDK is calculating something wrong?

Yeah, and all works in QEMU.

evgenyp67 commented 1 year ago

csrw is a register-register operation so this is very strange. Maybe fence is causing some store to catch up?

jarkkojs commented 1 year ago

csrw is a register-register operation so this is very strange. Maybe fence is causing some store to catch up?

Without injected sbi_trap_error() call, and similar added to the "illegal instruction" exception it will end in that, not in this alignment:

sbi_trap_error: hart0: illegal instruction handler failed (error 0)
sbi_trap_error: hart0: mcause=0x0000000000000002 mtval=0x0000000000000000
sbi_trap_error: hart0: mepc=0x0000000000000000 mstatus=0x0000000a00000900
sbi_trap_error: hart0: ra=0x0000000000000000 sp=0x00000000a000081a
sbi_trap_error: hart0: gp=0x0000000000000000 tp=0x0000000000000000
sbi_trap_error: hart0: s0=0x0000000000000000 s1=0x0000000000000000
sbi_trap_error: hart0: a0=0x0000000000000000 a1=0x00000000a0000000
sbi_trap_error: hart0: a2=0x0000000000100000 a3=0x00000000a0016000
sbi_trap_error: hart0: a4=0x00000000a003b000 a5=0x00000000a0055000
sbi_trap_error: hart0: a6=0x00000000a0100000 a7=0x0000000000040000
sbi_trap_error: hart0: s2=0x0000000000000000 s3=0x0000000000000000
sbi_trap_error: hart0: s4=0x0000000000000000 s5=0x0000000000000000
sbi_trap_error: hart0: s6=0x0000000000000000 s7=0x0000000000000000
sbi_trap_error: hart0: s8=0x0000000000000000 s9=0x0000000000000000
sbi_trap_error: hart0: s10=0x0000000000000000 s11=0x0000000000000000
sbi_trap_error: hart0: t0=0x0000000000000000 t1=0x0000000000000000
sbi_trap_error: hart0: t2=0x0000000000000000 t3=0x0000000000000000
sbi_trap_error: hart0: t4=0x0000000000000000 t5=0x0000000000000000
sbi_trap_error: hart0: t6=0x0000000000000000

I.e. if I do not inject it there, the code will continue but here it will end with or without sbi_trap_error(). So this might be the actual error but the former helped to get something legit to mtval.

I've tested my CVA6 with the original contributed code so I know that the HW is good enough to run Keystone. So it must be something to do with the dev-page-table strategy of setting satp initially to zero. I don't see anything obviously wrong in it but there is something weird..

jarkkojs commented 1 year ago

I'll check if I'm able to compile CVA6 emulator for myself: https://github.com/openhwgroup/cva6.

That should be better environment to debugging this. Never tried it so not sure if I'm successful but can't be that hard...

RT @acaldaya

jarkkojs commented 1 year ago

Direct quote from "RISC-V Volume 2: Privileged", section 3.1.0:

"The TVM (Trap Virtual Memory) bit supports intercepting supervisor virtual-memory management operations. When TVM=1, attempts to read or write the satp CSR or execute the SFENCE.VMA instruction while executing in S-mode will raise an illegal instruction exception. When TVM=0, these operations are permitted in S-mode. TVM is hard-wired to 0 when S-mode is not supported."

The symptom sounds the same but I'm not yet sure about the conditions. Worth of checking what TVM gets as default.

Also mideleg and medeleg have different default values between QEMU and CVA6. Could this have any effect?

jarkkojs commented 1 year ago

csrw is a register-register operation so this is very strange. Maybe fence is causing some store to catch up?

There comes first misaligned store which is followed by illegal instruction (more info in above comments).

Looking at offset 0x81a it is misaligned I think because you have two byte reminder when you divide it by eight.

I wonder if . = ALIGN(8); in loader.lds would help. Or even . = ALIGN(4096); might be clear. Or even putting align statement to the .S file. Which would you prefer if this is successful in the first place?

Maybe I leave first linker script alone and do the adjustments in the assembly file.

It can easily be given different toolchains, environments and other shenanigans that by luck it often ends up being aligned to XLEN perhaps. This does sometimes happen IRL...

jarkkojs commented 1 year ago

I mean I've encountered such issues with production kernels so nothing extraordinary if this was the issue. Just took the full month to reach, first debugging driver, then SM, learning RISC-V ISA from zero in the middle and finally locating the failure point from the depths of keystone, phew... Good crash course to this ISA tho...

jarkkojs commented 1 year ago

I've reached this failure in the same manner both with the stock Keystone driver and my own alternative in-kernel driver https://git.kernel.org/pub/scm/linux/kernel/git/jarkko/linux-riscv-keystone.git/tree/arch/riscv/kernel/keystone.c?h=keystone-5.19.y. My driver is a good reference point in the sense it excludes possibility something going wrong with CMA, which was one past issue with CVA6. CMA pool is allocated during the system boot exactly once for Keytone.

jarkkojs commented 1 year ago

@cathylu10 can you provide context to this commit:

commit d529a9489f6f9f994264d54579e6d35101fff9a9
Author: Catherine Lu <cathylu@berkeley.edu>
Date:   Tue Apr 5 19:52:23 2022 +0000

    Debugging

diff --git a/loader.S b/loader.S
index 265f484..467489b 100644
--- a/loader.S
+++ b/loader.S
@@ -45,7 +45,7 @@ _start:
   call load_runtime

   // exit if errors
-  blt a0, x0, exit
+  # blt a0, x0, exit

   // switch to va and jump to runtime code
   li t0, RUNTIME_ADDRESS
@@ -65,12 +65,7 @@ _start:

   sfence.vma
   csrw satp, a0 // switch to virtual addresssing
-
-
-exit: // exit enclave
-  li a7, SBI_EXT_EXPERIMENTAL_KEYSTONE_ENCLAVE
-  li a6, SBI_SM_EXIT_ENCLAVE
-  ecall
+  sfence.vma

 _sstack:
   .dword 1

I don't understand the reasoning for the change but where the misaligned store exception triggers is the second sfence.vma.

jarkkojs commented 1 year ago

In 4.1.0 of the privileged spec it is stated that "If the new address space’s page tables have been modified, or if an ASID is reused, it may be necessary to execute an SFENCE.VMA instruction (see Section 4.2.1) after, or in some cases before, writing satp". Which one is it in this case? Or is there some logic having it in both sides?

jarkkojs commented 1 year ago

Here the flush is done differently: https://github.com/keystone-enclave/keystone/blob/dev-evgeny-dll-clean/runtime/util/rt_util.c. fence.i is followed-by sfence.vma. I'm pretty confused with these various versions of synchronization.

Following that pattern, it should be:

diff --git a/loader.S b/loader.S
index 5a78f4d..4de45c1 100644
--- a/loader.S
+++ b/loader.S
@@ -65,8 +65,8 @@ _start:
   LOAD a6, 5*REGBYTES(sp)
   LOAD a7, 6*REGBYTES(sp)

-  sfence.vma
   csrw satp, a0 // switch to virtual addresssing
+  fence.i
   sfence.vma

Looking at head.S in the kernel, I try this first though:

   sfence.vma
   csrw satp, a0 // switch to virtual addresssing
-  sfence.vma
-
+.align 2

See https://elixir.bootlin.com/linux/latest/source/arch/riscv/kernel/head.S#L105

jarkkojs commented 1 year ago

https://github.com/riscv/riscv-isa-manual/issues/226

jarkkojs commented 1 year ago

This is what I'm testing now:

--- a/loader.S
+++ b/loader.S
@@ -69,8 +69,8 @@ _start:
   LOAD a6, 5*REGBYTES(sp)
   LOAD a7, 6*REGBYTES(sp)

-  sfence.vma
   csrw satp, a0 // switch to virtual addresssing
+  fence.i
   sfence.vma

Doing sfence.vma after is I guess more by the book. Linux kernel does differently because it re-uses the same ASID. I'm hoping that inserted fence.i would remove the aligmemnt exception. The model is taking from tlb_flush() helper.

jarkkojs commented 1 year ago

Running through Spike for comparison.

jarkkojs commented 1 year ago

Not an expert but it seems that there is no really accepted synchronization pattern in Keystone project. I see various ways. E.g. why tlb_flush also does fence.i, I have no idea, and it is of course undocumented. For the synchronization primitives a few inline comments explaining the design decision would really make a difference.

The reason why Linux kernel does what it does is according to riscv manual I put about ASID re-use. Does Keystone use a single ASID or multiple? This would dictate a pattern for how sync should be done afaik, if I get this correctly.

jarkkojs commented 1 year ago

I feel that many times researchers around this topic of confidential computing mix two things: safety and security. This leads to quite loosely engineered software components.

For instance, untrusted device driver should still be a safe device driver, i.e. it should not crash your computer, which could lead loosing your important data. Even if the data is protected unplugging can still lead to total loss, if the data is not persisted.

Outside confidential computing consider flight control systems. They are quite insecure as shown e.g. Hugo Teso in the past but still aeroplanes have great safety properties, or that is sort of thing when you engineer planes (e.g. redundancy properties of planes).

Here I feel that once CVA6 was integrated its QA was completely ignored after that, which leads to a bad software quality. Keystone is secure but at the same it unsafe software because of stability issues and undocumented engineering practicaes. Random things are done without explaining why in the Git log.

I hope there would be a small uplift how Git commits are formed to make the project more applausible.

andreaskuster commented 1 year ago

Hi @jarkkojs

I can certainly understand some of your frustration as this issue you are hunting seems challenging to catch, and documentation is seldom self-explanatory in real life.

Can you please elaborate on what you specifically mean by CVA6 QA being completely ignored? Is there anything specific that should be addressed on the CVA6 side?

To my understanding of this thread, the official documentation for CVA6/Keystone (https://docs.keystone-enclave.org/en/latest/Getting-Started/Running-Keystone-on-CVA6.html) still works as expected.

Should you still have any issues with CVA6 itself, I strongly encourage you to open an issue directly at the corresponding repo to reach the current maintainers: https://github.com/openhwgroup/cva6

And of course, you can further follow up here (by tagging me) or on our email conversation with specific questions, and I will do my best to help.

jarkkojs commented 1 year ago

O

Hi @jarkkojs

I can certainly understand your frustration as this issue you are hunting seems challenging to catch, and documentation is seldom good enough to be self-explanatory.

Can you please elaborate on what you specifically mean by CVA6 QA being completely ignored? Is there anything specific that should be addressed on the CVA6 side?

To my understanding of this thread, the official documentation for CVA6/Keystone (https://docs.keystone-enclave.org/en/latest/Getting-Started/Running-Keystone-on-CVA6.html) still works as expected.

Should you still have any issues with CVA6 itself, I strongly encourage you to open an issue directly at the corresponding repo to reach the current maintainers: https://github.com/openhwgroup/cva6

And of course, you can further follow up here (by tagging me) or on our email conversation with specific questions, and I will do my best to help.

It is overall project quality of e.g. driver that I'm a bit concerned of. E.g. the original research paper on Keystone literally advertises bad use of kernel buddy allocator and CMA allocator, as it was some sort of solution per se. And if you read the Git log most of the commits lack almost any description what they do and what is the reasoning. Like changing synchronisation primitives with a description "debugging" is weird. Also I doubt that there has been much CI on CVA6 since it was upstreamed, given that even the documentation points to your repository. Usually someone has pro-actively maintain an architecture in order to for it to keep up-to-date.

I think you are correct when it comes to the master branch but I should re-check that tho, and CVA6 issue is not thus in that way part that rant.

I only recently found out that the internal project is using dev-page-tables branch and not master. It is apparently a new way of doing paging (and sort of better) where enclave itself builds its own page tables. It is proactively developed by @evgenyp67, and initiated by @cathylu10 but since it is undocumented my understanding is based on reading the source code.

Something in the sync of csrw of the satp inside the loader is not liked by CVA6. I just found out that QEMU has had Spike support since 7.0. I tried with this command-line:

/home/jarkkojs/work/keystone/qemu/build/riscv64-softmmu/qemu-system-riscv64 \
 $DEBUG \
 -m 2G \
 -nographic \
 -machine spike,rom=/home/jarkkojs/work/keystone/build/bootrom.build/bootrom.bin \
 -bios /home/jarkkojs/work/keystone/build/sm.build/platform/generic/firmware/fw_payload.bin \
 -netdev user,id=net0,net=192.168.100.1/24,dhcpstart=192.168.100.128,hostfwd=tcp::${HOST_PORT}-:22 \
 -device virtio-net-device,netdev=net0 \
 -device virtio-rng-pci \
 -smp $SMP

I get:

$ build/scripts/run-qemu.sh
**** Running QEMU SSH on port 3531 ****
qemu-system-riscv64: Property 'spike-machine.rom' not found

Not sure where and why it tries look up that file but I guess I will found that out. So as the next trial I'm giving my hopes that the issue of dev-page-table will trigger also with Spike which would make debugging whole a lot easier.

IMHO, Keystone should optimize to be "safe and secure" and not just "secure", which can lead to "unsafe and secure".

jarkkojs commented 1 year ago

@andreaskuster I believe you are absolutely right on what you said on master branch. I only recently found out that I'm actually working on https://github.com/keystone-enclave/keystone/tree/dev-evgeny-dll-clean :-) Good point and correction.

A small improvement to the project quality that would not require scale to the core team (whatever that is for this project): decent commit messages to the Git log. Most do not even have xfer to Github issue and even if there was, it would be pain to xfer constantly. "debugging" or "small fix" are not acceptable commit message tbh. They do not tell anything. An empty commit message would be better than "small fix". By writing a small prose what commit does is already a nimi-review done by the author for the commit. It would significantly give a bump to the overall code quality over long period of time.

jarkkojs commented 1 year ago

I've talked a lot of confidential computing researchers over the years since I started working on SGX in early 2013, and there is this common theme that "this and that does not matter because our thing is designed to run in an environment with malware".

I claim that it is not a true statement. Confidential computing gives a tool prevent unauthorized use in a such environment, assuming that CPU extension, SM or whatever TEE it is, is implemented correctly. Preventing unauthorized used, however, does not map to enabling authorized correct use of software. Confidential computing addresses only one non-functional aspect but does not make other aspect unworthy to optimize.

jarkkojs commented 1 year ago

In any case I will close this one and open a new one once I've (re-)verified master. I open new one with the focus on dev-evgeny-dll-clean.

I just documented here some ideas, opinions and suggestions so that there is more relevant CVA6 related information when you google for "CVA6 Keystone" than just https://github.com/keystone-enclave/keystone/issues/259 (my in-kernel driver addresses this issue with a manner that works for the mainline kernel, I wrote it actually because I thought first that I also had something wrong with the mm side of things).

jarkkojs commented 1 year ago

To be totally honest Keystone was better off, if the whole project was forked.

I rarely think this way but in this case it might be a solution because a source code dump with open license is not the same thing as free software project. There's no much community response in this project and decisions are not argued and made within the community. There is no leadership, no roadmap and not transparent anything of the direction where things are ongoing.

I'm not going to fork this but I believe that is inevitable thing to happen by "someone".

On the contrary, I'll try find the glitches so that this would not happen... Just by experience would not surprise me.

I see a Code of Conduct document but why it is there even? If there's no community, there's no need for such document in the first place.

keystone-enclave / keystone

SM trap handler gives "illegal instruction" on CVA6 #374