linaro-swg / linux

Linux kernel source tree
Other
41 stars 79 forks source link

optee: Data abort while trying to work with preallocated shared buffer on L4.14 #52

Closed lorc closed 6 years ago

lorc commented 6 years ago

Tested on qemu_v8

To reproduce

  1. Checkout and build latest kernel (I used commit 4a704d6db0ee4a349d5d523813a718e429b4914d)
  2. Run xtest

Kernel panics:

  • regression_1001 Core self tests [ 7.055672] Unable to handle kernel paging request at virtual address ffff80003fe00000 [ 7.055882] Mem abort info: [ 7.055981] Exception class = DABT (current EL), IL = 32 bits [ 7.056100] SET = 0, FnV = 0 [ 7.056175] EA = 0, S1PTW = 0 [ 7.056261] Data abort info: [ 7.056335] ISV = 0, ISS = 0x00000071 [ 7.056423] CM = 0, WnR = 1 [ 7.056550] swapper pgtable: 4k pages, 48-bit VAs, pgd = ffff000008f49000 [ 7.056704] [ffff80003fe00000] pgd=00000000820f8003, pud=00000000820f7003, pmd=000000007ff14003, pte=00e800007fe00712 [ 7.057069] Internal error: Oops: 96000047 [#1] PREEMPT SMP [ 7.057261] Modules linked in: [ 7.057505] CPU: 0 PID: 1201 Comm: xtest Not tainted 4.14.0-rc1-00070-g4a704d6 #109 [ 7.057664] Hardware name: QEMU QEMU Virtual Machine, BIOS 0.0.0 02/06/2015 [ 7.057841] task: ffff800040569f80 task.stack: ffff0000096f8000 [ 7.058005] PC is at __memset+0x1ac/0x1d0 [ 7.058098] LR is at pool_op_gen_alloc+0x58/0x90 [ 7.058254] pc : [] lr : [] pstate: 40000145

The same test on linaro/optee (ac0c2c26b9819c5e95d56cb2d8937de0357eecaa) works perfectly fine.

Small test shows that memory is remapped just fine, but access to it generates panic:

diff --git a/drivers/tee/optee/core.c b/drivers/tee/optee/core.c
index 7952357..2b96f70 100644
--- a/drivers/tee/optee/core.c
+++ b/drivers/tee/optee/core.c
@@ -388,7 +388,12 @@ optee_config_shm_memremap(optee_invoke_fn *invoke_fn, void **memremaped_shm)
                return ERR_PTR(-EINVAL);
        }
        vaddr = (unsigned long)va;
-
+       pr_info("remapped at %p\n", va);
+       pr_info("Trying direct access\n");
+       *((int*)va) = 0xFF;
+       pr_info("Trying memset\n");
+       memset(va, 0, size);
+       pr_info("Done\n");
        priv_info.vaddr = vaddr;
        priv_info.paddr = paddr;
        priv_info.size = OPTEE_SHM_NUM_PRIV_PAGES * PAGE_SIZE;

[ 3.460483] optee: probing for conduit method from DT. [ 3.461409] optee: remapped at ffff80003fe00000 [ 3.461486] optee: Trying direct access [ 3.461725] Unable to handle kernel paging request at virtual address ffff80003fe00000 [ 3.461832] Mem abort info: [ 3.461900] Exception class = DABT (current EL), IL = 32 bits [ 3.462004] SET = 0, FnV = 0 [ 3.462104] EA = 0, S1PTW = 0 [ 3.462220] Data abort info: [ 3.462276] ISV = 0, ISS = 0x00000071 [ 3.462338] CM = 0, WnR = 1

I'm currently investigating this further.

lorc commented 6 years ago

I found why this happens, but I'm not sure if this behavior is intended. So, memremap() detects that shared region is located in System RAM, and, thus, returns pointer to mapped system memory. Problem is that this region is not mapped. It is being unmapped during kernel initialization:

[ 0.000000] [] change_page_range+0x70/0x80 [ 0.000000] [] apply_to_page_range+0x1ec/0x390 [ 0.000000] [] change_memory_common+0x38/0xb0 [ 0.000000] [] set_memory_valid+0x1c/0x40 [ 0.000000] [] kernel_map_pages+0x14/0x20 [ 0.000000] [] free_pages_ok+0x2a0/0x2a8 [ 0.000000] [] free_pages+0x30/0x50 [ 0.000000] [] __free_pages_bootmem+0xa0/0xa8 [ 0.000000] [] free_all_bootmem+0x124/0x194 [ 0.000000] [] mem_init+0x98/0x2b8 [ 0.000000] [] start_kernel+0x1ec/0x3b0

change_page_range() removes valid bit from a PTE.

Actually, I expected that kernel memory is always mapped. But looks like this is not true. So, from user point of view it looks like memremap() returns pointer to not mapped memory. I don't know is this right behavior. But for me it looks wrong.

lorc commented 6 years ago

Also, I'm bothered by /reserved-memory entry in DTB. According to optee-os code, it should write range of shared memory region there. But in reality, it stores TEE RAM region:

[ 0.000000] OF: reserved mem: name: optee@0xe400000 base:e400000, size:200000

(SHM starts at 0x7fe00000).

Looks like misconfiguration somewhere.

jenswi-linaro commented 6 years ago

It looks like config_nsmem() in OP-TEE is using virtual addresses instead of physical addresses.

lorc commented 6 years ago

@jenswi-linaro, great eye! You are right. I did quick and dirty fix. Now kernel panics during early boot.

Looking further....

lorc commented 6 years ago

That's funny. QEMU puts initrd image in the middle of SHM region. So, with right /reserved-memory/optee entry, kernel sees that initrd is in midst of reserved memory and panics.

I tried to make SHM region smaller and all worked fine.

jenswi-linaro commented 6 years ago

Aha, that's bad luck. I suppose we'd better change address range used for SHM inside OP-TEE. The good news is that we only need to update inside OP-TEE as the device tree passed to the kernel is updated automatically.

lorc commented 6 years ago

I created two PRs, that together should fix this issue:

Will close this issue after merge.