linaro-swg / linux

Linux kernel source tree
Other
41 stars 79 forks source link

[Xen] optee_driver_init() panic after device PTA invoke error #77

Open lorc opened 4 years ago

lorc commented 4 years ago

There are multiple issues with device PTA driver:

  1. It uses old style TMEM arg even with dynamic SHM enabled. Xen is not happy about this
  2. If optee_enumerate_devices() fails, optee_driver_init() fails. But actually optee can work without this feature. Does it means, than older OP-TEE versions are not supported anymore?
  3. Drivers panics during optee_remove()

Basically I'm getting this error log:

optee: probing for conduit method from DT.                                                         
optee: revision 3.8                                                                                
optee: dynamic shared memory is enabled    
(XEN) optee.c:868:d0v0 Guest tries to use old tmem arg
optee: PTA_CMD_GET_DEVICES invoke function err: ffff0006                                           
Unable to handle kernel NULL pointer dereference at virtual address 00000058
Mem abort info:                                 
  Exception class = DABT (current EL), IL = 32 bits  
  SET = 0, FnV = 0                         
  EA = 0, S1PTW = 0                         
Data abort info:                                                                                   
  ISV = 0, ISS = 0x00000005         
  CM = 0, WnR = 0                                                                                  
[0000000000000058] user address but active_mm is swapper                                           
Internal error: Oops: 96000005 [#1] PREEMPT SMP
Modules linked in:     
CPU: 0 PID: 1 Comm: swapper/0 Not tainted 4.14.75-ltsi-yocto-tiny #7                               
Hardware name: Renesas Salvator-XS board based on r8a7795 ES2.0+ (DT)                              
task: ffffffc6dd908000 task.stack: ffffffc6dd904000                             
PC is at tee_shm_free+0x18/0x40                  
LR is at optee_disable_shm_cache+0x88/0xbc       
pc : [<ffffff80083f45a4>] lr : [<ffffff80083f6540>] pstate: 80000045
sp : ffffffc6dd907c80
x29: ffffffc6dd907c80 x28: ffffff8008729300 
x27: 0000000000000007 x26: ffffff80086f8078 
x25: ffffffc6d76d4980 x24: 0000000000000004 
x23: ffffff8008559798 x22: 00000000b200000a 
x21: ffffffc6dd907ce0 x20: ffffffc6dd0c2c00 
x19: 0000000000000000 x18: 0000000000000010 
x17: 0000000000007fff x16: 00000000deadbeef 
x15: 0000000000000000 x14: ffffffc6dce4f028 
x13: ffffffc6d76cfba0 x12: 0000000000000020 
x11: ffffffc6dce4eff8 x10: ffffffc6d76cfba0 
x9 : 0000000000000001 x8 : ffffff80083f5598 
x7 : 0000000000000000 x6 : 0000000000000000 
x5 : 0000000000000000 x4 : 0000000000000000 
x3 : 0000000000000000 x2 : 0000000000000000 
x1 : 0000000000000000 x0 : ffffff80083f6540 
Process swapper/0 (pid: 1, stack limit = 0xffffffc6dd904000)
Call trace:
Exception stack(0xffffffc6dd907b40 to 0xffffffc6dd907c80)
7b40: ffffff80083f6540 0000000000000000 0000000000000000 0000000000000000
7b60: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
7b80: ffffff80083f5598 0000000000000001 ffffffc6d76cfba0 ffffffc6dce4eff8
7ba0: 0000000000000020 ffffffc6d76cfba0 ffffffc6dce4f028 0000000000000000
7bc0: 00000000deadbeef 0000000000007fff 0000000000000010 0000000000000000
7be0: ffffffc6dd0c2c00 ffffffc6dd907ce0 00000000b200000a ffffff8008559798
7c00: 0000000000000004 ffffffc6d76d4980 ffffff80086f8078 0000000000000007
7c20: ffffff8008729300 ffffffc6dd907c80 ffffff80083f6540 ffffffc6dd907c80
7c40: ffffff80083f45a4 0000000080000045 ffffffc6dd907cb0 ffffff80083f6520
7c60: ffffffffffffffff ffffffc6dd0c2c00 ffffffc6dd907c80 ffffff80083f45a4
[<ffffff80083f45a4>] tee_shm_free+0x18/0x40
[<ffffff80083f6540>] optee_disable_shm_cache+0x88/0xbc
[<ffffff80083f5a90>] optee_remove+0x20/0x68
[<ffffff80086e08f4>] optee_driver_init+0x5ac/0x5ec
[<ffffff8008083078>] do_one_initcall+0x104/0x114
[<ffffff80086b0f0c>] kernel_init_freeable+0x228/0x22c
[<ffffff80084fbe68>] kernel_init+0x18/0x108
[<ffffff8008084218>] ret_from_fork+0x10/0x18
Code: f9000bf3 aa0003f3 aa1e03e0 d503201f (b9405a60) 
---[ end trace b20a0eef62340328 ]---

Note that Xen mediator returns an error and then kernel panics.

lorc commented 4 years ago

This small changes fixes the second issue. But I don't know if it the right way:

diff --git a/drivers/tee/optee/core.c b/drivers/tee/optee/core.c
index 34409c916882..91584b066787 100644
--- a/drivers/tee/optee/core.c
+++ b/drivers/tee/optee/core.c
@@ -728,10 +728,8 @@ static int __init optee_driver_init(void)
                return PTR_ERR(optee);

        rc = optee_enumerate_devices();
-       if (rc) {
-               optee_remove(optee);
-               return rc;
-       }
+         if (rc)
+                pr_warn("can't enumerate optee_devices: %d\n", rc);

        pr_info("initialized driver\n");
etienne-lms commented 4 years ago

It uses old style TMEM arg even with dynamic SHM enabled. Xen is not happy about this

Right, it would make sense that driver uses RMEM when dynamic shm capability is supported.

This small changes fixes the second issue. But I don't know if it the right way:

I think your change is legitimate. OP-TEE does not mandate TA/PTA devices enumeration.

Drivers panics during optee_remove()

Indeed, that's bad.

jforissier commented 4 years ago

2. If optee_enumerate_devices() fails, optee_driver_init() fails. But actually optee can work without this feature. Does it means, than older OP-TEE versions are not supported anymore?

optee_enumerate_devices() should not fail when "TEE device" enumeration is not supported by OP-TEE. It should simply enumerate nothing. I have checked that it is the case on QEMU if the PTA is not compiled in (CFG_DEVICE_ENUM_PTA=n).

So it seems to me we have only (!) two issues here, not three.

etienne-lms commented 4 years ago

(edited: rephrased)

@jforissier, you're right. Actually it is explicitly stated in the Linux driver. It should also does not fail if device PTA is found and it returns no device to enumerate: shm_size == 0.

Regarding device enumeration and TMEM: Looking into device.c, i can't see where TMEM is explicitly used. Looks like the driver allocates SHM.

lorc commented 4 years ago

optee_enumerate_devices() should not fail when "TEE device" enumeration is not supported by OP-TEE.

Yes, you are right. In my case tee_client_open_session completes successfully, but then get_devices fails. On one hand, this means that something is not right, but on other hand - all other features still work fine afterwards. So, I don't know: should we consider faults in optee_enumerate_devices as critical?

Looking into device.c, i can't see where TMEM is explicitly used. Looks like the driver allocates SHM.

I also checked the code. It seems legit, nevertheless Xen is not happy. I'll take a closer look.

And by the way. Is this right? I can see only one parameter used, not four.

jenswi-linaro commented 4 years ago

No, optee_enumerate_devices() should not be critical.

I didn't know that Xen didn't support TMEM, how does it support the argument structure? Doesn't that always go via TMEM?

It's still OK to send four parameters even if only one is used.

jenswi-linaro commented 4 years ago

@lorc, your patch makes sense. If you post it on LKML I'll try to get it into 5.7 if it works well.

lorc commented 4 years ago

I didn't know that Xen didn't support TMEM, how does it support the argument structure? Doesn't that always go via TMEM?

The problem is not in TMEM per se, but in its location. Xen does not support static SHM. And before 3.9, there was no cases when TMEM was used for user SHM. I believe, when both dynamic SHM and registered SHM features are enabled, optee client uses TMEM only for "register shm" calls, which are handled separately in Xen. "Open session" and "call TA" calls always use RMEM.

I believe Xen is being too restrictive there. I'll see what is needed to allow TMEM buffers when they are residing in guest memory.

So, Jerome is right, there is only two issues in the linux driver. Maybe, even one and half :)

I'll post patch in LKML in the nearest time.

jforissier commented 4 years ago

@lorc I am changing the title of this issue because it appears to be misleading now.