linaro-swg / linux

Linux kernel source tree

Rebasing optee branch onto kernel 5.3 #69

Closed by jforissier 4 years ago

jforissier commented 4 years ago

Hi,

If I'm not mistaken, it's soon OP-TEE release time, so it may be a good time to consider rebasing our kernel branch (which is currently based on Linux 5.0).

Therefore, I have prepared wip/optee-5.3 which is linaro-swg/linux.git branch optee rebased onto upstream kernel 5.3 (and with the upstream-tee-subsys-patches.txt file updated).

Sanity-tested on QEMU, I'll check HiKey and HiKey960 tomorrow. Assuming all is well, @jbech-linaro @jenswi-linaro let me know if you're OK with replacing the tip of the optee branch with this new one. Thanks!

jbech-linaro commented 4 years ago

Sounds good to me Jerome! Feel free to step up to the 5.3 version.

jforissier commented 4 years ago

HiKey960 doesn't boot with 5.3, I'm trying to find out why.

jforissier commented 4 years ago

> HiKey960 doesn't boot with 5.3, I'm trying to find out why.

I still haven't found the problem. The board does not seem to be hung (the heartbeat LED flashes) but the login prompt never appears. The init process is started and some subprocesses are created too but I see nothing after "Starting syslogd".

Same problem with HiKey (620). But QEMU and QEMUv8 work fine. Update: actually, QEMUv8 (64 bits) has the same problem with v5.3 and I did not see it, I must have run the wrong kernel :-/ QEMU 32 bits runs OK however.

jbech-linaro commented 4 years ago

> Same problem with HiKey (620)

I haven't looked at anything yet, but we've had weird issues due to l-loader in the past. Since this affects both 620 and 960, it might be worth to see if there are new l-loader patches that we need.

jforissier commented 4 years ago

> it might be worth to see if there are new l-loader patches that we need.

Well nothing new, I have the latest.

I don't understand what's going on. The boot looks good as far as kernel init goes, then /init is successfully started but things stop shortly after. Nothing on the console, no login prompt. If I replace /init with a shell, I can type some commands, but as soon as I run something such as ls, the command runs OK and then hangs on exit. I never get the prompt back :open_mouth:

The board is not completely dead since it prints "random: crng init done" after a couple of minutes. I suspect there may be some kind of infinite loop because I also get a thermal alarm message.

[    4.477185] Run /init as init process
+ exec /bin/busybox sh
sh: can't access tty; job control turned off
/ # [    4.562740] mmc_host mmc1: Bus speed (slot 0) = 400000Hz (slot req 400000Hz, actual 400000HZ div = 0)
[    4.670210] mmc_host mmc1: Bus speed (slot 0) = 25000000Hz (slot req 25000000Hz, actual 25000000HZ div = 0)
[    4.783293] mmc_host mmc1: Bus speed (slot 0) = 400000Hz (slot req 400000Hz, actual 400000HZ div = 0)
[    4.905617] mmc_host mmc1: Bus speed (slot 0) = 25000000Hz (slot req 25000000Hz, actual 25000000HZ div = 0)
[    5.156374] wlcore: wl18xx HW: 183x or 180x, PG 2.2 (ROM 0x11)
[    5.168123] wlcore: WARNING Detected unconfigured mac address in nvs, derive from fuse instead.
[    5.176825] wlcore: WARNING This default nvs file can be removed from the file system
[    5.185316] wlcore: loaded

/ #
/ # [   10.760304] hisi_thermal fff30000.tsensor: sensor <1> THERMAL ALARM stopped: 61065 < 65000
/ #
/ #
/ #
/ #
/ # ls
bin      init     linuxrc  opt      run      tmp
dev      lib      media    proc     sbin     usr
etc      lib64    mnt      root     sys      var
[   36.522261] hisi_thermal fff30000.tsensor: sensor <1> THERMAL ALARM: 65370 > 65000
*** Here the shell does not respond anymore ***

jbech-linaro commented 4 years ago

THERMAL_ALARM? Overheating? I think I would give git bisect a try to see where the issue started to occur.

jforissier commented 4 years ago

It happens quite often, on and off, looks like normal thermal throttling actually.

As for bisecting... I tried to, but I reached many commits that show different issue(s) (panicking when starting /init) so I did not find anything interesting :(
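For reference, commits that fail for an unrelated reason can be excluded rather than marked bad: `git bisect run` treats exit code 125 from the test script as "skip this commit". Below is a minimal sketch of such a classifier, assuming the console log of each boot attempt has been captured as text. The marker strings and the `classify_boot_log` helper are hypothetical and would need adjusting to the actual boot output.

```python
# Exit codes understood by `git bisect run`.
GOOD = 0
BAD = 1
SKIP = 125

# Assumed marker strings; adjust them to the real console output.
PANIC_MARKER = "Kernel panic"
LOGIN_MARKER = "login:"


def classify_boot_log(log_text: str) -> int:
    """Map a captured console log to a `git bisect run` exit code."""
    if PANIC_MARKER in log_text:
        # A different failure (e.g. a panic when starting /init):
        # this commit cannot be tested for the hang, so skip it.
        return SKIP
    if LOGIN_MARKER in log_text:
        # The login prompt was reached: this commit boots.
        return GOOD
    # No panic and no prompt: the hang under investigation.
    return BAD
```

A wrapper passed to `git bisect run` would build and boot each commit, capture the console, and exit with the value returned by `classify_boot_log`.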

jforissier commented 4 years ago

I have made some progress but have not found the root cause yet :-/

I can't believe the 5.3 kernel could be broken like this (?) so I must be missing something...

After stumbling upon https://lkml.org/lkml/2019/9/6/824 I tried https://git.linaro.org/people/john.stultz/android-dev.git/log/?h=dev/dma-buf-heap on my HiKey960 with hikey960_defconfig, in the same OP-TEE build environment: does not boot either.

Now, the next thing I want to check is upgrading TF-A. The fact that HiKey960 and QEMU are affected makes me think there is something external to the kernel that is not compatible anymore with v5.3. Update: same with latest TF-A (master) on QEMU.

@johnstultz-work any idea?

jforissier commented 4 years ago

Update: it appears that the source of all my problems was ccache :sob: More details here: http://lists.infradead.org/pipermail/linux-arm-kernel/2019-December/697840.html.

Therefore, there is no obstacle to upgrading our Linux branch, except that I noticed an issue with the dynamic discovery by the v5.3 or v5.4 kernel of devices implemented by pseudo-TAs in OP-TEE:

[    4.180992] optee: PTA_CMD_GET_DEVICES invoke function err: ffff0006

As a result, the driver refuses to initialize and OP-TEE cannot be used. A fix is in linux -next so I think we'd better wait until it lands in a tagged release (v5.5-rc1 at least, or even v5.5).
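The value `ffff0006` in the log line above corresponds to the GlobalPlatform TEE Client API return code `TEEC_ERROR_BAD_PARAMETERS` (0xFFFF0006). As a small aid for reading such messages, here is a sketch that decodes a few common codes from a log line. The table is deliberately partial, and the `decode_tee_error` helper and its regular expression are hypothetical, written only for this log format.

```python
import re

# Partial table of GlobalPlatform TEE Client API return codes.
TEE_RETURN_CODES = {
    0x00000000: "TEEC_SUCCESS",
    0xFFFF0000: "TEEC_ERROR_GENERIC",
    0xFFFF0001: "TEEC_ERROR_ACCESS_DENIED",
    0xFFFF0006: "TEEC_ERROR_BAD_PARAMETERS",
    0xFFFF0008: "TEEC_ERROR_ITEM_NOT_FOUND",
    0xFFFF0009: "TEEC_ERROR_NOT_IMPLEMENTED",
    0xFFFF000A: "TEEC_ERROR_NOT_SUPPORTED",
    0xFFFF000C: "TEEC_ERROR_OUT_OF_MEMORY",
    0xFFFF0010: "TEEC_ERROR_SHORT_BUFFER",
}

# Matches the hexadecimal value printed after "err: " in the driver message.
ERR_PATTERN = re.compile(r"err: ([0-9a-fA-F]+)")


def decode_tee_error(log_line: str):
    """Return the symbolic name of the TEE code in log_line, or None."""
    match = ERR_PATTERN.search(log_line)
    if match is None:
        return None
    code = int(match.group(1), 16)
    return TEE_RETURN_CODES.get(code, f"unknown (0x{code:08X})")


line = "[    4.180992] optee: PTA_CMD_GET_DEVICES invoke function err: ffff0006"
print(decode_tee_error(line))  # prints TEEC_ERROR_BAD_PARAMETERS
```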

jbech-linaro commented 4 years ago

> A fix is in linux -next so I think we'd better wait until it lands in a tagged release (v5.5-rc1 at least, or even v5.5).

Yeah, no hurry I guess. At some point it'd be nice to start testing and using the upstream branch as default. I know we have all the DT stuff for various devices/platforms etc. on our fork, plus a couple of other patches. But it feels strange that we have a kernel driver upstream, yet we who write and maintain it "cannot use it" for reasons X and Y. We should fix X and Y instead.

jforissier commented 4 years ago

@jbech-linaro agreed. As you said, the Linaro branch only brings DT stuff, which we can probably drop and let platforms deal with properly (QEMU and HiKey are not concerned, at least). The other thing is SDP, which depends on the ION unmapped heap feature. We know that is unlikely to land upstream any time soon, so we may as well leave it as-is and untested.

johnstultz-work commented 4 years ago

@jforissier Sorry I didn't see this until now! Glad you have it working again. Feel free to ping me via email/irc if I'm not responding (I'll check my notification settings here).

jforissier commented 4 years ago

Superseded by https://github.com/linaro-swg/linux/issues/71.