jforissier closed this issue 4 years ago
Sounds good to me Jerome! Feel free to step up to the 5.3 version.
HiKey960 doesn't boot with 5.3, I'm trying to find out why.
I still haven't found the problem. The board does not seem to be hung (the heartbeat LED flashes) but the login prompt never appears. The init process is started and some subprocesses are created too but I see nothing after "Starting syslogd".
Same problem with HiKey (620). But QEMU and QEMUv8 work fine. Update: actually, QEMUv8 (64-bit) has the same problem with v5.3. I did not see it at first because I must have run the wrong kernel :-/ QEMU 32-bit runs OK, however.
I haven't looked at anything yet, but we've had weird issues due to l-loader in the past. Since this affects both 620 and 960, it might be worth checking whether there are new l-loader patches that we need.
Well nothing new, I have the latest.
I don't understand what's going on. The boot looks good as far as kernel init goes, then /init is successfully started but things stop shortly after. Nothing on the console, no login prompt. If I replace /init with a shell, I can type some commands but as soon as I run something such as ls, the command runs OK and hangs on exit. I never get back the prompt again :open_mouth:
The board is not completely dead since it prints "random: crng init done" after a couple of minutes. I suspect there may be some kind of infinite loop because I also get a thermal alarm message.
[ 4.477185] Run /init as init process
+ exec /bin/busybox sh
sh: can't access tty; job control turned off
/ # [ 4.562740] mmc_host mmc1: Bus speed (slot 0) = 400000Hz (slot req 400000Hz, actual 400000HZ div = 0)
[ 4.670210] mmc_host mmc1: Bus speed (slot 0) = 25000000Hz (slot req 25000000Hz, actual 25000000HZ div = 0)
[ 4.783293] mmc_host mmc1: Bus speed (slot 0) = 400000Hz (slot req 400000Hz, actual 400000HZ div = 0)
[ 4.905617] mmc_host mmc1: Bus speed (slot 0) = 25000000Hz (slot req 25000000Hz, actual 25000000HZ div = 0)
[ 5.156374] wlcore: wl18xx HW: 183x or 180x, PG 2.2 (ROM 0x11)
[ 5.168123] wlcore: WARNING Detected unconfigured mac address in nvs, derive from fuse instead.
[ 5.176825] wlcore: WARNING This default nvs file can be removed from the file system
[ 5.185316] wlcore: loaded
/ #
/ # [ 10.760304] hisi_thermal fff30000.tsensor: sensor <1> THERMAL ALARM stopped: 61065 < 65000
/ #
/ #
/ #
/ #
/ # ls
bin init linuxrc opt run tmp
dev lib media proc sbin usr
etc lib64 mnt root sys var
[ 36.522261] hisi_thermal fff30000.tsensor: sensor <1> THERMAL ALARM: 65370 > 65000
*** Here the shell does not respond anymore ***
THERMAL_ALARM? Overheating? I think I would give git bisect a try to see where the issue started to occur.
It happens quite often, on and off, looks like normal thermal throttling actually.
As for bisecting... I tried to, but I reached many commits that show different issue(s) (panicking when starting /init) so I did not find anything interesting :(
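When a bisection keeps landing on commits that fail for an unrelated reason, `git bisect run` lets a script report "cannot test this commit" by exiting with code 125, so those commits are skipped instead of being marked bad. The following is only a minimal sketch of such a classifier. It assumes the console output of each boot attempt has already been captured to a file by some other means, and the marker strings (`login:` for a successful boot, `Kernel panic` for the unrelated failure) are illustrative assumptions, not something taken from this thread.

```python
import sys

# Exit codes understood by `git bisect run`:
#   0 = good, 125 = cannot be tested (skip), any other code in 1..127 = bad.
GOOD, BAD, SKIP = 0, 1, 125

# Hypothetical markers: adjust them to what your console actually prints.
LOGIN_MARKER = "login:"
PANIC_MARKER = "Kernel panic"


def classify_boot_log(log: str) -> int:
    """Map a captured console log to a `git bisect run` exit code."""
    if PANIC_MARKER in log:
        # A different failure than the hang being hunted: skip this commit.
        return SKIP
    if LOGIN_MARKER in log:
        # The login prompt appeared, so this commit boots fine.
        return GOOD
    # No panic and no login prompt: treat it as the hang under investigation.
    return BAD


if __name__ == "__main__" and len(sys.argv) == 2:
    # Example (hypothetical file names):
    #   git bisect run python3 classify_boot.py boot.log
    with open(sys.argv[1], errors="replace") as f:
        sys.exit(classify_boot_log(f.read()))
```

In practice the script passed to `git bisect run` would also have to rebuild and boot the kernel at each step before classifying the log. That part depends on the board or emulator and is left out here.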
I have made some progress but have not found the root cause yet :-/
I can reproduce the issue with QEMUv8 (64-bit kernel). When the boot hangs the CPU is indeed in an infinite loop but I could not determine where exactly: https://pastebin.com/qenRPXJ7
After some painful bisection I managed to find a working configuration, that's basically v5.3 with 3 commits reverted: https://github.com/jforissier/linux/commits/wip/optee-v5.3-fixes
commit 7b6c5500b291e6015357ce83f08602e945aaa064 (HEAD -> wip/optee-v5.3-fixes, jf/wip/optee-v5.3-fixes)
Author: Jerome Forissier <jerome@forissier.org>
Date: Fri Oct 11 14:47:50 2019 +0200
Revert "arm64: arch_timer: Ensure counter register reads occur with seqlock held"
This reverts commit 75a19a0202db21638a1c2b424afb867e1f9a2376.
commit 3bf6d8fbb32f03f35f8ef948ea0ab5abd36f87e6
Author: Jerome Forissier <jerome@forissier.org>
Date: Fri Oct 11 14:42:44 2019 +0200
Revert "arm64: vdso: Explicitly add build-id option"
This reverts commit 7a0a93c51799edc45ee57c6cc1679aa94f1e03d5.
commit 1ea7d54b284a1ad0079facbf2b2da1e2abcfefab
Author: Jerome Forissier <jerome@forissier.org>
Date: Fri Oct 11 14:42:38 2019 +0200
Revert "arm64: vdso: Substitute gettimeofday() with C implementation"
This reverts commit 28b1a824a4f44da46983cd2c3249f910bd4b797b.
I can't believe the 5.3 kernel could be broken like this (?) so I must be missing something...
After stumbling upon https://lkml.org/lkml/2019/9/6/824 I tried https://git.linaro.org/people/john.stultz/android-dev.git/log/?h=dev/dma-buf-heap on my HiKey960 with hikey960_defconfig, in the same OP-TEE build environment: does not boot either.
Now, the next thing I want to check is upgrading TF-A. The fact that HiKey960 and QEMU are both affected makes me think there is something external to the kernel that is no longer compatible with v5.3. Update: same with latest TF-A (master) on QEMU.
@johnstultz-work any idea?
Update: it appears that the source of all my problems was ccache :sob:
More details here: http://lists.infradead.org/pipermail/linux-arm-kernel/2019-December/697840.html.
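For anyone chasing a similar "impossible" regression, one cheap check is to rebuild once with the compiler cache bypassed and see whether the behaviour changes. The sketch below assumes the build invokes the compiler through ccache and relies on ccache's documented `CCACHE_DISABLE` environment setting. The helper name `env_without_ccache` is made up for this example.

```python
import os


def env_without_ccache(base_env=None):
    """Return a copy of the environment that tells ccache to bypass its cache."""
    env = dict(os.environ if base_env is None else base_env)
    # When CCACHE_DISABLE is set, ccache just calls the real compiler.
    env["CCACHE_DISABLE"] = "1"
    return env
```

The returned dictionary can be passed as the `env` argument of `subprocess.run` when launching the build. A clean rebuild is still needed so that previously cached objects are not reused from the build tree.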
Therefore, there is no obstacle to upgrading our Linux branch, except that I noticed an issue with the dynamic discovery by the v5.3 or v5.4 kernel of devices implemented by pseudo-TAs in OP-TEE:
[ 4.180992] optee: PTA_CMD_GET_DEVICES invoke function err: ffff0006
As a result, the driver refuses to initialize and OP-TEE cannot be used. A fix is in linux-next so I think we'd better wait until it lands in a tagged release (v5.5-rc1 at least, or even v5.5).
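As a reading aid for the log line above: the value `ffff0006` matches the GlobalPlatform return code for "bad parameters". The sketch below decodes such values using a small subset of the GlobalPlatform-defined codes. The helper name `decode_tee_error` is an assumption made for this example, and the thread itself does not say which layer produced the error.

```python
# Subset of GlobalPlatform TEE return codes (values as defined in the GP specs).
TEE_ERRORS = {
    0x00000000: "TEE_SUCCESS",
    0xFFFF0000: "TEE_ERROR_GENERIC",
    0xFFFF0001: "TEE_ERROR_ACCESS_DENIED",
    0xFFFF0006: "TEE_ERROR_BAD_PARAMETERS",
    0xFFFF0008: "TEE_ERROR_ITEM_NOT_FOUND",
    0xFFFF0009: "TEE_ERROR_NOT_IMPLEMENTED",
    0xFFFF000A: "TEE_ERROR_NOT_SUPPORTED",
    0xFFFF000C: "TEE_ERROR_OUT_OF_MEMORY",
    0xFFFF0010: "TEE_ERROR_SHORT_BUFFER",
}


def decode_tee_error(hex_text: str) -> str:
    """Translate a hex error value, as printed in the kernel log, to its name."""
    code = int(hex_text, 16)  # accepts "ffff0006" as well as "0xffff0006"
    return TEE_ERRORS.get(code, f"unknown (0x{code:08x})")


print(decode_tee_error("ffff0006"))
```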
Yeah, no hurry I guess. At some point it'd be nice to start testing and using the upstream branch as default. I know we have all the DT stuff for various devices/platforms etc. on our fork and a couple of other patches. But it feels strange that we have a kernel driver upstream, yet we who write and maintain it "cannot use it" for X and Y reasons. We should fix X and Y instead.
@jbech-linaro agreed. The Linaro branch only brings DT stuff as you said, which we can probably drop and let platforms deal with properly (QEMU, HiKey not concerned at least); and the other thing is SDP which depends on the ION unmapped heap feature which we know is unlikely to land upstream any time soon so we may as well leave it as-is and untested.
@jforissier Sorry I didn't see this until now! Glad you have it working again. Feel free to ping me via email/irc if I'm not responding (I'll check my notification settings here).
Superseded by https://github.com/linaro-swg/linux/issues/71.
Hi,
If I'm not mistaken, it's soon OP-TEE release time, so it may be a good time to consider rebasing our kernel branch (which is currently based on Linux 5.0).
Therefore, I have prepared wip/optee-5.3 which is linaro-swg/linux.git branch optee rebased onto upstream kernel 5.3 (and with the upstream-tee-subsys-patches.txt file updated). Sanity-tested on QEMU, I'll check HiKey and HiKey960 tomorrow. Assuming all is well, @jbech-linaro @jenswi-linaro let me know if you're OK with replacing the tip of the optee branch with this new one. Thanks!