hypriot / rpi-kernel

Build a Linux kernel for Raspberry Pi 0/1/2/3/3B+
MIT License

Kernel 4.14.79 does not have dependency map #50

Open rtrox opened 5 years ago

rtrox commented 5 years ago

Issue: Kernel 4.14.79 seems to be working great, but it looks like the dependency map isn't being installed in /lib/modules/ during installation. This causes modprobe to fail with:

kmod_search_moddep() could not open moddep file '/lib/modules/4.14.79-hypriotos-v7+/modules.dep.bin'

Docker and containerd require overlayfs, so the containerd service fails when its unit file tries to run /sbin/modprobe overlay.

Workaround: running sudo depmod on each Pi fixed this quickly and easily.

The workaround is simple, but I believe these files should be created during the deb install (from searches, it looks like linux-image-* packages are supposed to install them).
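The workaround above can be sketched as a small check that only suggests depmod when the dependency map is actually missing. This is a minimal sketch: the helper name needs_depmod is hypothetical and not part of any hypriot tooling.

```shell
#!/bin/sh
# Hypothetical helper: succeeds (exit 0) when the module directory
# lacks the binary dependency map that modprobe reads.
# $1: module directory, e.g. /lib/modules/$(uname -r)
needs_depmod() {
    [ ! -f "$1/modules.dep.bin" ]
}

release="$(uname -r)"
moddir="/lib/modules/$release"

if needs_depmod "$moddir"; then
    # depmod regenerates modules.dep, modules.alias and friends.
    echo "modules.dep.bin missing in $moddir; run: sudo depmod -a $release"
fi
```

Running it on an affected node should print the suggested depmod command; on a healthy node it prints nothing.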

rtrox commented 5 years ago

Sorry, I should have also mentioned: This is updating to the latest kernel from the latest sdcard image (1.9.0).

Hardware is a mix of Rpi 2B+, 3, and 3B+, though it doesn't look to be hardware related.

StefanScherer commented 5 years ago

Thanks @rtrox, I can reproduce the problem. containerd cannot be started because modprobe overlay does not work.

StefanScherer commented 5 years ago

Indeed, several files are missing:

$ ls /lib/modules/4.14.79-hypriotos-v7+/
kernel  modules.builtin  modules.order

and here is the previous one:

$ ls -l /lib/modules/4.14.34-hypriotos-v7+/
total 1928
drwxr-xr-x 11 root root   4096 Dec  1 18:43 kernel
-rw-r--r--  1 root root 500660 Apr 22  2018 modules.alias
-rw-r--r--  1 root root 518161 Apr 22  2018 modules.alias.bin
-rw-r--r--  1 root root  11137 Apr 22  2018 modules.builtin
-rw-r--r--  1 root root  12165 Apr 22  2018 modules.builtin.bin
-rw-r--r--  1 root root 149781 Apr 22  2018 modules.dep
-rw-r--r--  1 root root 215917 Apr 22  2018 modules.dep.bin
-rw-r--r--  1 root root    302 Apr 22  2018 modules.devname
-rw-r--r--  1 root root  57528 Apr 22  2018 modules.order
-rw-r--r--  1 root root    403 Apr 22  2018 modules.softdep
-rw-r--r--  1 root root 216125 Apr 22  2018 modules.symbols
-rw-r--r--  1 root root 265295 Apr 22  2018 modules.symbols.bin

rtrox commented 5 years ago

@StefanScherer - for what it's worth, I'd installed 4.14.70 from the artifacts in CircleCI before I saw that you'd uploaded this version, and it appeared to have similar characteristics.

StefanScherer commented 5 years ago

Thanks @rtrox yes, I've checked the contents of the deb files by downloading to my Mac and then listing them in a Linux container:

docker run -it -v $(pwd):/deb ubuntu dpkg -c /deb/raspberrypi-kernel_20180922-053217_armhf.deb | grep -v kernel

The 4.14.34 package was fine. The latest upstream kernel 4.14.79 from the Raspberry Pi org also looks better. I've installed it with:

sudo apt install raspberrypi-kernel=1.20181112-1
sudo reboot

and I'm trying the latest Docker with it. A single-node Docker swarm also looks good.
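The package inspection described above could be automated with a filter over the dpkg -c listing. This is a sketch under one assumption suggested by the 4.14.34 comparison: that a good package ships modules.dep itself rather than generating it in a post-install script. The function name has_moddep is hypothetical.

```shell
#!/bin/sh
# Hypothetical filter: reads a `dpkg -c` listing on stdin and succeeds
# when some entry ends in /modules.dep (the plain-text dependency map).
has_moddep() {
    grep -q '/modules\.dep$'
}

# Usage, with the package file name from this thread:
#   dpkg -c raspberrypi-kernel_20180922-053217_armhf.deb | has_moddep \
#       || echo "package does not ship modules.dep"
```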

StefanScherer commented 5 years ago

I'll remove the 4.14.79 kernel from packagecloud again as it is missing files. In the meantime I have prepared an SD card image with the upstream kernel 4.14.79 and the latest Docker Engine and tools installed. Stay tuned ;-)

StefanScherer commented 5 years ago

Here is a first SD card image: https://github.com/hypriot/image-builder-rpi/releases/download/v1.10.0-rc1/hypriotos-rpi-v1.10.0-rc1.img.zip

I would love to hear your feedback.

rtrox commented 5 years ago

@StefanScherer this morning I drained one of my Kubernetes nodes and loaded the new SD card image onto it. It booted with no problem, but now that I'm trying to join it back into the cluster, it has started crash looping, I suspect because of Weave (the CNI plugin).

I'm now trying to figure out the cause. I saw some issues accessing /var/lib/docker/overlay2 in the logs, but I'm having trouble getting in quickly enough to stop kubelet before the system crashes. I may have to pull out a console cable to get more info.

edit: You may want to skip to the end if you're just seeing this. Part of my setup playbook installs the latest raspberrypi-kernel (this was to pull in the kernel you had uploaded yesterday). It turns out that, for some reason, apt considers the 4.14.34-hypriot kernel to be newer than the upstream 4.14.79 kernel.

rtrox commented 5 years ago

OK, I was able to get the crash out of dmesg through the serial console:

[  820.227701] Unable to handle kernel NULL pointer dereference at virtual address 00000000
[  820.237101] pgd = ad61c000
[  820.241181] [00000000] *pgd=25434835, *pte=00000000, *ppte=00000000

I'm not sure I have the skills to track that much further, but I'll see if I can get more info about the context out of journald.

rtrox commented 5 years ago

OK, I was able to get the actual kernel panic with the stack trace as well:

https://gist.github.com/rtrox/b9f6a682c8717f79da8ae13f673ed5c0

Debugging kernel stack traces is a little out of my element :-/ but to me it looks like the vxlan module in this kernel is hitting a NULL pointer dereference. The context appears to be creating an IPv6 bridge, if I'm reading this correctly.

I don't think I can pull out much more data here, so I'm going to redownload the image and reflash the card to rule out any image corruption.

rtrox commented 5 years ago

Wait... I might be being really dumb. That stack trace references 4.14.34, though I could have sworn I flashed the newest image. This newest flash definitely has the newer kernel (though that vxlan kernel panic did look familiar):

$ uname -a
Linux black-pearl 4.14.79-v7+ #1159 SMP Sun Nov 4 17:50:20 GMT 2018 armv7l GNU/Linux

edit: Nope, I was using the correct image, but installing raspberrypi-kernel with apt-get seems to downgrade it to the latest hypriot kernel, which is 4.14.34:

Linux node7.kube.intern.rtrox.com 4.14.34-hypriotos-v7+ #1 SMP Sun Apr 22 14:57:31 UTC 2018 armv7l GNU/Linux

I checked, and the apt preferences file that used to be present is no longer included, so I'm not sure why apt considers the hypriot .34 kernel newer than the upstream .79 kernel. For now, I've removed the apt-get install raspberrypi-kernel step from my setup playbook (though an apt-get upgrade is still present). We'll see if the upgrade still downgrades this kernel.
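One possible explanation, offered as an assumption rather than a confirmed diagnosis: apt ranks packages by their Debian package version, not by the kernel release. The hypriot .deb named earlier in this thread carries a date-style version (20180922-053217), while the upstream package installed above is 1.20181112-1. dpkg compares the leading numeric component first, so a version starting with 20180922 sorts above one starting with 1. If the 4.14.34 hypriot package uses the same date-style scheme, it would outrank upstream too. A minimal sketch to check this, using only the two version strings that appear in this thread:

```shell
#!/bin/sh
# Version strings copied from this thread (the .deb file name and the
# apt install line); whether the 4.14.34 package follows the same
# date-style scheme is an assumption.
hypriot_version="20180922-053217"
upstream_version="1.20181112-1"

if command -v dpkg >/dev/null 2>&1; then
    # dpkg --compare-versions exits 0 when the relation holds.
    if dpkg --compare-versions "$hypriot_version" gt "$upstream_version"; then
        echo "dpkg ranks $hypriot_version above $upstream_version"
    else
        echo "dpkg ranks $upstream_version above $hypriot_version"
    fi
else
    echo "dpkg not available; run this on the Pi"
fi
```

On the node itself, apt-cache policy raspberrypi-kernel lists the candidate version and the repository each version comes from, which should confirm or rule this out.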

rtrox commented 5 years ago

It looks like even a general package update is pulling in the older hypriot kernel as if it were a newer package. It might be necessary to pin raspberrypi-kernel to the raspbian.raspberrypi.org repo in the image to prevent this, unless you know why apt is seeing 4.14.34-hypriot as newer than 4.14.79.

I have to run and do some errands, but I'll pin the kernel and test the image again tonight.
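The pinning idea above might look like the following. It is a sketch, not the fix that shipped in the image: the function name pin_stanza and the preferences file name are hypothetical, and the repository hostname is the one mentioned in this comment, so it should be checked against the output of apt-cache policy raspberrypi-kernel before use. A priority above 1000 tells apt to prefer the pinned origin even when that means a downgrade.

```shell
#!/bin/sh
# Hypothetical helper: print an apt preferences stanza that pins one
# package to one repository host.
# $1: package name
# $2: repository hostname, as shown by `apt-cache policy`
pin_stanza() {
    printf 'Package: %s\nPin: origin %s\nPin-Priority: 1001\n' "$1" "$2"
}

# Hostname taken from the comment above; verify it before applying.
pin_stanza raspberrypi-kernel raspbian.raspberrypi.org

# To apply (hypothetical file name):
#   pin_stanza raspberrypi-kernel raspbian.raspberrypi.org \
#       | sudo tee /etc/apt/preferences.d/raspberrypi-kernel
```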

StefanScherer commented 5 years ago

Nice catch. I'll try to fix that later before diving into DockerCon 🤪