Open rtrox opened 5 years ago
Sorry, I should have also mentioned: This is updating to the latest kernel from the latest sdcard image (1.9.0).
Hardware is a mix of Rpi 2B+, 3, and 3B+, though it doesn't look to be hardware related.
Thanks @rtrox, I can reproduce the problem. containerd cannot be started because modprobe overlay does not work.
Indeed, several files are missing:
$ ls /lib/modules/4.14.79-hypriotos-v7+/
kernel modules.builtin modules.order
compared with the previous one:
$ ls -l /lib/modules/4.14.34-hypriotos-v7+/
total 1928
drwxr-xr-x 11 root root 4096 Dec 1 18:43 kernel
-rw-r--r-- 1 root root 500660 Apr 22 2018 modules.alias
-rw-r--r-- 1 root root 518161 Apr 22 2018 modules.alias.bin
-rw-r--r-- 1 root root 11137 Apr 22 2018 modules.builtin
-rw-r--r-- 1 root root 12165 Apr 22 2018 modules.builtin.bin
-rw-r--r-- 1 root root 149781 Apr 22 2018 modules.dep
-rw-r--r-- 1 root root 215917 Apr 22 2018 modules.dep.bin
-rw-r--r-- 1 root root 302 Apr 22 2018 modules.devname
-rw-r--r-- 1 root root 57528 Apr 22 2018 modules.order
-rw-r--r-- 1 root root 403 Apr 22 2018 modules.softdep
-rw-r--r-- 1 root root 216125 Apr 22 2018 modules.symbols
-rw-r--r-- 1 root root 265295 Apr 22 2018 modules.symbols.bin
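For checking other nodes, a small shell sketch like the following lists which index files are absent from a module directory. The file list is simply the set present in the 4.14.34 listing above but absent from the 4.14.79 one; the function name missing_module_indexes is hypothetical.

```shell
# Hypothetical helper: print every module index file that is missing
# from the given directory. Returns non-zero if any file is missing.
missing_module_indexes() {
    dir=$1
    missing=0
    for f in modules.dep modules.dep.bin modules.alias modules.alias.bin \
             modules.symbols modules.symbols.bin modules.softdep \
             modules.devname modules.builtin.bin; do
        if [ ! -e "$dir/$f" ]; then
            echo "$f"
            missing=1
        fi
    done
    return "$missing"
}

# Example: check the module directory of the running kernel.
missing_module_indexes "/lib/modules/$(uname -r)" \
    || echo "index files are missing; 'sudo depmod' should regenerate them"
```

On the broken 4.14.79-hypriotos-v7+ install this would be expected to print all nine names, since only kernel, modules.builtin and modules.order are present there.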
@StefanScherer - for what it's worth, I'd installed 4.14.70 from the artifacts in CircleCI before I saw that you'd uploaded this version, and it appeared to have similar characteristics.
Thanks @rtrox yes, I've checked the contents of the deb files by downloading to my Mac and then listing them in a Linux container:
docker run -it -v $(pwd):/deb ubuntu dpkg -c /deb/raspberrypi-kernel_20180922-053217_armhf.deb | grep -v kernel
The 4.14.34 package was fine. The latest upstream 4.14.79 kernel from raspberrypi.org also looks better. I've installed it with
sudo apt install raspberrypi-kernel=1.20181112-1
sudo reboot
and I'm trying latest Docker with it. Single node docker swarm also looks good.
I'll remove the 4.14.79 kernel from packagecloud again as it has missing files. In the meantime I have prepared a SD card with upstream kernel 4.14.79 and latest Docker Engine and tools installed. Stay tuned ;-)
Here is a first SD card image https://github.com/hypriot/image-builder-rpi/releases/download/v1.10.0-rc1/hypriotos-rpi-v1.10.0-rc1.img.zip I would love to hear your feedback.
@StefanScherer this morning I drained one of my Kubernetes nodes and loaded the new SD card image onto it. It booted without problems, but now that I'm trying to join it back into the cluster, it has started crash looping. I suspect weave (the CNI plugin) is the cause.
I'm now trying to figure out the cause. I saw some issues accessing /var/lib/docker/overlay2 in the logs, but I'm having trouble getting in quickly enough to stop kubelet before the system crashes. I may have to pull out a console cable to get more info.
edit: You may want to skip to the end if you're just seeing this. Part of my setup playbooks was to install the latest raspberrypi-kernel (this was to pull in the kernel you had uploaded yesterday). Well, it turns out that for some reason apt considers the 4.14.34-hypriot kernel to be newer than the upstream 4.14.79 kernel.
Ok, I was able to get the crash out of dmesg through serial console:
[ 820.227701] Unable to handle kernel NULL pointer dereference at virtual address 00000000
[ 820.237101] pgd = ad61c000
[ 820.241181] [00000000] *pgd=25434835, *pte=00000000, *ppte=00000000
I'm not sure that I have the skills to track that much further, but will see if I can get more info out of journald about the context.
Ok. I was able to get the actual kernel panic with the stack trace out as well:
https://gist.github.com/rtrox/b9f6a682c8717f79da8ae13f673ed5c0
Trying to debug kernel stack traces puts me a little out of my element :-/ but to me it looks like the vxlan module in this kernel is hitting a NULL pointer dereference. The context appears to be creating an IPv6 bridge, if I'm reading this correctly.
I don't think I can pull out much more data here, so I'm going to redownload the image and reflash the card to rule out any image corruption.
Wait... I might be being really dumb. That stack trace references 4.14.34, and I could have sworn I flashed the newest image... This newest flash definitely has the newer kernel (though that vxlan kernel panic did look familiar):
$ uname -a
Linux black-pearl 4.14.79-v7+ #1159 SMP Sun Nov 4 17:50:20 GMT 2018 armv7l GNU/Linux
edit: Nope, I was using the correct image, but apt-get installing raspberrypi-kernel seems to be downgrading it to the latest hypriot kernel, which is 4.14.34:
Linux node7.kube.intern.rtrox.com 4.14.34-hypriotos-v7+ #1 SMP Sun Apr 22 14:57:31 UTC 2018 armv7l GNU/Linux
I checked, and the previously present apt preferences file is no longer included, so I'm not sure why apt is considering the hypriot .34 kernel newer than the upstream .79 kernel. For now, I've removed the apt-get install raspberrypi-kernel from my setup playbook (though an apt-get upgrade is still present). We'll see if upgrade still downgrades this kernel.
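One possible explanation is the package version strings rather than the kernel versions: apt compares the former. A minimal sketch of that check, using the two version strings that appear in this thread (the Hypriot .deb file name and the upstream apt install line). That the 4.14.34 Hypriot package uses the same date-based scheme is an assumption.

```shell
# Version strings taken from this thread. Assumption: the 4.14.34 Hypriot
# package is versioned with the same date-based scheme as this one.
hypriot_version="20180922-053217"   # raspberrypi-kernel_20180922-053217_armhf.deb
upstream_version="1.20181112-1"     # apt install raspberrypi-kernel=1.20181112-1

# dpkg compares the leading numeric run first: 20180922 versus 1.
if dpkg --compare-versions "$hypriot_version" gt "$upstream_version"; then
    newer="hypriot"
else
    newer="upstream"
fi
echo "apt would treat the $newer package as the newer version"
```

Running apt-cache policy raspberrypi-kernel on an affected node should show which candidate version apt actually selects and from which repository.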
It looks like even a general package update is pulling in the older hypriot kernel as if it were a newer package. It might be necessary to pin raspberrypi-kernel to the raspbian.raspberrypi.org repo in the image to prevent this, unless you know why apt is seeing 4.14.34-hypriot as newer than 4.14.79.
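A pin along those lines could be sketched as follows. The helper name write_kernel_pin and the output file name are hypothetical, and the origin host is an assumption: it should be whatever host apt-cache policy raspberrypi-kernel reports for the upstream package, which may differ from the one named above. A priority above 1000 lets apt select the pinned version even when that counts as a downgrade.

```shell
# Hypothetical helper: write an apt preferences entry that pins
# raspberrypi-kernel to a single repository host.
write_kernel_pin() {
    origin=$1   # repository host, as reported by apt-cache policy
    target=$2   # file to write the preferences entry to
    cat > "$target" <<EOF
Package: raspberrypi-kernel
Pin: origin "$origin"
Pin-Priority: 1001
EOF
}

# Example: generate the entry locally for review. The host below is the one
# suggested in this thread and is an assumption; verify it before use.
write_kernel_pin "raspbian.raspberrypi.org" "./raspberrypi-kernel.pref"
```

After review, the generated file would be copied into /etc/apt/preferences.d/ as root, followed by sudo apt-get update.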
I have to run and do some errands, but I'll pin the kernel and test the image again tonight.
Nice catch. I'll try to fix that later before diving into DockerCon 🤪
Issue: Kernel 4.14.79 seems to be working great, but it looks like the dependency map isn't being installed in /lib/modules/ during installation. This causes modprobe to fail with:
Docker and containerd require overlayfs, so the containerd service fails when it tries to run
/sbin/modprobe overlay
Workaround:
sudo depmod
on each Pi fixed this pretty quickly and easily.
The workaround is pretty simple, but I believe these files should be created during the deb install (from searches, it looks like linux-image-* should be installing these).