Open Baughn opened 6 years ago
What do you mean by "cluster create fails after kernel update and reboot"? Are there any other approaches to handle this properly?
I need a newer kernel than the image comes with in order to mount Ceph filesystems, which means the machine needs to be rebooted... but putting a reboot command in cloud-init causes cluster creation to fail, reasonably enough.
The workaround is to put just the install command there, then manually reboot the cluster afterwards. That's inconvenient, though.
So, I think if this is the only way, we should consider performing the kernel upgrade when Rook is installed for the first time and the detected kernel version is lower than 4.7. This would mean: install the kernel, reboot the nodes, wait until they accept connections again, and then run the current install steps.
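To make the "kernel version is lower than 4.7" check concrete, here is a minimal shell sketch. The function names (`version_lt`, `kernel_base_version`, `needs_kernel_upgrade`) are hypothetical and not part of hetzner-kube. It assumes GNU `sort -V` is available, as it is on Ubuntu, and that the input looks like the output of `uname -r`.

```shell
# Returns 0 (true) if version $1 is strictly older than version $2.
# Assumes GNU sort with -V (version sort).
version_lt() {
    if [ "$1" = "$2" ]; then
        return 1
    fi
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$1" ]
}

# Strip the ABI/flavour suffix, e.g. "4.4.0-116-generic" -> "4.4.0".
kernel_base_version() {
    printf '%s\n' "${1%%-*}"
}

# Hypothetical helper: does a node reporting kernel release $1 need the upgrade?
# The threshold defaults to 4.7, the value mentioned in this thread.
needs_kernel_upgrade() {
    version_lt "$(kernel_base_version "$1")" "${2:-4.7}"
}
```

On a node, this could be called as `needs_kernel_upgrade "$(uname -r)" && echo "upgrade required"`.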
I think someone should do this after #47 is resolved. I've already started the work on this.
In principle the kernel upgrade shouldn't be needed. I'm not sure why it is.
Also, we'll need an upgrade command (for k8s and the underlying OS). Once we have this command, this issue would be more or less solved. The only remaining question is whether we should perform the upgrade before the install.
As a side note, to use Ubuntu's livepatch feature and avoid rebooting after a kernel update, you need an Enterprise subscription...
Update here: I have an idea of what is wrong with kernel 4.4. There are several issues with MDS, RBD, etc., such as incomplete folders (especially in CephFS). However,
apt install linux-image-4.10.0-28-generic linux-headers-4.10.0-28-generic && reboot
fixes the problem. The question is whether and how to implement this in the tool, or whether to just document it and close this issue.
It might be easier to rebase everything onto Ubuntu 18.04 (which should be done at some point anyway) instead of trying to get such an old kernel working…
Actually, I don't know how k8s itself behaves on Ubuntu 18.04. I remember that when 16.04 was released, it had a lot of issues at the beginning. I should try that out once the e2e test suite is there.
Elias Probst notifications@github.com wrote on Fri, 11 May 2018, 18:08:
It might be easier to rebase everything onto Ubuntu 18.04 instead of trying to get such an old Kernel working…
See https://github.com/rook/rook/issues/1044
This should have been fixed, so perhaps we're installing a too-old version of Ceph/Rook, but in any case I was unable to mount a filesystem using rook-toolbox without first upgrading the kernel.
Upgrading to linux-image-virtual-hwe-16.04 / linux-headers-virtual-hwe-16.04 fixes it, but putting that (and the necessary reboot) in cloud-init makes cluster create fail. It would be good if that could be handled better.
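One way the reboot could be handled outside cloud-init is for the provisioning side to trigger it and then poll until the node is reachable again before continuing. The following is only a sketch under that assumption: `wait_until` is a hypothetical helper, not something hetzner-kube provides, and the `NODE_IP` variable and the retry counts in the usage comment are made up for illustration.

```shell
# wait_until ATTEMPTS DELAY CMD [ARGS...]
# Runs CMD up to ATTEMPTS times, sleeping DELAY seconds after each failure.
# Returns 0 as soon as CMD succeeds, 1 if every attempt failed.
wait_until() {
    wu_attempts=$1
    wu_delay=$2
    shift 2
    while [ "$wu_attempts" -gt 0 ]; do
        if "$@"; then
            return 0
        fi
        wu_attempts=$((wu_attempts - 1))
        sleep "$wu_delay"
    done
    return 1
}

# Possible usage (NODE_IP is an assumed variable holding the node's address):
#   ssh root@"$NODE_IP" 'apt install -y linux-image-virtual-hwe-16.04 linux-headers-virtual-hwe-16.04'
#   ssh root@"$NODE_IP" reboot || true   # the connection usually drops here
#   sleep 10                             # give the node time to actually go down
#   wait_until 60 5 ssh -o BatchMode=yes -o ConnectTimeout=5 root@"$NODE_IP" true
```

Polling over SSH keeps the reboot out of cloud-init entirely, so cluster creation only resumes once the node answers again.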