Open fmunteanu opened 1 year ago
This guide may be helpful, but don't expect much help on running an unsupported third-party app:
https://charmingwebdesign.com/raspberry-pi-kubernetes-cluster-with-cilium-cni/
Thank you for the link. Is there a way to offer a kernel with the CONFIG_ARM64_VA_BITS_48 setting enabled?
Cilium is a very promising new technology, but its documentation is not helpful at the moment, see PR.
Can some of the devs explain why a third party app would require the following kernel settings? @joestringer is interested in understanding what should be changed in the Cilium codebase to allow the default CONFIG_ARM64_VA_BITS_39 setting.
CONFIG_ARM64_VA_BITS_39=n
CONFIG_ARM64_VA_BITS_48=y
CONFIG_ARM64_VA_BITS=48
CONFIG_PGTABLE_LEVELS=4
Can some of the devs explain why a third party app would require the following kernel settings?
Cilium devs, perhaps - they wouldn't need to guess.
Thank you @pelwell, the Cilium devs are not sure what changes would be required in their software. I was hoping the RaspiOS devs could provide additional insight to allow Cilium to work properly with default kernel settings, without requiring a recompile. Recompiling the kernel is not a viable approach; Cilium should work with the current kernel.
Cilium expects larger pages of memory, the size of 48 bits required by Cilium is set to 39 bits for Raspberry Pi kernel.
Cilium expects larger pages of memory, the size of 48 bits required by Cilium is set to 39 bits for Raspberry Pi kernel.
Seems to be an indirect requirement via envoy via tcmalloc and not cilium itself?
I noticed envoy has OS-specific overrides in its bazel configs (windows for example) which exclude tcmalloc. Perhaps a workaround could be for envoy devs to add an "rpi os" override (which covers a broad range of Raspbian/rpi-os derived OSes), and then cilium devs could add a CLI switch for cilium install, such as --quirks-rpi=true, which would allow the user to opt in to that platform-specific build?
This would be an alternative to having raspberry pi OS change to larger page sizes (the side-effect of increasing VA bits), or having cilium ditch envoy, or having envoy ditch tcmalloc.
Just a thought. I am not a maintainer of any of these projects, just a disappointed netizen that a cluster rebuild has gone south. My solution is to ditch Cilium, unfortunately, because of a lack of active support on this issue. This has also colored my opinion for technology adoption in the workplace which is why I was doing a cluster rebuild to begin with. C'est la vie.
@wilson0x4d indeed, this is not a big work effort for Cilium devs, and the fact that this issue has been neglected all this time without even a basic response shows the devs are not interested in focusing on RPi.
@wilson0x4d Thanks for digging further into this, I appreciate the level of detail. The potential link with tcmalloc and compile options of Envoy is a helpful step forward. While it would be ideal not to have platform-specific builds just for raspberry pi, it sounds like that may be a feasible solution. I can appreciate that it may not make sense for raspberry pi OS to increase VA bits or for Envoy to change its memory allocator in order to solve this issue.
@fmunteanu I respectfully disagree that there's been no response. The corresponding issue provides a workaround as well as a statement that the Cilium project is open to proposals to resolve this issue. As an open source project, the model for development involves interested parties making concrete proposals and sending them out for review. The project typically reviews those proposals to drive the technical direction of the solutions in a way that is maintainable to the project. If it's not a big work effort, then that is all the more reason for you to engage with the problem space and figure out how it can work, then present that solution to the community for discussion/inclusion. If you're unwilling or unable to propose such solutions, and you desire that functionality, then the onus is on you to find other like-minded individuals to pursue the solution. The Cilium project provides spaces like GitHub issue discussions and Slack channels (eg #arm64) to find others who are interested, but we cannot guarantee that someone will volunteer to work on your problem for you.
@joestringer the proposed fix is to recompile the kernel, which to me is unreasonable. Is there a way to have the size of 48 bits required by Cilium adjusted to 39 bits, when using RPi? I cannot propose a fix, because I don't have the required knowledge.
@fmunteanu I opened a similar issue with a request to adjust the same kernel flag in order to run envoy a while ago. So let me clarify a few things. Recompiling the kernel is only one possible fix. You could also switch over to Ubuntu server for RPi (which has already adjusted the mentioned kernel flags) or manually compile cilium-envoy similar to https://github.com/envoyproxy/envoy/issues/17854#issuecomment-956293059. This is necessary as envoy switched over to Google's tcmalloc, which does not support kAddressBits detection. Using the previous gperftools enables envoy to run perfectly fine on any distro on RPis and is still supported. cilium-envoy is built using that project, so the same compile flag should work.
Having a workaround in cilium-envoy only for RPi devices is not a good idea, as this will interfere with users running other distros that have enabled CONFIG_ARM64_VA_BITS_48. The best solution would be to find a way to fix the kAddressBits detection in the tcmalloc project, as this will solve the problem for all users on all distros and devices.
Raspberry Pi OS and (cilium-)envoy have very good reasons for making their respective decisions. Since only a small number of people reacted to the linked issues, I don't think the number of affected users is large enough for any of the projects to take actions that might negatively impact the much larger rest of their user base. If you are willing to take the time and either come up with a solution yourself or find like-minded individuals who can do it for you, I strongly suggest doing so at tcmalloc's repository.
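To see which virtual address width a running system actually offers, one rough heuristic is to look at where the kernel placed the stack in /proc/self/maps, since on arm64 the stack sits near the top of the user address space. The sketch below is only an illustration under that assumption; it is not how tcmalloc determines kAddressBits, and the function names addr_bits and estimate_va_bits are hypothetical.

```shell
#!/bin/sh
# Print the number of bits needed to represent a hex address (no 0x prefix).
addr_bits() {
    n=$((0x$1))
    bits=0
    while [ "$n" -gt 0 ]; do
        n=$((n >> 1))
        bits=$((bits + 1))
    done
    echo "$bits"
}

# Estimate the user-space VA width from the end address of the [stack] mapping.
# Heuristic only: assumes the stack is mapped near the top of the address space.
# Usage: estimate_va_bits [maps-file]   (defaults to /proc/self/maps)
estimate_va_bits() {
    maps_file="${1:-/proc/self/maps}"
    stack_end=$(sed -n 's/^[0-9a-f]*-\([0-9a-f]*\) .*\[stack\]$/\1/p' "$maps_file" | tail -n 1)
    [ -n "$stack_end" ] || return 1
    addr_bits "$stack_end"
}
```

Running estimate_va_bits with no arguments inspects the current process. Under this heuristic, a stock Raspberry Pi OS kernel would be expected to report 39 and a kernel built with CONFIG_ARM64_VA_BITS_48 to report 48.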
@wilson0x4d May I ask why you had to fully ditch Cilium? I am using it on Raspberry Pi OS myself just without cilium-envoy. Instead I am using a different HTTP ingress controller and that works very well.
@PKizzle will this PR address the issues? The main reason I prefer Cilium is the elimination of iptables usage. What ingress controller do you use as an alternative to cilium-envoy? I'm using k3s with Cilium, which has Traefik installed by default. I'm wondering if I could use the Traefik ingress controller.
@fmunteanu The mentioned PR does not solve the issue. It only works if /proc/config.gz is present, which requires loading the configs kernel module. Furthermore, it is "only" a test that does not dynamically adjust kAddressBits at runtime.
I have used Contour in the past which also uses Envoy under the hood and compiled each new Envoy version myself. You could also have a look at HAProxy Kubernetes Ingress Controller which I use at the moment. The Traefik ingress controller should work as well.
@PKizzle I've rebuilt the raspi kernel to set the correct flags for CONFIG_ARM64_VA_BITS_48. Is there a way to verify that this is working? I'm still getting the same issues where the pod fails to start because the cilium-envoy binary isn't present.
First check that the flag was correctly set. Load the configs kernel module using sudo modprobe configs, then check for the flag:
zgrep -E "CONFIG_ARM64_VA_BITS_" /proc/config.gz
What error are you exactly encountering?
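The check described above can be wrapped in a small shell function. This is a minimal sketch, assuming the configs module is already loaded so that /proc/config.gz exists; the function name va_bits_48_enabled is hypothetical, and it uses gzip -cd piped to grep, which does the same job as zgrep.

```shell
#!/bin/sh
# Report whether a gzip-compressed kernel config enables 48-bit virtual addresses.
# Usage: va_bits_48_enabled [config-file]   (defaults to /proc/config.gz)
va_bits_48_enabled() {
    config_file="${1:-/proc/config.gz}"
    if [ ! -r "$config_file" ]; then
        echo "cannot read $config_file (try: sudo modprobe configs)" >&2
        return 2
    fi
    # Decompress to stdout and look for the option set to "y".
    if gzip -cd "$config_file" | grep -q '^CONFIG_ARM64_VA_BITS_48=y'; then
        echo "48-bit VA enabled"
    else
        echo "48-bit VA not enabled"
        return 1
    fi
}
```

Calling va_bits_48_enabled without arguments checks the running kernel; the exit status is 0 only when the option is set, so it can also be used in a provisioning script before installing Cilium.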
These are the issues I see in the cluster:
Ah, so it seems that it hasn't loaded the kernel I built...
As you have already figured out, the kernel you are using does not have CONFIG_ARM64_VA_BITS_48 set. Once you use the custom compiled one you should no longer encounter the "signal: aborted (core dumped)" error.
@PKizzle For some reason I can't get the pi to start using the new compiled kernel...
Following this: https://charmingwebdesign.com/raspberry-pi-kubernetes-cluster-with-cilium-cni/
I've added kernel=kernel8-48bits.imgf to /boot/config.txt and the new modules look like they are present.
But I'm still not seeing it enabled when I check the pi after a restart.
Use the uname -a output to check which kernel version you are currently using.
Yeah, it's running the older kernel!
pi@k3s-0:~ $ uname -a
Linux k3s-0 6.1.21-v8+ #1642 SMP PREEMPT Mon Apr 3 17:24:16 BST 2023 aarch64 GNU/Linux
I think it's a typo. Shouldn't it be kernel=kernel8-48bits.img without the "f" at the end?
God dammit! Yep! Worked a treat, all my nodes booted successfully with the new kernel now!
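A typo like the one above is easy to catch before rebooting by checking that the file named in the kernel= entry actually exists. The following is a minimal sketch with a hypothetical function name, check_kernel_entry; it assumes the image lives directly in the boot directory and ignores conditional sections such as [pi4] in config.txt.

```shell
#!/bin/sh
# Check that the kernel= entry in config.txt names an image that exists.
# Usage: check_kernel_entry [config.txt] [boot-dir]
check_kernel_entry() {
    config_txt="${1:-/boot/config.txt}"
    boot_dir="${2:-/boot}"
    # Use the last uncommented kernel= line and strip any trailing carriage return.
    image=$(sed -n 's/^kernel=//p' "$config_txt" | tail -n 1 | tr -d '\r')
    if [ -z "$image" ]; then
        echo "no kernel= entry found"
        return 0
    fi
    if [ -f "$boot_dir/$image" ]; then
        echo "ok: $image"
    else
        echo "missing: $image"
        return 1
    fi
}
```

With the misspelled entry from this thread, the function would print "missing: kernel8-48bits.imgf" and return a non-zero status, which is a quicker signal than comparing uname -a output after a reboot.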
Describe the bug
I recently opened an issue with Cilium regarding missing kernel modules that would allow us to run their L7 proxy module on a Raspberry Pi arm64. In their documentation, developers offer limited guidelines for Ubuntu running on Raspberry Pi.
I was wondering if any of the Raspberry Pi kernel developers can provide additional info, related to the missing modules required for proper Cilium installation.
Steps to reproduce the behaviour
No steps required.
Device(s)
Raspberry Pi 4 Mod. B
System
Logs
No response
Additional context
No response