Open BloodyIron opened 1 week ago
We mirrored the cilium image from the Cilium repo. Cilium-cli should only be used when Cilium itself is installed using the CLI and not the Helm chart. What type of command do you need on the CLI? I was able to use the Cilium CLI by building it from https://github.com/cilium/cilium-cli; I just modified the chart name in the code:
--- a/defaults/defaults.go
+++ b/defaults/defaults.go
@@ -126,7 +126,7 @@ const (
IngressSecretsNamespace = "cilium-secrets"
// HelmReleaseName is the default Helm release name for Cilium.
- HelmReleaseName = "cilium"
+ HelmReleaseName = "rke2-cilium"
HelmValuesSecretName = "cilium-cli-helm-values"
HelmValuesSecretKeyName = "io.cilium.cilium-cli"
HelmChartVersionSecretKeyName = "io.cilium.chart-version"
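The diff above amounts to a one-line patch and a rebuild. A sketch of the workflow, where the make target and file path are assumptions based on the cilium-cli repo layout:

```shell
# Sketch: clone cilium-cli, apply the release-name patch from the diff above,
# and rebuild so the CLI targets the rke2-cilium Helm release.
git clone https://github.com/cilium/cilium-cli.git
cd cilium-cli
sed -i 's/HelmReleaseName = "cilium"/HelmReleaseName = "rke2-cilium"/' defaults/defaults.go
make   # assumed default target; produces a patched ./cilium binary
```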
What type of command do you need on the cli?
One quick example is checking clustermesh status, but there are others too: https://docs.cilium.io/en/stable/operations/troubleshooting/#automatic-verification
The point of me using Rancher and RKE2 is not having to go and recompile code or rehost my own variants of the tooling provided. I don't even know how I would make that kind of modification without compromising future Rancher/RKE2/related updates, which is a significant concern of mine.
Are you sure that the status command is not working? It should work. I can do some tests tomorrow and give you some feedback.
To rephrase what @rbrtbnfgl said
Cilium-cli should only be used when Cilium itself is installed using the cli and not the helmchart.
The cilium status can be read from various CRDs, right?
I don't even see the Cilium documentation mentioning CRDs for such troubleshooting, so I would be going in blind. And yes, I am sure that the command doesn't work, @rbrtbnfgl, as the cilium command is a symlink to cilium-dbg, which has a completely different set of capabilities and commands.
Are you running the CLI from the cilium pod? Per the Cilium docs, you have to install it directly on the node: https://docs.cilium.io/en/stable/operations/troubleshooting/#install-the-cilium-cli
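For reference, the linked docs install the CLI on the node roughly like this (the stable.txt version lookup and checksum verification follow the Cilium troubleshooting page; pinning a specific version instead is up to you):

```shell
# Download the latest stable cilium-cli release, verify the checksum,
# and unpack the binary into /usr/local/bin.
CILIUM_CLI_VERSION=$(curl -s https://raw.githubusercontent.com/cilium/cilium-cli/main/stable.txt)
CLI_ARCH=amd64
curl -L --fail --remote-name-all \
  "https://github.com/cilium/cilium-cli/releases/download/${CILIUM_CLI_VERSION}/cilium-linux-${CLI_ARCH}.tar.gz"{,.sha256sum}
sha256sum --check "cilium-linux-${CLI_ARCH}.tar.gz.sha256sum"
sudo tar xzvfC "cilium-linux-${CLI_ARCH}.tar.gz" /usr/local/bin
rm "cilium-linux-${CLI_ARCH}.tar.gz"{,.sha256sum}
```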
I was able to enable the clustermesh and get the right status.
cilium clustermesh status
⚠ Cluster not configured for clustermesh, use '--set cluster.id' and '--set cluster.name' with 'cilium install'. External workloads may still be configured.
⚠ Service type NodePort detected! Service may fail when nodes are removed from the cluster!
✅ Service "clustermesh-apiserver" of type "NodePort" found
✅ Cluster access information is available:
- 10.1.1.11:32379
✅ Deployment clustermesh-apiserver is ready
ℹ KVStoreMesh is disabled
🔌 No cluster connected
🔀 Global services: [ min:-1 / avg:0.0 / max:0 ]
@rbrtbnfgl can you perhaps show an example HelmChartConfig to enable and configure clustermesh via chart values?
Are you running the CLI from the cilium pod? Per the Cilium docs, you have to install it directly on the node: https://docs.cilium.io/en/stable/operations/troubleshooting/#install-the-cilium-cli
Yeah I'm not installing software on my nodes that doesn't come from a package manager source. That's just creating future problems I'm not interested in having (namely the package manager not being aware of it and never updating it).
I'm in agreement with @brandond that it seems preferable to have a HelmChartConfig setting to enable this, or, I dunno, have it present by default instead of how it is now. Any chance we can make that happen? (I'd prefer it just be there by default.)
Yeah I'm not installing software on my nodes that doesn't come from a package manager
This would be something to raise with the cilium team. We just consume their chart and images; we don't control how they package and distribute the node binaries.
I hope you also realize that RKE2 and Cilium are both already "installing software on your nodes" by extracting binaries from images and placing them on the root fs, without using the package manager.
it seems preferable to have a helmchart config setting to enable this, or I dunno... have it being present by default
Are you talking about enabling clustermesh by default? I don't think everyone would want that enabled by default. It also requires additional configuration:
Each cluster must be assigned a unique human-readable name as well as a numeric cluster ID (1-255). It is best to assign both these attributes at installation time of Cilium: Helm options cluster.name and cluster.id
OK, I found a way to configure it. It wasn't so easy to configure through Helm.
RKE2 node config for the first cluster:

write-kubeconfig-mode: 644
cluster-cidr: "10.42.0.0/16"
service-cidr: "10.43.0.0/16"
cni: "cilium"

HelmChartConfig for the first cluster:

apiVersion: helm.cattle.io/v1
kind: HelmChartConfig
metadata:
  name: rke2-cilium
  namespace: kube-system
spec:
  valuesContent: |-
    cluster:
      name: cluster1
      id: 1
    externalWorkloads:
      enabled: true
    clustermesh:
      useAPIServer: true
      config:
        enabled: true
        clusters:
        - name: cluster1
          ips:
          - <ip for the cluster one node>
          port: 32379
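A note on where these two snippets live on an RKE2 server: the node config goes in /etc/rancher/rke2/config.yaml, and the HelmChartConfig can be dropped into the server's manifests directory, where RKE2 applies it automatically. These are the standard RKE2 paths; the source filenames below are illustrative.

```shell
# Assumed standard RKE2 locations; the source filenames are placeholders.
sudo install -d /etc/rancher/rke2 /var/lib/rancher/rke2/server/manifests
sudo cp cluster1-node-config.yaml /etc/rancher/rke2/config.yaml
sudo cp cluster1-cilium-values.yaml /var/lib/rancher/rke2/server/manifests/rke2-cilium-config.yaml
```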
Once the first cluster is up, configure the second cluster with some info from the first. You need to get the clustermesh apiserver certificate with:

kubectl -n kube-system get secret clustermesh-apiserver-remote-cert -o yaml

Get ca.crt, tls.crt, and tls.key from the output.
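To pull those three values out of the secret and base64-decode them in one go (assumes kubectl access to the first cluster and jq installed; the secret name is the one shown above):

```shell
# Extract and decode the clustermesh client cert material into local files.
SECRET=clustermesh-apiserver-remote-cert
for f in ca.crt tls.crt tls.key; do
  kubectl -n kube-system get secret "$SECRET" -o json \
    | jq -r --arg f "$f" '.data[$f]' | base64 -d > "$f"
done
```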
RKE2 node config for the second cluster:

write-kubeconfig-mode: 644
cluster-cidr: "10.44.0.0/16"
service-cidr: "10.45.0.0/16"
cni: "cilium"

HelmChartConfig for the second cluster:

apiVersion: helm.cattle.io/v1
kind: HelmChartConfig
metadata:
  name: rke2-cilium
  namespace: kube-system
spec:
  valuesContent: |-
    cluster:
      name: cluster2
      id: 2
    externalWorkloads:
      enabled: true
    clustermesh:
      useAPIServer: true
      config:
        enabled: true
        clusters:
        - name: cluster2
          ips:
          - <Ip for cluster2>
          port: 32379
        - name: cluster1
          ips:
          - <ip for cluster1>
          port: 32379
          tls:
            cert: "The content of tls.crt from cluster1"
            key: "The content of tls.key from cluster1"
            caCert: "The content of ca.crt from cluster1"
Once the second cluster has also started, get the same info from it that was previously taken from the first one with:

kubectl -n kube-system get secret clustermesh-apiserver-remote-cert -o yaml

Get ca.crt, tls.crt, and tls.key from the output.
Add the new info from the second cluster to the Cilium config of the first cluster. The new config should look like this:
apiVersion: helm.cattle.io/v1
kind: HelmChartConfig
metadata:
  name: rke2-cilium
  namespace: kube-system
spec:
  valuesContent: |-
    cluster:
      name: cluster1
      id: 1
    externalWorkloads:
      enabled: true
    clustermesh:
      useAPIServer: true
      config:
        enabled: true
        clusters:
        - name: cluster1
          ips:
          - <ip for the cluster one node>
          port: 32379
        - name: cluster2
          ips:
          - <ip for cluster2>
          port: 32379
          tls:
            cert: "The content of tls.crt from cluster2"
            key: "The content of tls.key from cluster2"
            caCert: "The content of ca.crt from cluster2"
sudo service rke2-server restart
The new configuration should be applied after the restart. I checked the status and it was fine:
cilium clustermesh status
⚠ Service type NodePort detected! Service may fail when nodes are removed from the cluster!
✅ Service "clustermesh-apiserver" of type "NodePort" found
✅ Cluster access information is available:
- 10.1.1.11:32379
✅ Deployment clustermesh-apiserver is ready
ℹ KVStoreMesh is disabled
✅ All 1 nodes are connected to all clusters [min:1 / avg:1.0 / max:1]
🔌 Cluster Connections:
- cluster2: 1/1 configured, 1/1 connected
🔀 Global services: [ min:0 / avg:0.0 / max:0 ]
I got this from the first cluster.
That's great, we should add that to the docs @rbrtbnfgl !
Maybe there is a way to generate and add the certificates beforehand, so you don't need to restart the nodes and can configure Cilium with the already-generated ones.
This isn't about clustermesh specifically, that was simply an example of a function that is exclusive to the cilium cli tool.
And to clarify, I did not mean clustermesh to be on by default, but that the cilium cli binary be present by default, and not a symlink to cilium-dbg as it is currently.
Sorry for any time cost you may have had on this, @rbrtbnfgl, but yeah, I wasn't specifically talking about (only) clustermesh, but about the cilium CLI tool being present so that it can be used, namely in scenarios such as the official Cilium troubleshooting documentation.
The reason I engaged rke2 on this topic first is that it seemed plausible to me (as an outsider) that the way rke2 implements cilium "made" this change (not having the cilium CLI application), since it would be silly for the Cilium team themselves to do that (it would break a good chunk of the troubleshooting documentation, as we're seeing).
But if there's no way from an rke2 "perspective" to get the cilium CLI app existing (as in, not a symlink), well, I can appeal to the Cilium people (or maybe the rke2 team could?). But... are all options exhausted?
There's still plenty for me to learn when it comes to k8s ;)
We don't modify anything in the Cilium image; I think the image ships cilium-dbg by design. If you don't want to install anything on the node, you could use the Cilium CLI from any client machine; like kubectl, it only needs credentials for the cluster.
Environmental Info:
RKE2 Version: v1.26.15+rke2r1
The Cilium CNI installed via the helm charts (using Rancher to provision the cluster) produces cluster node pods that do not have the cilium cli application installed at all. Instead there is a symlink to cilium-dbg. The problem with this is there is diagnostic and troubleshooting functionality that I need in the cilium cli tool that is not available in the other available cilium-related commands.
I read the documentation and searched on Google, and cannot find a way to get the cilium CLI properly installed via this method. So can we please have this corrected? It really makes most of the Cilium troubleshooting documentation useless, since it constantly refers to troubleshooting steps and functions that are only provided by the proper cilium CLI tool.
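For anyone reproducing the report, one way to confirm what the agent pod ships (the daemonset name and binary path here are assumptions for a typical rke2 Cilium install):

```shell
# Show what the `cilium` binary inside the agent pod actually resolves to;
# per the report it is a symlink to cilium-dbg.
kubectl -n kube-system exec ds/cilium -- readlink -f /usr/bin/cilium
```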