bradbeam closed this issue 6 years ago
Should kubespray stop defining hostname in /etc/cni/net.d/10-calico.conf?
@mattymo It should use nodename, as we deprecated hostname, to better align with what we are looking for. Maybe it should use ansible_nodename or ansible_fqdn? I'm not great with Ansible, so I might be off target there.
I think there's not a lot we can do today given the orthogonal nature of the two components besides make our documentation much better.
Note that we already set both the CNI and calico/node nodename configuration options in the KDD manifest using the k8s downward API.
This is probably the direction we want to point people in, since it guarantees a consistent value. However, changing that on a running cluster has the potential to cause issues, so I don't know that we can just update our manifests across the board to do this.
> It should use nodename as we deprecated hostname
@heschlie I'd wait until https://github.com/projectcalico/cni-plugin/pull/375 is merged and released though if you're using calico IPAM!
We gave full stack cni/node Calico a try yesterday. We got hit hard by this.
In my scenario, I found the following (my lab setup is 2.6.5, with calico-node as a DaemonSet from the provided YAML and the CNI plugin installed by the secondary install-cni container).
Although I noticed it's stated pretty clearly here that if calico-node registers itself with something other than the hostname you are going to have a bad time, I couldn't find anywhere in the documentation that the default behaviour of calico-node is to register with the IP address.
If you folks find it appropriate, I'd suggest putting a note in big bold letters in both the calico-node and CNI documentation stating what will happen if you don't define your node names manually in both places:
- Calico-node's default behaviour is to register the Calico host with its IP address if NODENAME is not set.
- Calico CNI's default behaviour is to register the Calico host with its FQDN hostname if NODENAME is not set.
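The manual workaround such a note would describe is to pin NODENAME explicitly in both places so neither side falls back to its (different) autodetection. A sketch, where `node-1.example.com` is a placeholder; the only requirement is that the value be identical on both sides.

In the CNI config (e.g. /etc/cni/net.d/10-calico.conf):

```json
"nodename": "node-1.example.com"
```

In the calico-node container spec:

```yaml
- name: NODENAME
  value: "node-1.example.com"
```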
That's interesting, and not what I'd have expected. I'd expect both to use the same logic, defaulting to the value provided by os.Hostname(). Looking at the code though, the two bits are unfortunately not identical...
It's a shame this logic isn't common between the two. The main difference seems to be this bit of code where the HOSTNAME env var is checked. I wonder if that is the root of the discrepancy you were seeing, @mrrandrade.
We hit this problem too.
I'd like to change the manifests so that these are always the same:
In the CNI config:

```json
"nodename": "__KUBERNETES_NODE_NAME__",
```

In the DaemonSet:

```yaml
# Set based on the k8s node name.
- name: NODENAME
  valueFrom:
    fieldRef:
      fieldPath: spec.nodeName
```
However, if the chosen value differs from the value that was previously autodetected, this will break things on upgrade, so we'd need to find a way to support upgrading to the new manifest.
As it is, I can't really think of anything workable - making this sort of change is going to require downtime on any cluster performing an upgrade. Any ideas?
This one was not very fun to discover. It took me two days of reading documentation, debugging, and a lot of frustration.
When there is a mismatch between the CNI nodename and the calico-node nodename, the interface is created but no IP address is assigned to it.
I had not defined any nodename in the CNI configuration, but it was defined in my calico-node configuration.
Once I added the nodename config to CNI, my cluster started to work.
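One way to spot this before pods break: compare the nodename the CNI plugin will use with the name calico-node actually registered (on a real node, the former comes from the config in /etc/cni/net.d/ and the latter from `calicoctl get nodes`). A minimal sketch of the comparison, using made-up sample values:

```shell
# Hypothetical values for illustration; on a real cluster, read the
# "nodename" field from the CNI config and the registered name from
# `calicoctl get nodes`.
cni_nodename="node-1"                  # what the CNI plugin will use
calico_nodename="node-1.example.com"   # what calico-node registered

if [ "$cni_nodename" = "$calico_nodename" ]; then
  echo "names match"
else
  echo "MISMATCH: interfaces will be created but pods get no IP"
fi
```

With these sample values the check reports a mismatch, which is exactly the silent failure described above.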
I'm working on a fix for this.
I've got two PRs:
Together, those PRs will let the two components coordinate on a node name to use.
Ran into an issue where I had a hostname mismatch between the CNI config and the calico-node hostname variable. This caused Calico networking to appear functional but not actually work. Worked with @tmjd to get this identified and resolved.
The CNI config referenced the short name [1] and the calico-node nodename parameter was obtained via the Kubernetes downward API [2]. This caused an issue where `calicoctl get nodes` reported back both the FQDN and the short name for the node [3].
[1]
[2]
[3]
Expected Behavior
Calico networking for pods works.
Current Behavior
Current behavior allows you to have mismatched names with calico-node running happily and kubelet/CNI happy as well, while in fact networking is not working as expected.
Possible Solution
It'd be swell if there were a way to unify how the node names are defined. Some ideas: