Agent error: No relationship found between node and this object

ambis commented 4 years ago

Version: k3s version v1.17.4+k3s1 (3eee8ac3)

K3s arguments:

export INSTALL_K3S_VERSION="v1.17.4+k3s1"
export INSTALL_K3S_EXEC="server \
  --token=***** \
  --node-taint=k3s-controlplane=true:NoExecute \
  --flannel-backend=none \
  --no-deploy=traefik,servicelb \
  --disable-network-policy \
  --disable-cloud-controller \
"

And for the agent (values filled by ansible):

export INSTALL_K3S_EXEC="agent \
  --server=https://{{ kubemaster01_ip }}:6443 \
  --token={{ cluster01_join_token }} \
"

(I install Weave as the CNI network provider (IPALLOC_RANGE=10.42.0.0/16 + ipsec), which seems to work just fine.)

Describe the bug: When installing, e.g., Longhorn with:

kubectl apply -f https://raw.githubusercontent.com/longhorn/longhorn/master/deploy/longhorn.yaml

I get a ton of the following errors in the agent log (which is the main issue here):

E0505 16:10:07.813209       6 reflector.go:153] object-"longhorn-system"/"longhorn-service-account-token-wgkxm": Failed to list *v1.Secret: secrets "longhorn-service-account-token-wgkxm" is forbidden: User "system:node:k3d-cluster-worker-0" cannot list resource "secrets" in API group "" in the namespace "longhorn-system": no relationship found between node "k3d-cluster-worker-0" and this object

To Reproduce: This can be easily reproduced locally with k3d, with the exact same effect:

k3d -v 
# k3d version v1.7.0
k3d create --name cluster --workers 1
export KUBECONFIG="$(k3d get-kubeconfig --name='cluster')"
kubectl apply -f https://raw.githubusercontent.com/longhorn/longhorn/master/deploy/longhorn.yaml
docker logs -f k3d-cluster-worker-0

Expected behavior: Just to not have these errors.

Actual behavior: I get the errors mentioned above (no relationship found between node). The pods themselves seem to work OK, but the agent log is filled with those errors.

Additional context / logs: While my example and way to reproduce here use k3d, my original environment is Debian 10 VPS servers on Hetzner Cloud. The k3s versions and errors are identical between the environments.

I have managed to find out that this probably has something to do with Node Authorization, but I cannot find information about how to debug it. I've made sure that each node registers with the master under the same FQDN as its DNS-visible and accessible FQDN, which is also what is in /etc/hostname.

When describing the nodes in the cluster, their names are the proper FQDNs.

ambis commented 4 years ago

I was a complete dumdum, I had not updated the DNS addresses for the new nodes I had created... The error has now changed after they resolve properly:

E0505 19:36:04.393163   18164 reflector.go:153] object-"longhorn-system"/"": Failed to list *v1.Secret: secrets is forbidden: User "system:node:(my-node-name)" cannot list resource "secrets" in API group "" in the namespace "longhorn-system": No Object name found

Which may or may not be a k3s issue? Probably not, so I'll close this.

Akay7 commented 4 years ago

@ambis did you solve this issue?

ambis commented 4 years ago

> @ambis did you solve this issue?

Yes. Like I mentioned above, the issue was that the DNS for the nodes was incorrect (I thought I had fixed it but hadn't).

Once I updated the nodes' IPs in DNS, this problem went away.

Edit: Also, the Longhorn bug seems to be caused by the Longhorn YAMLs, which refer to an empty secret name.
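
For anyone landing here with the same error, a rough way to check for the DNS mismatch described above is to compare the address each node has registered with the API server against what DNS returns for the node's name. This is only a sketch: the function names `compare_ip` and `check_all_nodes` are made up for illustration, and it assumes the nodes report an `InternalIP` address and that `getent` is available on the machine running it.

```shell
#!/bin/sh
# Hypothetical helper: compare a node's registered IP with its DNS result.
# Prints "OK ..." when they match, "MISMATCH ..." otherwise.
compare_ip() {
  name="$1"; registered="$2"; resolved="$3"
  if [ -n "$resolved" ] && [ "$registered" = "$resolved" ]; then
    echo "OK $name $registered"
  else
    echo "MISMATCH $name registered=$registered resolved=${resolved:-none}"
  fi
}

# Hypothetical helper: walk every node known to the API server and
# check what its name resolves to from this machine.
# Assumes each node reports an InternalIP address.
check_all_nodes() {
  kubectl get nodes \
    -o jsonpath='{range .items[*]}{.metadata.name}{" "}{.status.addresses[?(@.type=="InternalIP")].address}{"\n"}{end}' |
  while read -r name registered; do
    # Take the first address the resolver returns for the node name.
    resolved="$(getent hosts "$name" | awk '{print $1; exit}')"
    compare_ip "$name" "$registered" "$resolved"
  done
}
```

After sourcing the snippet, run `check_all_nodes` with KUBECONFIG pointing at the cluster. Any MISMATCH line is a node whose name resolves to a different address (or to nothing) than the one it registered with, which is worth fixing in DNS first.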

k3s-io / k3s

Agent error: No relationship found between node and this object #1747