kubernetes-sigs / kubespray

Deploy a Production Ready Kubernetes Cluster
Apache License 2.0

Tolerations are not getting applied #11107

Closed sanshah1211 closed 1 month ago

sanshah1211 commented 6 months ago

I'm trying to integrate the OpenStack cloud controller manager using Kubespray, but it's not applying the toleration for "node.cloudprovider.kubernetes.io/uninitialized" defined in external-openstack-cloud-controller-manager-ds.yml.

Here are my environment details:

Kubespray Version: v1.28.6
OpenStack Version: Bobcat/2023.2

Any suggestions on how to resolve this issue?

tico88612 commented 6 months ago

Hi @sanshah1211

I have also encountered this issue. As I confirmed with @yankay earlier, some modules (e.g., the OpenStack Cloud Controller Manager) lack maintainers and are out of date. I will work on fixing this issue.

If you have any contributions or suggestions, please don't hesitate to send them here.

sanshah1211 commented 6 months ago

@tico88612 thank you

sanshah1211 commented 6 months ago

I have also tried deploying it manually (following https://kubernetes.io/blog/2020/02/07/deploying-external-openstack-cloud-provider-with-kubeadm/), but I still have the same issue.

tico88612 commented 6 months ago

/assign

sanshah1211 commented 4 months ago

@tico88612 any updates?

tico88612 commented 4 months ago

@sanshah1211, could you try Kubespray 2.25.0?

sanshah1211 commented 4 months ago

I am using the master branch. Do you want me to use 2.25.0 and try again?

tico88612 commented 4 months ago

I think the current master branch should work; at least, it worked when I tried it myself in my environment.

sanshah1211 commented 4 months ago

OK, let me pull the latest changes and try again. One thing I observed: if I am not using the OpenStack external cloud controller manager, tolerations are applied properly. I only have this issue when I deploy the OpenStack cloud controller manager.

tico88612 commented 4 months ago

@sanshah1211 could you share your cloud_provider and external_cloud_provider settings from group_vars/all/all.yml?

This is my environment setting:

cloud_provider: 'external'
external_cloud_provider: 'openstack'

sanshah1211 commented 4 months ago

@tico88612

[stack@deployment ~]$ cat kubespray/inventory/k8s/group_vars/all/all.yml  | grep cloud_provider
cloud_provider: external
## When cloud_provider is set to 'external', you can set the cloud controller to deploy
external_cloud_provider: openstack
[stack@deployment ~]$

sanshah1211 commented 4 months ago

@tico88612 here is the output of kubectl get pods -A:

[Screenshot: kubectl get pods -A output]

Pods are stuck in the Pending state; each time, I need to manually remove the taint to get them running.

tico88612 commented 4 months ago

Unfortunately, I have also faced this problem. The nodes currently need to be untainted manually. (I'm guessing this is a problem with my OpenStack and K8s integration settings, not Kubespray.) You can change --v=1 to --v=5 to get more detailed logs.
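
The manual untainting described above can be scripted. This is a minimal sketch, assuming the taint in question is the standard node.cloudprovider.kubernetes.io/uninitialized=true:NoSchedule taint that the kubelet adds when it runs with an external cloud provider. The function name remove_uninitialized_taint and the KUBECTL override (useful for a dry run with KUBECTL=echo) are hypothetical. Note that this only lets pods schedule: the cloud controller manager normally removes this taint itself once it has initialized the node, so removing it by hand leaves the node uninitialized (for example, without a provider ID).

```shell
# Hypothetical helper: remove the uninitialized taint from each node given
# as an argument. KUBECTL is an illustrative override (default: kubectl),
# so setting KUBECTL=echo prints the commands instead of running them.
remove_uninitialized_taint() {
  for node in "$@"; do
    # The trailing "-" tells kubectl taint to remove this key:effect pair.
    ${KUBECTL:-kubectl} taint nodes "$node" \
      node.cloudprovider.kubernetes.io/uninitialized:NoSchedule-
  done
}

# Example (dry run first, then for real):
#   KUBECTL=echo remove_uninitialized_taint 10.10.0.126 10.10.0.127
#   remove_uninitialized_taint 10.10.0.126 10.10.0.127
```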

sanshah1211 commented 4 months ago

@tico88612 I have tried both the manual integration method and Kubespray, and the same issue occurs: tolerations are not working. If I deploy only Kubernetes with Kubespray (without the OpenStack cloud controller manager), tolerations work fine.

sanshah1211 commented 4 months ago

@tico88612 when I checked the logs, I saw the error messages below in the cloud-controller-manager logs:

E0619 12:24:19.643045      11 node_controller.go:240] error syncing '10.10.0.127': failed to get instance metadata for node 10.10.0.127: instance not found, requeuing
I0619 12:24:20.902137      11 leaderelection.go:281] successfully renewed lease kube-system/cloud-controller-manager
I0619 12:24:22.913751      11 leaderelection.go:281] successfully renewed lease kube-system/cloud-controller-manager
I0619 12:24:24.924495      11 leaderelection.go:281] successfully renewed lease kube-system/cloud-controller-manager
I0619 12:24:26.571122      11 discovery.go:214] Invalidating discovery information
I0619 12:24:26.693818      11 reflector.go:378] k8s.io/client-go@v0.28.4/tools/cache/reflector.go:229: forcing resync
I0619 12:24:26.936339      11 leaderelection.go:281] successfully renewed lease kube-system/cloud-controller-manager
I0619 12:24:28.192508      11 node_controller.go:431] Initializing node 10.10.0.126 with cloud provider
I0619 12:24:28.192561      11 instancesv2.go:52] openstack.Instancesv2() called
I0619 12:24:28.192605      11 instancesv2.go:52] openstack.Instancesv2() called
E0619 12:24:28.264325      11 node_controller.go:240] error syncing '10.10.0.126': failed to get instance metadata for node 10.10.0.126: instance not found, requeuing
I0619 12:24:28.404387      11 node_controller.go:431] Initializing node 10.10.0.128 with cloud provider
I0619 12:24:28.404448      11 instancesv2.go:52] openstack.Instancesv2() called
I0619 12:24:28.404476      11 instancesv2.go:52] openstack.Instancesv2() called
E0619 12:24:28.481561      11 node_controller.go:240] error syncing '10.10.0.128': failed to get instance metadata for node 10.10.0.128: instance not found, requeuing
I0619 12:24:28.947131      11 leaderelection.go:281] successfully renewed lease kube-system/cloud-controller-manager
error: read tcp 10.10.0.125:45414->10.10.0.209:6443: read: connection reset by peer
[rocky@10 ~]$

Do you think this is the cause?
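
To see which nodes the cloud controller manager fails to match to an OpenStack instance, the "instance not found" errors in a log like the one above can be reduced to a list of node names. This is a minimal sketch; the function name failing_nodes is hypothetical, and it assumes the error text has exactly the form shown in these logs.

```shell
# Hypothetical helper: read cloud-controller-manager log lines on stdin and
# print, once each, the node names from errors of the form
# "failed to get instance metadata for node NAME: instance not found".
failing_nodes() {
  sed -n 's/.*failed to get instance metadata for node \([^:]*\): instance not found.*/\1/p' \
    | LC_ALL=C sort -u
}

# Example (the pod name is a placeholder):
#   kubectl -n kube-system logs <cloud-controller-manager-pod> | failing_nodes
```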

tico88612 commented 4 months ago

I'm not sure, but I asked about this in Slack:

https://kubernetes.slack.com/archives/C0LSA3T7C/p1715079276968059

sanshah1211 commented 4 months ago

@tico88612 just FYI, I tried the same K8s and OpenStack integration using Cluster API. The only difference I see is in the node description: with Cluster API, each node has a provider ID like the one below, but with the Kubespray and OpenStack integration, none of the K8s nodes has a provider ID.

PodCIDR:                     10.224.0.0/24
ProviderID:                  openstack:///548e3c46-2477-4ce2-968b-3de1314560a5

Is this the reason the tolerations are not getting applied?
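
A node that the cloud controller manager has initialized normally has spec.providerID set (e.g. openstack:///<instance-id>, as in the Cluster API output above). A quick way to list nodes that still lack it is sketched below. The helper name list_nodes_without_provider_id is hypothetical; it reads the two-column output of kubectl's custom-columns format, which prints <none> for an unset field.

```shell
# Hypothetical helper: read "NAME PROVIDER_ID" lines on stdin (kubectl
# custom-columns output without headers) and print the nodes whose
# provider ID is unset. custom-columns shows an unset field as <none>.
list_nodes_without_provider_id() {
  awk 'NF == 1 || $2 == "<none>" { print $1 }'
}

# Example:
#   kubectl get nodes --no-headers \
#     -o custom-columns=NAME:.metadata.name,PROVIDER_ID:.spec.providerID \
#     | list_nodes_without_provider_id
```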

sanshah1211 commented 4 months ago

@tico88612 Here are some more logs from the cloud controller manager:

[Screenshot: cloud controller manager logs]

tico88612 commented 4 months ago

@sanshah1211 The problem in my K8s cluster was that the hostname did not match the OpenStack instance name, and it has been resolved.

I think your problem is similar to mine. You can try the approach in this thread: https://kubernetes.slack.com/archives/C0LSA3T7C/p1718804970684759?thread_ts=1715079276.968059&cid=C0LSA3T7C
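
Given the hostname/instance-name mismatch described above, one way to spot it is to compare the Kubernetes node names with the OpenStack server names. This is a minimal sketch, assuming (as in tico88612's case) that the cloud controller manager looks up a node that has no provider ID by an OpenStack server with the same name; node names such as 10.10.0.126 in the logs above would then only match if the instances carry those exact names. find_unmatched_nodes is a hypothetical helper that takes two files with one name per line.

```shell
# Hypothetical helper: print the Kubernetes node names (file $1) that have no
# OpenStack server with exactly the same name (file $2). Both files hold one
# name per line. Loading the server list via FILENAME == ARGV[1] stays
# correct even when that file is empty.
find_unmatched_nodes() {
  awk 'FILENAME == ARGV[1] { servers[$0] = 1; next }
       NF && !($0 in servers)' "$2" "$1" | LC_ALL=C sort -u
}

# Example:
#   kubectl get nodes -o name | sed 's|^node/||' > nodes.txt
#   openstack server list -f value -c Name > servers.txt
#   find_unmatched_nodes nodes.txt servers.txt
```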

sanshah1211 commented 4 months ago

@tico88612 I can't open it because I don't have access to the Kubernetes Slack workspace. Any idea how to get access?

tico88612 commented 4 months ago

@sanshah1211 http://slack.k8s.io/

sanshah1211 commented 4 months ago

@tico88612 thank you for the link.

Is it fine to use Octavia for Kubernetes HA instead of HAProxy or Nginx?

k8s-triage-robot commented 1 month ago

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

You can:

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

tico88612 commented 1 month ago

/close

k8s-ci-robot commented 1 month ago

@tico88612: Closing this issue.

In response to [this](https://github.com/kubernetes-sigs/kubespray/issues/11107#issuecomment-2362484865):

> /close

Instructions for interacting with me using PR comments are available [here](https://git.k8s.io/community/contributors/guide/pull-requests.md). If you have questions or suggestions related to my behavior, please file an issue against the [kubernetes-sigs/prow](https://github.com/kubernetes-sigs/prow/issues/new?title=Prow%20issue:) repository.