kubernetes / cloud-provider-openstack

Apache License 2.0
623 stars 611 forks

Only the Master Node Is Getting an EXTERNAL-IP, Not the Worker Nodes? #2617

Closed nashford77 closed 2 weeks ago

nashford77 commented 5 months ago

```
root@5net-k8s-master-0:~# kubectl get nodes -A -o wide
NAME                STATUS   ROLES                  AGE   VERSION   INTERNAL-IP   EXTERNAL-IP    OS-IMAGE             KERNEL-VERSION       CONTAINER-RUNTIME
5net-k8s-master-0   Ready    control-plane,master   23h   v1.30.1   10.5.1.36     192.168.5.75   Ubuntu 22.04.4 LTS   5.15.0-112-generic   docker://26.1.4
5net-k8s-node-0     Ready    worker                 23h   v1.30.1   10.5.1.121    <none>         Ubuntu 22.04.4 LTS   5.15.0-112-generic   docker://26.1.4
5net-k8s-node-1     Ready    worker                 23h   v1.30.1   10.5.1.55     <none>         Ubuntu 22.04.4 LTS   5.15.0-112-generic   docker://26.1.4
5net-k8s-node-2     Ready    worker                 23h   v1.30.1   10.5.1.45     <none>         Ubuntu 22.04.4 LTS   5.15.0-112-generic   docker://26.1.4
```

I saw this earlier:

```
root@5net-k8s-master-0:~# kubectl logs -n kube-system -l k8s-app=openstack-cloud-controller-manager
I0608 09:19:59.531119 10 controllermanager.go:319] Starting "service-lb-controller"
I0608 09:19:59.531235 10 node_lifecycle_controller.go:113] Sending events to api server
I0608 09:19:59.531576 10 openstack.go:385] Claiming to support LoadBalancer
I0608 09:19:59.531722 10 controllermanager.go:338] Started "service-lb-controller"
I0608 09:19:59.531863 10 controller.go:231] Starting service controller
I0608 09:19:59.531964 10 shared_informer.go:313] Waiting for caches to sync for service
I0608 09:19:59.631182 10 node_controller.go:425] Initializing node 5net-k8s-master-0 with cloud provider
I0608 09:19:59.632722 10 shared_informer.go:320] Caches are synced for service
I0608 09:20:00.346484 10 node_controller.go:492] Successfully initialized node 5net-k8s-master-0 with cloud provider
I0608 09:20:00.346746 10 event.go:389] "Event occurred" object="5net-k8s-master-0" fieldPath="" kind="Node" apiVersion="v1" type="Normal" reason="Synced" message="Node synced successfully"
```

I restarted it, thinking that might register the other nodes; no go...

```
root@5net-k8s-master-0:~# kubectl logs -f -n kube-system -l k8s-app=openstack-cloud-controller-manager
I0609 08:52:08.275473 10 requestheader_controller.go:169] Starting RequestHeaderAuthRequestController
I0609 08:52:08.275485 10 shared_informer.go:313] Waiting for caches to sync for RequestHeaderAuthRequestController
I0609 08:52:08.275567 10 tlsconfig.go:240] "Starting DynamicServingCertificateController"
I0609 08:52:08.275664 10 configmap_cafile_content.go:202] "Starting controller" name="client-ca::kube-system::extension-apiserver-authentication::client-ca-file"
I0609 08:52:08.275676 10 shared_informer.go:313] Waiting for caches to sync for client-ca::kube-system::extension-apiserver-authentication::client-ca-file
I0609 08:52:08.275684 10 configmap_cafile_content.go:202] "Starting controller" name="client-ca::kube-system::extension-apiserver-authentication::requestheader-client-ca-file"
I0609 08:52:08.275908 10 shared_informer.go:313] Waiting for caches to sync for client-ca::kube-system::extension-apiserver-authentication::requestheader-client-ca-file
I0609 08:52:08.375986 10 shared_informer.go:320] Caches are synced for client-ca::kube-system::extension-apiserver-authentication::requestheader-client-ca-file
I0609 08:52:08.376143 10 shared_informer.go:320] Caches are synced for client-ca::kube-system::extension-apiserver-authentication::client-ca-file
I0609 08:52:08.375993 10 shared_informer.go:320] Caches are synced for RequestHeaderAuthRequestController
I0609 08:52:23.547603 10 leaderelection.go:260] successfully acquired lease kube-system/cloud-controller-manager
I0609 08:52:23.550525 10 event.go:389] "Event occurred" object="kube-system/cloud-controller-manager" fieldPath="" kind="Lease" apiVersion="coordination.k8s.io/v1" type="Normal" reason="LeaderElection" message="5net-k8s-master-0_1c08edda-634c-40aa-b580-12852f2a4bc5 became leader"
I0609 08:52:23.554762 10 openstack.go:504] Setting up informers for Cloud
I0609 08:52:23.555992 10 controllermanager.go:319] Starting "cloud-node-lifecycle-controller"
I0609 08:52:23.558714 10 controllermanager.go:338] Started "cloud-node-lifecycle-controller"
I0609 08:52:23.559834 10 controllermanager.go:319] Starting "service-lb-controller"
I0609 08:52:23.561899 10 openstack.go:385] Claiming to support LoadBalancer
I0609 08:52:23.562019 10 controllermanager.go:338] Started "service-lb-controller"
I0609 08:52:23.562063 10 controllermanager.go:319] Starting "node-route-controller"
I0609 08:52:23.564807 10 node_lifecycle_controller.go:113] Sending events to api server
I0609 08:52:23.565221 10 controller.go:231] Starting service controller
I0609 08:52:23.565276 10 shared_informer.go:313] Waiting for caches to sync for service
W0609 08:52:23.649048 10 openstack.go:488] Error initialising Routes support: router-id not set in cloud provider config
W0609 08:52:23.649189 10 core.go:111] --configure-cloud-routes is set, but cloud provider does not support routes. Will not configure cloud provider routes.
W0609 08:52:23.649196 10 controllermanager.go:326] Skipping "node-route-controller"
I0609 08:52:23.649203 10 controllermanager.go:319] Starting "cloud-node-controller"
I0609 08:52:23.651145 10 controllermanager.go:338] Started "cloud-node-controller"
I0609 08:52:23.651467 10 node_controller.go:164] Sending events to api server.
I0609 08:52:23.652264 10 node_controller.go:173] Waiting for informer caches to sync
I0609 08:52:23.665533 10 shared_informer.go:320] Caches are synced for service
```

The old versions would register an EXTERNAL-IP on all nodes (I have one cluster up with it...):

```
(kolla-2023.2) root@slurm-primary-controller:~/ansible/5Net/k8s-bootstrap# kubectl get nodes -A -o wide
NAME                             STATUS   ROLES    AGE   VERSION   INTERNAL-IP   EXTERNAL-IP    OS-IMAGE                        KERNEL-VERSION          CONTAINER-RUNTIME
k8s-5net-ljtcgza6zsbt-master-0   Ready    master   30d   v1.23.3   10.5.1.203    192.168.5.72   Fedora CoreOS 38.20230806.3.0   6.4.7-200.fc38.x86_64   docker://20.10.23
k8s-5net-ljtcgza6zsbt-node-0     Ready    worker   30d   v1.23.3   10.5.1.77     192.168.5.67   Fedora CoreOS 38.20230806.3.0   6.4.7-200.fc38.x86_64   docker://20.10.23
k8s-5net-ljtcgza6zsbt-node-1     Ready    worker   30d   v1.23.3   10.5.1.174    192.168.5.45   Fedora CoreOS 38.20230806.3.0   6.4.7-200.fc38.x86_64   docker://20.10.23
k8s-5net-ljtcgza6zsbt-node-2     Ready    worker   30d   v1.23.3   10.5.1.240    192.168.5.87   Fedora CoreOS 38.20230806.3.0   6.4.7-200.fc38.x86_64   docker://20.10.23
```

What's missing / different in the new version?

Adding all the diagnostic info I can think of to help.

```
root@5net-k8s-master-0:~# kubectl get pods -A
NAMESPACE      NAME                                        READY   STATUS    RESTARTS      AGE
default        diagnostic-pod                              2/2     Running   0             78m
kube-flannel   kube-flannel-ds-2mlph                       1/1     Running   0             23h
kube-flannel   kube-flannel-ds-5v6w6                       1/1     Running   0             23h
kube-flannel   kube-flannel-ds-crlks                       1/1     Running   0             23h
kube-flannel   kube-flannel-ds-pg8fb                       1/1     Running   1 (24h ago)   24h
kube-system    coredns-5cf4f94ffd-4px6h                    1/1     Running   0             31m
kube-system    coredns-5cf4f94ffd-j9phr                    1/1     Running   0             31m
kube-system    dnsutils                                    1/1     Running   1 (12m ago)   72m
kube-system    etcd-5net-k8s-master-0                      1/1     Running   1 (24h ago)   24h
kube-system    kube-apiserver-5net-k8s-master-0            1/1     Running   1 (23h ago)   24h
kube-system    kube-controller-manager-5net-k8s-master-0   1/1     Running   1 (24h ago)   24h
kube-system    kube-proxy-5b279                            1/1     Running   0             23h
kube-system    kube-proxy-5l2cc                            1/1     Running   1 (24h ago)   24h
kube-system    kube-proxy-bdz4v                            1/1     Running   0             23h
kube-system    kube-proxy-lfrsz                            1/1     Running   0             23h
kube-system    kube-scheduler-5net-k8s-master-0            1/1     Running   1 (24h ago)   24h
kube-system    openstack-cloud-controller-manager-crnlz    1/1     Running   0             7m15s
```

```
root@5net-k8s-master-0:~# kubectl get ds openstack-cloud-controller-manager -n kube-system
NAME                                 DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE SELECTOR                             AGE
openstack-cloud-controller-manager   1         1         1       1            1           node-role.kubernetes.io/control-plane=    23h
```
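One quick way to see whether the cloud controller ever initialized a given node is to check its `spec.providerID`, which the cloud-node-controller sets during initialization. A diagnostic sketch (assumes `kubectl` access on the master; the node name is taken from the output above):

```shell
# Initialized nodes get a providerID like openstack:///<instance-uuid>;
# nodes the controller never touched show <none>.
kubectl get nodes -o custom-columns=NAME:.metadata.name,PROVIDER-ID:.spec.providerID

# A kubelet started with --cloud-provider=external registers itself with the
# node.cloudprovider.kubernetes.io/uninitialized taint, which the cloud
# controller removes once initialization succeeds. A worker that was never
# registered as "external" will have neither the taint nor a providerID.
kubectl describe node 5net-k8s-node-0 | grep -i taint
```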

```yaml
# Please edit the object below. Lines beginning with a '#' will be ignored,
# and an empty file will abort the edit. If an error occurs while saving this file will be
# reopened with the relevant failures.
#
apiVersion: apps/v1
kind: DaemonSet
metadata:
  annotations:
    deprecated.daemonset.template.generation: "4"
    kubectl.kubernetes.io/last-applied-configuration: |
      {"apiVersion":"apps/v1","kind":"DaemonSet","metadata":{"annotations":{},"labels":{"k8s-app":"openstack-cloud-controller-manager"},"name":"openstack-cloud-controller-manager","namespace":"kube-system"},"spec":{"selector":{"matchLabels":{"k8s-app":"openstack-cloud-controller-manager"}},"template":{"metadata":{"labels":{"k8s-app":"openstack-cloud-controller-manager"}},"spec":{"containers":[{"args":["/bin/openstack-cloud-controller-manager","--v=1","--cluster-name=$(CLUSTER_NAME)","--cloud-config=$(CLOUD_CONFIG)","--cloud-provider=openstack","--use-service-account-credentials=false","--bind-address=127.0.0.1"],"env":[{"name":"CLOUD_CONFIG","value":"/etc/config/cloud.conf"},{"name":"CLUSTER_NAME","value":"kubernetes"}],"image":"registry.k8s.io/provider-os/openstack-cloud-controller-manager:v1.30.0","name":"openstack-cloud-controller-manager","resources":{"requests":{"cpu":"200m"}},"volumeMounts":[{"mountPath":"/etc/kubernetes/pki","name":"k8s-certs","readOnly":true},{"mountPath":"/etc/ssl/certs","name":"ca-certs","readOnly":true},{"mountPath":"/etc/config","name":"cloud-config-volume","readOnly":true}]}],"dnsPolicy":"ClusterFirstWithHostNet","hostNetwork":true,"nodeSelector":{"node-role.kubernetes.io/control-plane":""},"securityContext":{"runAsUser":1001},"serviceAccountName":"cloud-controller-manager","tolerations":[{"effect":"NoSchedule","key":"node.cloudprovider.kubernetes.io/uninitialized","value":"true"},{"effect":"NoSchedule","key":"node-role.kubernetes.io/master"},{"effect":"NoSchedule","key":"node-role.kubernetes.io/control-plane"}],"volumes":[{"hostPath":{"path":"/etc/kubernetes/pki","type":"DirectoryOrCreate"},"name":"k8s-certs"},{"hostPath":{"path":"/etc/ssl/certs","type":"DirectoryOrCreate"},"name":"ca-certs"},{"name":"cloud-config-volume","secret":{"secretName":"cloud-config"}}]}},"updateStrategy":{"type":"RollingUpdate"}}}
  creationTimestamp: "2024-06-08T08:58:54Z"
  generation: 4
  labels:
    k8s-app: openstack-cloud-controller-manager
  name: openstack-cloud-controller-manager
  namespace: kube-system
  resourceVersion: "182284"
  uid: 8006e824-3ea2-44b4-8a0b-335777a86009
spec:
  revisionHistoryLimit: 10
  selector:
    matchLabels:
      k8s-app: openstack-cloud-controller-manager
  template:
    metadata:
      annotations:
        kubectl.kubernetes.io/restartedAt: "2024-06-09T08:52:05Z"
      creationTimestamp: null
      labels:
        k8s-app: openstack-cloud-controller-manager
    spec:
      containers:
```

Are the tolerations the issue? The controller should only run on master nodes, but it should still pull the external IP info for the worker nodes, shouldn't it?

nashford77 commented 5 months ago

Q: How are you meant to bootstrap the worker nodes? Is there an example? I'm guessing kubelet args are missing...?

kundan2707 commented 5 months ago

/kind support

jichenjc commented 5 months ago

not sure I fully understand the question here

Are you saying the nodes you created as worker nodes don't have an external IP (which comes from a floating IP)?

nashford77 commented 5 months ago

Yes. The root issue was that the worker nodes were not bootstrapped correctly with "external" as the kubelet's cloud-provider setting; it was a cloud-init issue on my side. All sorted now.
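For anyone hitting the same symptom: each worker's kubelet has to be registered with the external cloud provider so the OCCM can adopt and initialize it. A minimal sketch of what the worker-side bootstrap needs, assuming a kubeadm-style join (the endpoint, token, and hash below are illustrative placeholders, not taken from this thread):

```yaml
# Illustrative kubeadm JoinConfiguration for a worker node.
apiVersion: kubeadm.k8s.io/v1beta3
kind: JoinConfiguration
discovery:
  bootstrapToken:
    apiServerEndpoint: "10.5.1.36:6443"   # master's internal IP (placeholder)
    token: "<bootstrap-token>"
    caCertHashes:
      - "sha256:<ca-cert-hash>"
nodeRegistration:
  kubeletExtraArgs:
    # The missing piece: registers the node with the
    # node.cloudprovider.kubernetes.io/uninitialized taint so the
    # openstack-cloud-controller-manager initializes it (addresses, providerID).
    cloud-provider: external
```

Equivalently, for non-kubeadm bootstraps, passing `--cloud-provider=external` to the kubelet on each worker accomplishes the same registration.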


jichenjc commented 5 months ago

ok, please close this if all done, thanks

k8s-triage-robot commented 2 months ago

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

- After 90d of inactivity, `lifecycle/stale` is applied
- After 30d of inactivity since `lifecycle/stale` was applied, `lifecycle/rotten` is applied
- After 30d of inactivity since `lifecycle/rotten` was applied, the issue is closed

You can:

- Mark this issue as fresh with `/remove-lifecycle stale`
- Offer to help out with [Issue Triage](https://www.kubernetes.dev/docs/guide/issue-triage/)

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

k8s-triage-robot commented 1 month ago

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

- After 90d of inactivity, `lifecycle/stale` is applied
- After 30d of inactivity since `lifecycle/stale` was applied, `lifecycle/rotten` is applied
- After 30d of inactivity since `lifecycle/rotten` was applied, the issue is closed

You can:

- Mark this issue as fresh with `/remove-lifecycle rotten`
- Offer to help out with [Issue Triage](https://www.kubernetes.dev/docs/guide/issue-triage/)

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

k8s-triage-robot commented 2 weeks ago

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues according to the following rules:

- After 90d of inactivity, `lifecycle/stale` is applied
- After 30d of inactivity since `lifecycle/stale` was applied, `lifecycle/rotten` is applied
- After 30d of inactivity since `lifecycle/rotten` was applied, the issue is closed

You can:

- Reopen this issue with `/reopen`
- Mark this issue as fresh with `/remove-lifecycle rotten`
- Offer to help out with [Issue Triage](https://www.kubernetes.dev/docs/guide/issue-triage/)

Please send feedback to sig-contributor-experience at kubernetes/community.

/close not-planned

k8s-ci-robot commented 2 weeks ago

@k8s-triage-robot: Closing this issue, marking it as "Not Planned".

In response to [this](https://github.com/kubernetes/cloud-provider-openstack/issues/2617#issuecomment-2478448095):

> The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.
>
> This bot triages issues according to the following rules:
> - After 90d of inactivity, `lifecycle/stale` is applied
> - After 30d of inactivity since `lifecycle/stale` was applied, `lifecycle/rotten` is applied
> - After 30d of inactivity since `lifecycle/rotten` was applied, the issue is closed
>
> You can:
> - Reopen this issue with `/reopen`
> - Mark this issue as fresh with `/remove-lifecycle rotten`
> - Offer to help out with [Issue Triage](https://www.kubernetes.dev/docs/guide/issue-triage/)
>
> Please send feedback to sig-contributor-experience at [kubernetes/community](https://github.com/kubernetes/community).
>
> /close not-planned

Instructions for interacting with me using PR comments are available [here](https://git.k8s.io/community/contributors/guide/pull-requests.md). If you have questions or suggestions related to my behavior, please file an issue against the [kubernetes-sigs/prow](https://github.com/kubernetes-sigs/prow/issues/new?title=Prow%20issue:) repository.