jfpucheu closed this issue 2 years ago.
topology.cinder.csi.openstack.org/zone=eu-west-0a is provided by the Cinder CSI plugin, so deploying OCCM is not what produces it: https://github.com/kubernetes/cloud-provider-openstack/blob/master/docs/cinder-csi-plugin/features.md#topology
I assume you also want the topology.kubernetes.io/region=eu-west-0 label; it should come from https://github.com/kubernetes/cloud-provider-openstack/blob/master/pkg/openstack/openstack.go#L366. I am not sure whether your configuration file contains such a region definition? Or maybe you can check whether that code path shows up in the log (raise the log level to 4 or higher).
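To illustrate where those values come from, here is a minimal sketch (not the actual OCCM code, which lives in openstack.go as linked above) of the cloudprovider.Zones interface a provider implements; fakeZones and the hard-coded eu-west-0/eu-west-0a values are illustrative assumptions. In the real provider the region comes from the [Global] section of the cloud config and the zone from the instance's availability zone; the cloud node controller then turns the returned Zone into the topology.kubernetes.io/region and topology.kubernetes.io/zone labels.

package main

import (
	"context"
	"fmt"

	"k8s.io/apimachinery/pkg/types"
	cloudprovider "k8s.io/cloud-provider"
)

// fakeZones stands in for the OpenStack provider's Zones implementation.
// In the real provider, region would be read from the [Global] section of the
// cloud config and the failure domain from the instance's availability zone.
type fakeZones struct {
	region string
	az     string
}

// Compile-time check that fakeZones satisfies the cloud-provider interface.
var _ cloudprovider.Zones = fakeZones{}

func (z fakeZones) GetZone(ctx context.Context) (cloudprovider.Zone, error) {
	return cloudprovider.Zone{FailureDomain: z.az, Region: z.region}, nil
}

func (z fakeZones) GetZoneByProviderID(ctx context.Context, providerID string) (cloudprovider.Zone, error) {
	return z.GetZone(ctx)
}

func (z fakeZones) GetZoneByNodeName(ctx context.Context, nodeName types.NodeName) (cloudprovider.Zone, error) {
	return z.GetZone(ctx)
}

func main() {
	// The cloud node controller consumes the Zone and turns it into node labels.
	zone, _ := fakeZones{region: "eu-west-0", az: "eu-west-0a"}.GetZone(context.Background())
	labels := map[string]string{
		"topology.kubernetes.io/region": zone.Region,
		"topology.kubernetes.io/zone":   zone.FailureDomain,
	}
	fmt.Println(labels)
}

So if the controller never calls into this interface for a node, the region/zone labels never appear, no matter what the config says.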
Hello,
I have the region setup:
[Global]
username=********
password=********
auth-url=********
tenant-id=******
domain-id=********
region=eu-west-0
[LoadBalancer]
enabled=false
create-monitor = no
[Metadata]
search-order = metadataService,configDrive
No matching logs with --v=6: kubectl logs openstack-cloud-controller-manager-tm6vg -n kube-system | grep "Current zone" returns nothing.
It seems the function is never called.
Weird, I need to check my env soon, thanks~
I have this in my dev env
topology.cinder.csi.openstack.org/zone=nova
topology.kubernetes.io/region=RegionOne
topology.kubernetes.io/zone=nova
Then I removed the labels:
kubectl label node n1 topology.kubernetes.io/region-
kubectl label node n1 topology.kubernetes.io/zone-
and after a while I can see the labels added back again:
$ kubectl describe node
Name: n1
Roles: control-plane,master
......
topology.cinder.csi.openstack.org/zone=nova
topology.kubernetes.io/region=RegionOne
topology.kubernetes.io/zone=nova
And I find this:
ubuntu@n1:~$ kubectl logs openstack-cloud-controller-manager-7sknr -n kube-system | grep label
I0216 03:33:06.619486 1 labels.go:56] Updated labels map[topology.kubernetes.io/region:RegionOne topology.kubernetes.io/zone:nova] to Node n1
can you check whether you have this log in your env? thanks
Hello,
I still don't have any log about that.
Is it possible that I don't get the metadata because external load balancers aren't supported in my environment?
I0216 07:39:44.056331 1 openstack.go:310] openstack.LoadBalancer() called
E0216 07:39:44.056364 1 openstack.go:326] Failed to create an OpenStack LoadBalancer client: failed to find load-balancer v2 endpoint for region eu-west-0: No suitable endpoint could be found in the service catalog.
E0216 07:39:44.056381 1 core.go:93] Failed to start service controller: the cloud provider does not support external load balancers
W0216 07:39:44.056390 1 controllermanager.go:286] Skipping "service"
Thanks jeff
The above error should not matter: it only means there is no load-balancer endpoint in the catalog, so you cannot create LoadBalancer Services, but it should not prevent OCCM from running.
https://github.com/kubernetes/cloud-provider/blob/master/controllers/node/node_controller.go#L270 is the code that set labels from https://github.com/kubernetes/cloud-provider/blob/master/controllers/node/node_controller.go#L53
You said you are using 1.23, so you should be up to date already. I have no idea why the reconcile is not working; adding some logs and doing some debugging would be helpful here, or @lingxiankong @ramineni might know more.
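For what it's worth, the reconcile code linked above works from a small table of beta/GA label pairs. The sketch below is only an illustration (hypothetical names, not the upstream code) of that idea, under the assumption that the beta labels act as the source of truth, which matches the re-labeling behaviour observed earlier in this thread: a GA label is only (re)created by copying it from an existing beta label.

package main

import "fmt"

// labelPairs is a simplified stand-in for the reconcile table in
// cloud-provider's node_controller.go: the beta label is treated as the
// source of truth and the GA label is kept in sync with it.
var labelPairs = []struct {
	primaryKey   string // set during node initialization
	secondaryKey string // GA label that should mirror the primary
}{
	{"failure-domain.beta.kubernetes.io/zone", "topology.kubernetes.io/zone"},
	{"failure-domain.beta.kubernetes.io/region", "topology.kubernetes.io/region"},
}

// missingLabels returns the GA labels that would be (re)added for a node.
func missingLabels(nodeLabels map[string]string) map[string]string {
	toAdd := map[string]string{}
	for _, p := range labelPairs {
		v, ok := nodeLabels[p.primaryKey]
		if !ok {
			continue // never initialized: nothing to copy from
		}
		if _, exists := nodeLabels[p.secondaryKey]; !exists {
			toAdd[p.secondaryKey] = v
		}
	}
	return toAdd
}

func main() {
	// Node that was initialized once: deleting the GA labels gets them repaired.
	fmt.Println(missingLabels(map[string]string{
		"failure-domain.beta.kubernetes.io/zone":   "nova",
		"failure-domain.beta.kubernetes.io/region": "RegionOne",
	}))
	// Node that was never initialized: nothing to reconcile.
	fmt.Println(missingLabels(map[string]string{}))
}

If that reading is right, deleting topology.kubernetes.io/* on an already-initialized node gets repaired from the surviving failure-domain.beta.* labels, but a node that was never initialized by the cloud provider has neither set, so there is nothing to reconcile.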
Restart the openstack-cloud-controller-manager with --v=6, then search the log for the lines starting with "Initializing node" and "Successfully initialized node", and please paste all the logs in between.
Hello,
After further investigation, I found the issue is linked to this code in cloud-provider's node_controller.go (https://github.com/kubernetes/cloud-provider/blob/master/controllers/node/node_controller.go, around l. 384):
cloudTaint := getCloudTaint(curNode.Spec.Taints)
if cloudTaint == nil {
	klog.Info("LOG MORE - err syncNode cloudTaint")
	// Node object received from event had the cloud taint but was outdated,
	// the node has actually already been initialized, so this sync event can be ignored.
	return nil
}
I added more logging and I can see this log:
I0217 15:12:35.047625 1 instances.go:156] NodeAddressesByProviderID(openstack:///004f593d-7291-40f4-9075-eedcfa25f2c1) => [{InternalIP xx.xx.xx.xx}]
I0217 15:12:35.219042 1 node_controller.go:412] LOG MORE - err syncNode cloudTaint
I0217 15:12:36.317768 1 round_trippers.go:553] GET https://10.254.0.1:443/apis/coordination.k8s.io/v1/namespaces/kube-system/leases/cloud-controller-manager?timeout=5s 200 OK in 5 milliseconds
I0217 15:12:36.325470 1 round_trippers.go:553] PUT https://10.254.0.1:443/apis/coordination.k8s.io/v1/namespaces/kube-system/leases/cloud-controller-manager?timeout=5s 200 OK in 7 milliseconds
I0217 15:12:36.325568 1 leaderelection.go:278] successfully renewed lease kube-system/cloud-controller-manager
If I comment out those lines, I no longer have the issue and the initialization works:
I0217 15:06:23.189333 1 node_controller.go:419] Initializing node kdevnodeaz0a01 with cloud provider
I0217 15:06:24.051665 1 node_controller.go:522] Adding node label from cloud provider: beta.kubernetes.io/instance-type=m2.xlarge.8
I0217 15:06:24.051675 1 node_controller.go:523] Adding node label from cloud provider: node.kubernetes.io/instance-type=m2.xlarge.8
I0217 15:06:24.051683 1 node_controller.go:534] Adding node label from cloud provider: failure-domain.beta.kubernetes.io/zone=eu-west-0a
I0217 15:06:24.051693 1 node_controller.go:535] Adding node label from cloud provider: topology.kubernetes.io/zone=eu-west-0a
I0217 15:06:24.051700 1 node_controller.go:545] Adding node label from cloud provider: failure-domain.beta.kubernetes.io/region=eu-west-0
I0217 15:06:24.051705 1 node_controller.go:546] Adding node label from cloud provider: topology.kubernetes.io/region=eu-west-0
I0217 15:06:24.065453 1 node_controller.go:484] Successfully initialized node kdevnodeaz0a01 with cloud provider
But I don't understand why we end up in this failure case...
Do you have any idea?
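For context on that check: the cloud taint it looks for is the well-known node.cloudprovider.kubernetes.io/uninitialized taint (the constant linked in the next comment), which kubelet adds when started with --cloud-provider=external and which the controller removes once a node has been initialized. Below is a minimal sketch of the helper with an illustrative main; the local constant and the example input are assumptions for illustration, not the upstream code.

package main

import (
	"fmt"

	v1 "k8s.io/api/core/v1"
)

// TaintExternalCloudProvider is the well-known taint that kubelet puts on a
// node when started with --cloud-provider=external; the cloud node controller
// only initializes (labels, addresses, instance type) nodes that still carry
// it, and removes it afterwards.
const TaintExternalCloudProvider = "node.cloudprovider.kubernetes.io/uninitialized"

// getCloudTaint mirrors the helper used in the snippet above: it returns the
// cloud taint if the node still has it, nil otherwise.
func getCloudTaint(taints []v1.Taint) *v1.Taint {
	for i := range taints {
		if taints[i].Key == TaintExternalCloudProvider {
			return &taints[i]
		}
	}
	return nil
}

func main() {
	// A node that was registered without the taint (or whose taint was already
	// removed) makes getCloudTaint return nil, so the sync returns early and
	// the topology labels are never added.
	fmt.Println(getCloudTaint([]v1.Taint{}) == nil) // true
}

That would fit the migration scenario raised below: nodes registered before switching to the external cloud provider never carried the taint, so the initialization path that adds the topology labels is skipped for them.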
To complete my previous comment: once the labels have been added, if I delete them, they are automatically added back, as @jichenjc mentioned before.
The code you pointed out actually uses https://github.com/kubernetes/cloud-provider/blob/0429a85a45b2424c1508ea289fea6d1e8f15d30f/api/well_known_taints.go#L24, which means a node is only initialized while it still carries that cloud taint; once the node has actually been initialized, the taint is gone (that's why the node taint is <none> in my env):
$ kubectl get nodes -A
.....
Taints:             <none>
So I doubt whether it's the root cause: in my test env I deleted the label and it was recreated again. For now, without removing the code you mentioned, can you remove the label and see whether OCCM recreates it for you?
To be clear, does that mean you can't deploy OCCM on an already existing cluster? My nodes are already created; this is why the initialization is never done...
Not sure, I have never migrated from the in-tree to the external cloud provider (I always use the external one directly).
There is a video created by @lingxiankong, maybe he has more info.
The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.
This bot triages issues and PRs according to the following rules: after a period of inactivity, lifecycle/stale is applied; after further inactivity once lifecycle/stale was applied, lifecycle/rotten is applied; and after further inactivity once lifecycle/rotten was applied, the issue is closed.
You can mark this issue as fresh with /remove-lifecycle stale, mark it as rotten with /lifecycle rotten, or close it with /close.
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.
This bot triages issues and PRs according to the following rules: after a period of inactivity, lifecycle/stale is applied; after further inactivity once lifecycle/stale was applied, lifecycle/rotten is applied; and after further inactivity once lifecycle/rotten was applied, the issue is closed.
You can mark this issue as fresh with /remove-lifecycle rotten, or close it with /close.
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle rotten
The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.
This bot triages issues and PRs according to the following rules: after a period of inactivity, lifecycle/stale is applied; after further inactivity once lifecycle/stale was applied, lifecycle/rotten is applied; and after further inactivity once lifecycle/rotten was applied, the issue is closed.
You can reopen this issue with /reopen, or mark it as fresh with /remove-lifecycle rotten.
Please send feedback to sig-contributor-experience at kubernetes/community.
/close
@k8s-triage-robot: Closing this issue.
/kind bug
What happened:
Before, with the in-tree OpenStack cloud provider in kubelet, my nodes were labeled with the region and zone (the topology.kubernetes.io/region and topology.kubernetes.io/zone labels).
After migrating to OCCM, I only got the label:
topology.cinder.csi.openstack.org/zone=eu-west-0a
The instance metadata is not empty.
What you expected to happen:
Nodes labeled with the topology (zone and region) from the OpenStack APIs, i.e. topology.kubernetes.io/zone and topology.kubernetes.io/region.
How to reproduce it:
Deploy OCCM with a basic config (no changes) and Cinder CSI, then check the node labels using: kubectl describe node mynode
Anything else we need to know?: I don't know if this is really an issue or a feature request, because this part is not very well documented, but the feature was very convenient with the in-kubelet cloud provider: all node topology was labeled automatically. My OCCM has no issue contacting the OpenStack API; I saw responses in the logs.
Environment:
Thanks for the help Jeff