kubernetes / autoscaler

Autoscaling components for Kubernetes
Apache License 2.0

Cluster-autoscaler 1.15+ does not work with Magnum #2819

Closed SYezhkov closed 4 years ago

SYezhkov commented 4 years ago

I deployed cluster-autoscaler from the image openstackmagnum/cluster-autoscaler:v1.15.2. The Kubernetes version is 1.15.7, deployed by OpenStack (Train release) Magnum 9.1.0.

From the log:
I0211 03:00:39.307214 1 scale_down.go:938] Scale-down: removing empty node k8s-nbs-group-wv7cx2k4h24r-node-1
I0211 03:00:49.185156 1 magnum_manager_heat.go:367] Could not resolve node {Name:k8s-nbs-group-wv7cx2k4h24r-node-1 MachineID:311936e2303b034fe7ef70182235b8cb ProviderID:openstack:///82e3d5e4-a881-4637-82d5-01a27d4c4c74 IPs:[10.1.0.118]} to a stack index
E0211 03:00:49.187166 1 scale_down.go:978] Problem with empty node deletion: failed to delete k8s-nbs-group-wv7cx2k4h24r-node-1: manager error deleting nodes: could not find stack indices for nodes to be deleted: 1 nodes could not be resolved to stack indices

As I understand from the source (magnum_manager_heat.go:367), the autoscaler takes the IDToIndex map from Heat and tries to find the index by MachineID or IP. However, the IDToIndex map contains the OpenStack instance ID, which is not equal to the MachineID (as we can see from the log).

In my case the IDToIndex map looks like this:
output_value:
  '0': 57cd6509-685c-403d-b69c-f55da617fc5b
  '1': 82e3d5e4-a881-4637-82d5-01a27d4c4c74
  '2': f3535181-94d8-4c9b-8f8d-3b7a27f89452

I think the autoscaler has to use SystemUUID or ProviderID to find the index, not MachineID.
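To illustrate the suggestion, here is a minimal sketch of resolving a node to a stack index through its ProviderID, using the values from this report. It is not the autoscaler's actual code: the nodeRef type and the instanceID and resolveIndex functions are hypothetical names, and the sketch assumes the ProviderID always has the form openstack:///<instance UUID> and that the Heat output maps stack index to instance ID, as in the sample above.

```go
package main

import (
	"fmt"
	"strings"
)

// nodeRef is a hypothetical, trimmed-down stand-in for the node reference
// printed in the log line above; it is not the autoscaler's real type.
type nodeRef struct {
	Name       string
	MachineID  string
	ProviderID string
}

// providerPrefix is the scheme seen in the logged ProviderID (assumption:
// every node in the cluster uses this form).
const providerPrefix = "openstack:///"

// instanceID extracts the OpenStack instance UUID from a ProviderID.
// It reports false when the prefix is missing or the UUID part is empty.
func instanceID(providerID string) (string, bool) {
	if !strings.HasPrefix(providerID, providerPrefix) {
		return "", false
	}
	id := strings.TrimPrefix(providerID, providerPrefix)
	return id, id != ""
}

// resolveIndex finds the stack index whose instance ID matches the node's
// ProviderID. indexToID has the shape of the Heat output shown above:
// stack index -> instance ID.
func resolveIndex(indexToID map[string]string, n nodeRef) (string, bool) {
	id, ok := instanceID(n.ProviderID)
	if !ok {
		return "", false
	}
	for index, instance := range indexToID {
		if instance == id {
			return index, true
		}
	}
	return "", false
}

func main() {
	indexToID := map[string]string{
		"0": "57cd6509-685c-403d-b69c-f55da617fc5b",
		"1": "82e3d5e4-a881-4637-82d5-01a27d4c4c74",
		"2": "f3535181-94d8-4c9b-8f8d-3b7a27f89452",
	}
	node := nodeRef{
		Name:       "k8s-nbs-group-wv7cx2k4h24r-node-1",
		MachineID:  "311936e2303b034fe7ef70182235b8cb",
		ProviderID: "openstack:///82e3d5e4-a881-4637-82d5-01a27d4c4c74",
	}
	index, ok := resolveIndex(indexToID, node)
	fmt.Println(index, ok) // prints "1 true"
}
```

With the logged values, the ProviderID resolves to index 1, while the MachineID matches none of the three instance IDs.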

xuyungit commented 4 years ago

I have the same problem. Interestingly, we have the same MachineID, and all nodes in my cluster report the same machine ID: 311936e2303b034fe7ef70182235b8cb. So I suspect that the kubelet container run by podman doesn't get the correct machine ID. Maybe we should mount /etc/machine-id into the container.
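A quick way to spot this symptom is to group nodes by the machine ID they report and flag any ID shared by more than one node. The sketch below assumes the node-to-machine-ID pairs have already been collected (for example, by reading them off kubectl describe node output); the node names node-0 to node-2 and the function duplicateMachineIDs are hypothetical.

```go
package main

import (
	"fmt"
	"sort"
)

// duplicateMachineIDs groups node names by their reported machine ID and
// returns only the IDs shared by more than one node. Node names in each
// group are sorted so the result is deterministic.
func duplicateMachineIDs(machineIDByNode map[string]string) map[string][]string {
	byID := make(map[string][]string)
	for node, id := range machineIDByNode {
		byID[id] = append(byID[id], node)
	}
	dups := make(map[string][]string)
	for id, nodes := range byID {
		if len(nodes) > 1 {
			sort.Strings(nodes)
			dups[id] = nodes
		}
	}
	return dups
}

func main() {
	// Hypothetical node names; the machine ID is the one reported in this thread.
	reported := map[string]string{
		"node-0": "311936e2303b034fe7ef70182235b8cb",
		"node-1": "311936e2303b034fe7ef70182235b8cb",
		"node-2": "311936e2303b034fe7ef70182235b8cb",
	}
	for id, nodes := range duplicateMachineIDs(reported) {
		fmt.Printf("machine ID %s is shared by %v\n", id, nodes)
	}
}
```

An empty result means every node reports a distinct machine ID.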

openstacker commented 4 years ago

@SYezhkov @xuyungit Thank you very much for reporting this issue. I'm working on it now.

@SYezhkov Could you please let me know which Magnum version you're using, and whether you are using podman to bootstrap k8s? If you're using podman, then I think we probably just need another volume mount for /etc/machine-id here: https://github.com/openstack/magnum/blob/master/magnum/drivers/common/templates/kubernetes/fragments/configure-kubernetes-minion.sh#L100

I will propose a patch in Magnum soon.

SYezhkov commented 4 years ago

@openstacker Thank you for the reply! My Magnum version is 9.1.0, and yes, I use podman. As @xuyungit wrote, I also have the same MachineID for all Kubernetes nodes. I checked all nodes with the command kubectl describe node XXX.

SYezhkov commented 4 years ago

I locally patched Magnum to mount /etc/machine-id into the kubelet container, and I can confirm that this solves the problem.

openstacker commented 4 years ago

@SYezhkov Fantastic. Thanks for the confirmation. The patch has been submitted: https://review.opendev.org/707336