wbuchwalter / Kubernetes-acs-engine-autoscaler

[Deprecated] Node-level autoscaler for Kubernetes clusters created with acs-engine.

ERROR - Unexpected error: <class 'ValueError'>, Could not find the virtualMachines/extensions resource for the specified agent pool #92

Open nagarjunac opened 6 years ago

nagarjunac commented 6 years ago

Hi, I have deployed the autoscaler. I am facing the issue below when the autoscaler tries to scale up the nodes.

2018-05-08 13:05:49,055 - autoscaler.cluster - DEBUG - Using kube service account
2018-05-08 13:05:49,055 - autoscaler.cluster - INFO - ++++ Running Scaling Loop ++++++
2018-05-08 13:05:49,152 - autoscaler.cluster - INFO - Pods to schedule: 6
2018-05-08 13:05:49,152 - autoscaler.cluster - INFO - ++++ Scaling Up Begins ++++++
2018-05-08 13:05:49,152 - autoscaler.cluster - INFO - Nodes: 3
2018-05-08 13:05:49,152 - autoscaler.cluster - INFO - To schedule: 6
2018-05-08 13:05:49,153 - autoscaler.cluster - INFO - KubePod(default, nginx-ingress-public-controller-56d64cc5f4-8zc8s) fits on k8s-tier2-25950476-0
2018-05-08 13:05:49,154 - autoscaler.cluster - INFO - KubePod(default, nginx-ingress-public-controller-56d64cc5f4-b2jh6) fits on k8s-tier2-25950476-0
2018-05-08 13:05:49,154 - autoscaler.cluster - INFO - KubePod(default, nginx-ingress-public-controller-56d64cc5f4-c2qkm) fits on k8s-tier2-25950476-0
2018-05-08 13:05:49,154 - autoscaler.cluster - INFO - Pending pods: 3
2018-05-08 13:05:49,154 - autoscaler.cluster - DEBUG - nginx-ingress-public-controller-56d64cc5f4-dfftr
2018-05-08 13:05:49,155 - autoscaler.cluster - DEBUG - nginx-ingress-public-controller-56d64cc5f4-q25zp
2018-05-08 13:05:49,155 - autoscaler.cluster - DEBUG - nginx-ingress-public-controller-56d64cc5f4-tgp8v
2018-05-08 13:05:49,155 - autoscaler.scaler - INFO - ====Scaling for 3 pods ====
2018-05-08 13:05:49,155 - autoscaler.scaler - DEBUG - units_needed: 1
2018-05-08 13:05:49,155 - autoscaler.scaler - DEBUG - units_requested: 1
2018-05-08 13:05:49,156 - autoscaler.scaler - DEBUG - tier1 actual capacity: 2 , units requested: 1
2018-05-08 13:05:49,156 - autoscaler.scaler - INFO - New capacity requested for pool tier1: 3 agents (current capacity: 2 agents)
2018-05-08 13:05:49,156 - autoscaler.scaler - DEBUG - remaining pending: 0
2018-05-08 13:05:49,167 - autoscaler.cluster - ERROR - Unexpected error: <class 'ValueError'>, Could not find the virtualMachines/extensions resource for the specified agent pool
2018-05-08 13:05:49,168 - autoscaler - WARNING - backoff: 60

rbankole commented 6 years ago

FWIW, I saw this error when I mistakenly deployed into another cluster without changing the deployment name.

MirzaSikander commented 6 years ago

Seeing the same error. Confirmed that the deployment name matches.

grawcho commented 6 years ago

The naming convention for the extension resource in the template changed in the latest acs-engine version, from:

[concat(variables('{}VMNamePrefix'), copyIndex(variables('{}Offset')),'/cse', copyIndex(variables('{}Offset')))]

to:

[concat(variables('{}VMNamePrefix'), copyIndex(variables('{}Offset')),'/cse', '-agent-', copyIndex(variables('{}Offset')))]

Therefore, in template_processing.py you should change ext_resource_name (line 50) and node_ext_template (line 69) to support the new name, so that the proper extension runs on deployment. Good luck.
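As a rough illustration of the fix described above, here is a minimal Python sketch of a lookup that accepts either naming convention. It is not the autoscaler's actual code: the helper names (`candidate_ext_names`, `find_extension_resource`) and the assumption that the ARM template is a plain dict with a top-level `resources` list are hypothetical, and only the two name expressions are taken from the comment above.

```python
# Hypothetical sketch: not the code in template_processing.py.
# The two name expressions are copied from the comment above; '{}' stands
# for the agent pool name.
OLD_EXT_NAME = ("[concat(variables('{}VMNamePrefix'), copyIndex(variables('{}Offset')),"
                "'/cse', copyIndex(variables('{}Offset')))]")
NEW_EXT_NAME = ("[concat(variables('{}VMNamePrefix'), copyIndex(variables('{}Offset')),"
                "'/cse', '-agent-', copyIndex(variables('{}Offset')))]")

# Assumed ARM resource type for VM extensions.
EXT_TYPE = 'Microsoft.Compute/virtualMachines/extensions'


def _normalize(name):
    """Drop all whitespace so spacing differences do not break the match."""
    return ''.join(name.split())


def candidate_ext_names(pool_name):
    """Return the new-style and old-style extension names for a pool."""
    return [fmt.replace('{}', pool_name) for fmt in (NEW_EXT_NAME, OLD_EXT_NAME)]


def find_extension_resource(template, pool_name):
    """Find the pool's extension resource under either naming convention."""
    wanted = {_normalize(n) for n in candidate_ext_names(pool_name)}
    for resource in template.get('resources', []):
        if resource.get('type') != EXT_TYPE:
            continue
        if _normalize(resource.get('name', '')) in wanted:
            return resource
    raise ValueError('Could not find the virtualMachines/extensions resource '
                     'for the specified agent pool')
```

Trying the new name first and falling back to the old one would let the same code handle templates from both older and newer acs-engine versions, at the cost of carrying two hard-coded expressions.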

grawcho commented 6 years ago

Now that the scale-up deployments actually work, there is a problem I believe to be related to Azure CNI in k8s 1.10: the new nodes are not enrolled in the cluster. So the original virtualMachines/extensions error is fixed, but there is still an issue. It seems the CSEs run OK and the deployment completes, but kubectl get nodes returns no new nodes. Any ideas, anyone?

grawcho commented 6 years ago

OK, got it to work. There was a problem with the master extension deletion too. This should now be resolved on my fork; I will open a pull request.

Nikasa1889 commented 6 years ago

I've got the same issue and am waiting for the merge!