wbuchwalter / Kubernetes-acs-engine-autoscaler

[Deprecated] Node-level autoscaler for Kubernetes clusters created with acs-engine.

kubectl doesn't see newly added nodes? #90

Closed sylus closed 6 years ago

sylus commented 6 years ago

Hi there @wbuchwalter thanks for the great project!

Reproduction Steps

  1. Deploy Cluster (1 master, 1 node):

```shell
az group deployment create --name "acs-default" \
                           --resource-group "acs-default" \
                           --template-file "./_output/k8s-acs-default/azuredeploy.json" \
                           --parameters "./_output/k8s-acs-default/azuredeploy.parameters.json"
```
  2. Deploy Hadoop (enough to fit on 1 node):

```shell
helm install --name hadoop -f values.yaml . --namespace hadoop
```
  3. Deploy the ACS autoscaler (proper values already filled in):

```shell
helm install --name acs-engine-autoscaler -f values.yaml .
```
  4. RBAC quick fix:

```shell
kubectl create clusterrolebinding serviceaccounts-cluster-admin --clusterrole=cluster-admin --group=system:serviceaccounts
```
  5. ACS autoscaler logs before increasing the Hadoop specs:

```shell
kubectl logs $(kubectl --namespace=default get pods -l "app=acs-engine-autoscaler" -o jsonpath="{.items[0].metadata.name}")
```

```
2018-04-17 17:52:30,866 - autoscaler.cluster - INFO - ++++ Running Scaling Loop ++++++
2018-04-17 17:52:30,901 - autoscaler.cluster - INFO - Pods to schedule: 0
2018-04-17 17:52:30,901 - autoscaler.cluster - INFO - ++++ Scaling Up Begins ++++++
2018-04-17 17:52:30,901 - autoscaler.cluster - INFO - Nodes: 1
2018-04-17 17:52:30,901 - autoscaler.cluster - INFO - To schedule: 0
2018-04-17 17:52:30,901 - autoscaler.cluster - INFO - Pending pods: 0
2018-04-17 17:52:30,901 - autoscaler.cluster - INFO - ++++ Scaling Up Ends ++++++
2018-04-17 17:52:30,901 - autoscaler.cluster - INFO - ++++ Maintenance Begins ++++++
2018-04-17 17:52:30,901 - autoscaler.engine_scaler - INFO - ++++ Maintaining Nodes ++++++
2018-04-17 17:52:30,902 - autoscaler.engine_scaler - INFO - node: k8s-linuxpool1-42410247-0   state: busy
2018-04-17 17:52:30,902 - autoscaler.cluster - INFO - ++++ Maintenance Ends ++++++
```
  6. Update the number of Hadoop YARN nodes:

```shell
helm upgrade hadoop --set yarn.nodeManager.replicas=4 .
```
  7. Check the ACS autoscaler logs again:

```shell
kubectl logs $(kubectl --namespace=default get pods -l "app=acs-engine-autoscaler" -o jsonpath="{.items[0].metadata.name}")
```

```
2018-04-17 17:57:27,990 - autoscaler.cluster - DEBUG - Using kube service account
2018-04-17 17:57:27,991 - autoscaler.cluster - INFO - ++++ Running Scaling Loop ++++++
2018-04-17 17:57:28,065 - autoscaler.cluster - INFO - Pods to schedule: 1
2018-04-17 17:57:28,065 - autoscaler.cluster - INFO - ++++ Scaling Up Begins ++++++
2018-04-17 17:57:28,065 - autoscaler.cluster - INFO - Nodes: 1
2018-04-17 17:57:28,065 - autoscaler.cluster - INFO - To schedule: 1
2018-04-17 17:57:28,065 - autoscaler.cluster - INFO - Pending pods: 1
2018-04-17 17:57:28,066 - autoscaler.cluster - DEBUG - hadoop-hadoop-yarn-nm-3
2018-04-17 17:57:28,066 - autoscaler.scaler - INFO - ====Scaling for 1 pods ====
2018-04-17 17:57:28,066 - autoscaler.scaler - DEBUG - units_needed: 2
2018-04-17 17:57:28,066 - autoscaler.scaler - DEBUG - units_requested: 2
2018-04-17 17:57:28,067 - autoscaler.scaler - DEBUG - linuxpool1 actual capacity: 1 , units requested: 2
2018-04-17 17:57:28,067 - autoscaler.scaler - INFO - New capacity requested for pool linuxpool1: 3 agents (current capacity: 1 agents)
2018-04-17 17:57:28,068 - autoscaler.scaler - DEBUG - remaining pending: 0
2018-04-17 17:57:28,080 - autoscaler.engine_scaler - INFO - Deployment autoscaler-deployment-8d63047d started...
2018-04-17 18:02:02,703 - autoscaler.deployments - INFO - Deployment finished: {'id': '/subscriptions/<subscription>/resourceGroups/acs-default/providers/Microsoft.Resources/deployments/autoscaler-deployment-8d63047d', 'name': 'autoscaler-deployment-8d63047d', 'properties': <azure.mgmt.resource.resources.v2017_05_10.models.deployment_properties_extended.DeploymentPropertiesExtended object at 0x7fa5a3473160>}
2018-04-17 18:02:02,703 - autoscaler.cluster - INFO - ++++ Scaling Up Ends ++++++
2018-04-17 18:02:02,703 - autoscaler.cluster - INFO - ++++ Maintenance Begins ++++++
2018-04-17 18:02:02,703 - autoscaler.engine_scaler - INFO - ++++ Maintaining Nodes ++++++
2018-04-17 18:02:02,704 - autoscaler.engine_scaler - INFO - node: k8s-linuxpool1-42410247-0   state: busy
2018-04-17 18:02:02,704 - autoscaler.cluster - INFO - ++++ Maintenance Ends ++++++
```
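The log reports the ARM deployment as finished. As a sanity check (a sketch assuming the Azure CLI of that era, with the resource group and deployment name taken from the log above), the deployment's provisioning state can be queried directly:

```shell
# Hedged sketch: confirm the ARM deployment the autoscaler started actually
# succeeded. Guarded so the snippet is a no-op where `az` is unavailable.
if command -v az >/dev/null 2>&1; then
  az group deployment show \
    --resource-group "acs-default" \
    --name "autoscaler-deployment-8d63047d" \
    --query "properties.provisioningState" -o tsv
fi
```

A `Succeeded` state would confirm that the VM was created and the problem lies between VM provisioning and node registration.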

The question is: why, after the scale-up deployment finishes, does `kubectl get nodes` still only show:

```
❯ kubectl get nodes
NAME                        STATUS    ROLES     AGE       VERSION
k8s-linuxpool1-42410247-0   Ready     agent     18m       v1.9.6
k8s-master-42410247-0       Ready     master    18m       v1.9.6
```
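Since the deployment finished but no new node appeared, a useful next check (a sketch, assuming the Azure CLI and the `acs-default` resource group from the repro steps) is whether the new VM exists in Azure without ever having registered as a node:

```shell
# Hedged sketch: list VM names in the resource group and compare against the
# nodes the API server knows about. Guarded so the snippet is a no-op where
# az/kubectl are unavailable.
if command -v az >/dev/null 2>&1 && command -v kubectl >/dev/null 2>&1; then
  az vm list --resource-group "acs-default" --query "[].name" -o tsv | sort > /tmp/azure_vms.txt
  kubectl get nodes -o jsonpath='{range .items[*]}{.metadata.name}{"\n"}{end}' | sort > /tmp/k8s_nodes.txt
  # VMs that exist in Azure but are not registered as Kubernetes nodes:
  comm -23 /tmp/azure_vms.txt /tmp/k8s_nodes.txt
fi
```

Any name printed (e.g. a `k8s-linuxpool1-*` VM absent from `kubectl get nodes`) points at a kubelet that failed to join, which is where the next comment picks up.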
sylus commented 6 years ago

I logged onto one of the newly created worker nodes and ran `journalctl -u kubelet`:

```
Apr 17 23:49:58 k8s-linuxpool1-42410247-1 docker[11042]: Error: failed to run Kubelet: could not init cloud provider "azure": No credentials provided for AAD application
Apr 17 23:49:58 k8s-linuxpool1-42410247-1 systemd[1]: kubelet.service: Main process exited, code=exited, status=1/FAILURE
```
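That error points at the kubelet's cloud-provider config. On acs-engine clusters the kubelet reads its service principal credentials from `/etc/kubernetes/azure.json` (path assumed from acs-engine defaults); a quick, hedged check on the affected node:

```shell
# Hedged sketch: inspect the cloud-provider config for missing AAD
# credentials without printing the secret itself. Empty aadClientId /
# aadClientSecret values produce exactly the error seen above.
AZURE_JSON="/etc/kubernetes/azure.json"   # acs-engine default path (assumption)
if [ -r "$AZURE_JSON" ]; then
  grep -o '"aadClientId": *"[^"]*"' "$AZURE_JSON"
  grep -q '"aadClientSecret": *"..*"' "$AZURE_JSON" \
    && echo "aadClientSecret: present" \
    || echo "aadClientSecret: MISSING"
fi
```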
sylus commented 6 years ago

Ah, never mind: using the correct cert was the issue. Works great, thanks so much!