wbuchwalter / Kubernetes-acs-engine-autoscaler

[Deprecated] Node-level autoscaler for Kubernetes clusters created with acs-engine.

Unexpected error: class 'requests.exceptions.HTTPError' #75

Closed TimBobkov closed 6 years ago

TimBobkov commented 6 years ago

I'm deploying the autoscaler with the following config:

apiVersion: apps/v1beta1
kind: Deployment
metadata:
  name: autoscaler
  namespace: kube-system
spec:
  replicas: 1
  template:
    metadata:
      labels:
        app: autoscaler
        openai/do-not-drain: "true"
    spec:
      containers:
        - name: autoscaler
          image: wbuchwalter/kubernetes-acs-engine-autoscaler:latest
          env:
            - name: AZURE_SP_APP_ID
              valueFrom:
                secretKeyRef:
                  name: autoscaler
                  key: azure-sp-app-id
            - name: AZURE_SP_SECRET
              valueFrom:
                secretKeyRef:
                  name: autoscaler
                  key: azure-sp-secret
            - name: AZURE_SP_TENANT_ID
              valueFrom:
                secretKeyRef:
                  name: autoscaler
                  key: azure-sp-tenant-id
            - name: KUBECONFIG_PRIVATE_KEY
              valueFrom:
                secretKeyRef:
                  name: autoscaler
                  key: kubeconfig-private-key
            - name: CLIENT_PRIVATE_KEY
              valueFrom:
                secretKeyRef:
                  name: autoscaler
                  key: client-private-key
            - name: CA_PRIVATE_KEY
              valueFrom:
                secretKeyRef:
                  name: autoscaler
                  key: ca-private-key
          command: ["python","main.py","--resource-group","blockchain-acs","-vvv","--spare-agents","3","--acs-deployment","<deployment-name>"]
          imagePullPolicy: Always
      restartPolicy: Always
      dnsPolicy: Default

and I get the following output in the logs:

2018-01-22 10:01:29,576 - autoscaler.cluster - DEBUG - Using kube service account
2018-01-22 10:01:29,577 - autoscaler.cluster - INFO - ++++ Running Scaling Loop ++++++
2018-01-22 10:01:29,635 - autoscaler.cluster - WARNING - Unexpected error: <class 'requests.exceptions.HTTPError'>
2018-01-22 10:01:29,635 - autoscaler - WARNING - backoff: 60

Is this normal behavior?

wbuchwalter commented 6 years ago

Hi, "--acs-deployment","<deployment-name>": did you forget to replace this with your actual deployment name? Or did you just not want to display the deployment name on GitHub?

If the error happens consistently, you can start the autoscaler with the --debug flag. It will explicitly crash on the error and give you more info about what is going wrong.

TimBobkov commented 6 years ago

Hi, yes, I did change --acs-deployment in the deployment config. I ran the autoscaler with --debug and received this error:

requests.exceptions.HTTPError: 403 Client Error: Forbidden for url: https://10.240.0.4:443/api/v1/nodes

It seems like it's a different error...
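
The WARNING line in the first log prints only the exception class, which is why the 403 stayed hidden until --debug was used. A later log in this thread shows the message appended after the class. The following is a minimal sketch of that kind of formatting, not the autoscaler's actual code: `describe_error` is a hypothetical helper, and the local `HTTPError` class is a stand-in so the example runs without the requests package.

```python
class HTTPError(Exception):
    """Stand-in for requests.exceptions.HTTPError (assumption, for illustration only)."""


def describe_error(exc):
    # type(exc) alone matches what the WARNING line printed;
    # str(exc) carries the status code and URL.
    return "Unexpected error: {}, {}".format(type(exc), exc)


try:
    raise HTTPError(
        "403 Client Error: Forbidden for url: https://10.240.0.4:443/api/v1/nodes"
    )
except HTTPError as exc:
    message = describe_error(exc)
    print(message)
```

With the message included, a 403 on /api/v1/nodes is visible from the regular log without restarting the pod in debug mode.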

davesykeselateral commented 6 years ago

Hi wbuchwalter, TimBobkov, I'm having the same issue with Unexpected error: <class 'requests.exceptions.HTTPError'>. I installed the autoscaler using the helm chart.

Running Kubernetes on acs-engine

$ acs-engine version
Version: v0.11.0
GitCommit: cda96312
GitTreeState: clean
$ kubectl version
Client Version: version.Info{Major:"1", Minor:"8", GitVersion:"v1.8.2", GitCommit:"bdaeafa71f6c7c04636251031f93464384d54963", GitTreeState:"clean", BuildDate:"2017-10-24T19:48:57Z", GoVersion:"go1.8.3", Compiler:"gc", Platform:"darwin/amd64"}
Server Version: version.Info{Major:"1", Minor:"8", GitVersion:"v1.8.4", GitCommit:"9befc2b8928a9426501d3bf62f72849d5cbcd5a3", GitTreeState:"clean", BuildDate:"2017-11-20T05:17:43Z", GoVersion:"go1.8.3", Compiler:"gc", Platform:"linux/amd64"}
$ helm version
Client: &version.Version{SemVer:"v2.6.2", GitCommit:"be3ae4ea91b2960be98c07e8f73754e67e87963c", GitTreeState:"clean"}
Server: &version.Version{SemVer:"v2.6.2", GitCommit:"be3ae4ea91b2960be98c07e8f73754e67e87963c", GitTreeState:"clean"}

My acs-engine is using the calico networking engine (although with default settings currently, no additional network policies)

Single Master, 3 agents

I have a custom vnet, with separate subnets for master & agents

I have put the autoscaler into an autoscaler namespace

My values.yaml file has the following settings (have replaced the actual values with XXX)

acsenginecluster:
  resourcegroup: XXX
  azurespappid: XXX
  azurespsecret: XXX
  azuresptenantid: XXX
  kubeconfigprivatekey: XXX
  clientprivatekey: XXX
  caprivatekey: XXX
  acsdeployment: XXX

Using helm, I'm not sure of the right way to add the --debug flag

Errors in the autoscaler pod log

2018-01-25 09:25:02,630 - autoscaler.cluster - DEBUG - Using kube service account
2018-01-25 09:25:02,631 - autoscaler.cluster - INFO - ++++ Running Scaling Loop ++++++
2018-01-25 09:25:02,679 - autoscaler.cluster - WARNING - Unexpected error: <class 'requests.exceptions.HTTPError'>
2018-01-25 09:25:02,679 - autoscaler - WARNING - backoff: 60
2018-01-25 09:27:02,757 - autoscaler.cluster - INFO - ++++ Running Scaling Loop ++++++
2018-01-25 09:27:02,772 - autoscaler.cluster - WARNING - Unexpected error: <class 'requests.exceptions.HTTPError'>
2018-01-25 09:27:02,772 - autoscaler - WARNING - backoff: 120
2018-01-25 09:31:02,810 - autoscaler.cluster - INFO - ++++ Running Scaling Loop ++++++
2018-01-25 09:31:02,825 - autoscaler.cluster - WARNING - Unexpected error: <class 'requests.exceptions.HTTPError'>
2018-01-25 09:31:02,825 - autoscaler - WARNING - backoff: 240
2018-01-25 09:39:02,877 - autoscaler.cluster - INFO - ++++ Running Scaling Loop ++++++
2018-01-25 09:39:02,900 - autoscaler.cluster - WARNING - Unexpected error: <class 'requests.exceptions.HTTPError'>
2018-01-25 09:39:02,900 - autoscaler - WARNING - backoff: 480
2018-01-25 09:55:02,998 - autoscaler.cluster - INFO - ++++ Running Scaling Loop ++++++
2018-01-25 09:55:03,013 - autoscaler.cluster - WARNING - Unexpected error: <class 'requests.exceptions.HTTPError'>
2018-01-25 09:55:03,013 - autoscaler - WARNING - backoff: 960
2018-01-25 10:27:03,112 - autoscaler.cluster - INFO - ++++ Running Scaling Loop ++++++
2018-01-25 10:27:03,128 - autoscaler.cluster - WARNING - Unexpected error: <class 'requests.exceptions.HTTPError'>
2018-01-25 10:27:03,128 - autoscaler - WARNING - backoff: 1920
2018-01-25 11:31:03,229 - autoscaler.cluster - INFO - ++++ Running Scaling Loop ++++++
2018-01-25 11:31:03,244 - autoscaler.cluster - WARNING - Unexpected error: <class 'requests.exceptions.HTTPError'>
2018-01-25 11:31:03,244 - autoscaler - WARNING - backoff: 3840
2018-01-25 13:39:03,292 - autoscaler.cluster - INFO - ++++ Running Scaling Loop ++++++
2018-01-25 13:39:03,308 - autoscaler.cluster - WARNING - Unexpected error: <class 'requests.exceptions.HTTPError'>
2018-01-25 13:39:03,308 - autoscaler - WARNING - backoff: 7680
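
The backoff values in the log above double after every failed loop (60, 120, 240, ... 7680). A minimal sketch of that pattern follows, assuming a plain doubling rule; `backoff_delays` and its parameters are hypothetical names, and this is not taken from the autoscaler's source.

```python
def backoff_delays(initial=60, factor=2, attempts=8):
    """Yield the backoff value logged after each consecutive failure."""
    delay = initial
    for _ in range(attempts):
        yield delay
        delay *= factor  # each failure multiplies the previous value


delays = list(backoff_delays())
print(delays)
```

Note that the gaps between consecutive "Running Scaling Loop" timestamps are about twice the logged figure (120 seconds after "backoff: 60"), so the logged number may not be the exact sleep time. Either way, a persistent 403 means the loop runs less and less often until the pod is restarted.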

@TimBobkov - not sure if any of that matches with your environment?

Dave

TimBobkov commented 6 years ago

Don't think so... We use Kubernetes 1.9.2, deployed with acs-engine from the master branch (built by myself). I deployed the autoscaler with a Deployment. So the cases are completely different but the result is the same :)

wbuchwalter commented 6 years ago

@TimBobkov Can you share the .json file you used to generate the ARM templates with acs-engine? I would like to try to reproduce this.

TimBobkov commented 6 years ago

@wbuchwalter hope this will help you :)

{
  "apiVersion": "vlabs",
  "properties": {
    "orchestratorProfile": {
      "orchestratorType": "Kubernetes",
      "orchestratorVersion": "1.9.2"
    },
    "masterProfile": {
      "count": 3,
      "dnsPrefix": "blockchain-prod",
      "vmSize": "Standard_D2_v2",
      "storageProfile" : "ManagedDisks"
    },
    "agentPoolProfiles": [
      {
        "name": "agentpool1",
        "count": 3,
        "vmSize": "Standard_D2_v2",
        "osDiskSizeGB": 100,
        "availabilityProfile": "AvailabilitySet",
        "storageProfile" : "ManagedDisks"
      }
    ],
    "linuxProfile": {
      "adminUsername": "azureuser",
      "ssh": {
        "publicKeys": [
          {
            "keyData": "<secret>"
          }
        ]
      }
    },
    "servicePrincipalProfile": {
      "clientId": "<secret>",
      "secret": "<secret>"
    }
  }
}

TimBobkov commented 6 years ago

I have tried to use the autoscaler on two different clusters. Both of them run Kubernetes 1.9.x, but one was created with acs-engine 0.10.0 and the other with acs-engine from the master branch. Both show the same error: requests.exceptions.HTTPError: 403 Client Error: Forbidden for url

wbuchwalter commented 6 years ago

@TimBobkov acs-engine 0.10.0 doesn't support k8s versions above 1.8.4, not sure how you managed to bypass the validation.
I am not able to deploy a sane k8s cluster using the latest version of acs-engine either because of https://github.com/Azure/acs-engine/issues/2162 (I will try again when https://github.com/Azure/acs-engine/pull/2160 is merged).

TimBobkov commented 6 years ago

Very strange... But in fact, with acs-engine from the master branch I deployed Kubernetes 1.9.2 without any error messages...

wbuchwalter commented 6 years ago

So this error was caused by RBAC, which is enabled by default in acs-engine >= 0.12.0. The autoscaler wasn't authorized to query the k8s API since it wasn't authenticated. I have tested and pushed a fix on master so you can try it out as well. The README was updated with instructions, but here is a summary of what you need to do: clone this repo and fill in ./helm-chart/values.yaml. You'll now need to provide a subscriptionId as well. For clusters generated with acs-engine >= 0.12.0 you will also need to provide etcdclientprivatekey and etcdserverprivatekey and set rbac.install to true.

The chart will create a new service account for the autoscaler as well as all the necessary RBAC roles and bindings.
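
For readers who want to see roughly what such RBAC objects look like, here is a minimal sketch that builds a service account, a cluster role allowing reads on nodes, and a binding between the two, then prints them as a JSON list. Everything here is an assumption for illustration: the `autoscaler` name, the `kube-system` namespace, and the single rule on nodes come from the failing /api/v1/nodes call in this thread, not from the chart. The chart's real roles almost certainly grant more than this.

```python
import json


def autoscaler_rbac(namespace="kube-system", name="autoscaler"):
    """Build illustrative RBAC manifests (hypothetical names, minimal rules)."""
    service_account = {
        "apiVersion": "v1",
        "kind": "ServiceAccount",
        "metadata": {"name": name, "namespace": namespace},
    }
    cluster_role = {
        "apiVersion": "rbac.authorization.k8s.io/v1",
        "kind": "ClusterRole",
        "metadata": {"name": name},
        # Only the permission behind the 403 on /api/v1/nodes is shown here.
        "rules": [
            {"apiGroups": [""], "resources": ["nodes"], "verbs": ["get", "list"]},
        ],
    }
    binding = {
        "apiVersion": "rbac.authorization.k8s.io/v1",
        "kind": "ClusterRoleBinding",
        "metadata": {"name": name},
        "roleRef": {
            "apiGroup": "rbac.authorization.k8s.io",
            "kind": "ClusterRole",
            "name": name,
        },
        "subjects": [
            {"kind": "ServiceAccount", "name": name, "namespace": namespace},
        ],
    }
    return [service_account, cluster_role, binding]


manifests = autoscaler_rbac()
print(json.dumps({"apiVersion": "v1", "kind": "List", "items": manifests}, indent=2))
```

The pod also has to run under that service account for the binding to take effect, which is why the chart creates the account together with the roles.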

Let me know if this solves the issue on your side as well.

davesykeselateral commented 6 years ago

@wbuchwalter - thank you for working this out, I will test the new master and update here with the results.

wbuchwalter commented 6 years ago

@davesykeselateral I'm not sure your issue is the same one since you created your cluster with acs-engine 0.11.0. So unless you manually enabled RBAC in your cluster it should be a different cause.

You can still try this out and let me know. If you still have an error, try redeploying the autoscaler with the --debug flag, and then open another issue with the logs.

davesykeselateral commented 6 years ago

@wbuchwalter sorry for the delay in replying. Haven’t managed to get time to test yet, but yes, I did have RBAC enabled, so hopefully it is the same issue. Will update again when I’ve tested.

davesykeselateral commented 6 years ago

@wbuchwalter - have managed to test now, and this does resolve my issue now, thank you.

diwakar-s-maurya commented 6 years ago

@wbuchwalter How can I provide etcdclientprivatekey and etcdServerPrivateKey? The full list of options in the README does not specify the flags through which they can be provided.

I created the cluster using acs-engine v0.13.0 with RBAC enabled. I have provided --acs-deployment and $SUBSCRIPTION_ID. The error I am getting is

2018-03-02 10:59:56,685 - autoscaler.cluster - DEBUG - Using kube service account
2018-03-02 10:59:56,686 - autoscaler.cluster - INFO - ++++ Running Scaling Loop ++++++
2018-03-02 10:59:56,686 - autoscaler.cluster - INFO - Debug mode is on
Traceback (most recent call last):
  File "/usr/local/lib/python3.6/site-packages/urllib3/connection.py", line 141, in _new_conn
    (self.host, self.port), self.timeout, **extra_kw)
  File "/usr/local/lib/python3.6/site-packages/urllib3/util/connection.py", line 83, in create_connection
    raise err
  File "/usr/local/lib/python3.6/site-packages/urllib3/util/connection.py", line 73, in create_connection
    sock.connect(sa)
ConnectionRefusedError: [Errno 111] Connection refused

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.6/site-packages/urllib3/connectionpool.py", line 601, in urlopen
    chunked=chunked)
  File "/usr/local/lib/python3.6/site-packages/urllib3/connectionpool.py", line 346, in _make_request
    self._validate_conn(conn)
  File "/usr/local/lib/python3.6/site-packages/urllib3/connectionpool.py", line 850, in _validate_conn
    conn.connect()
  File "/usr/local/lib/python3.6/site-packages/urllib3/connection.py", line 284, in connect
    conn = self._new_conn()
  File "/usr/local/lib/python3.6/site-packages/urllib3/connection.py", line 150, in _new_conn
    self, "Failed to establish a new connection: %s" % e)
urllib3.exceptions.NewConnectionError: <urllib3.connection.VerifiedHTTPSConnection object at 0x7fbec9522e10>: Failed to establish a new connection: [Errno 111] Connection refused

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.6/site-packages/requests/adapters.py", line 440, in send
    timeout=timeout
  File "/usr/local/lib/python3.6/site-packages/urllib3/connectionpool.py", line 639, in urlopen
    _stacktrace=sys.exc_info()[2])
  File "/usr/local/lib/python3.6/site-packages/urllib3/util/retry.py", line 388, in increment
    raise MaxRetryError(_pool, url, error or ResponseError(cause))
urllib3.exceptions.MaxRetryError: HTTPSConnectionPool(host='10.240.0.4', port=443): Max retries exceeded with url: /api/v1/nodes (Caused by NewConnectionError('<urllib3.connection.VerifiedHTTPSConnection object at 0x7fbec9522e10>: Failed to establish a new connection: [Errno 111] Connection refused',))

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "main.py", line 111, in <module>
    main()
  File "/usr/local/lib/python3.6/site-packages/click/core.py", line 722, in __call__
    return self.main(*args, **kwargs)
  File "/usr/local/lib/python3.6/site-packages/click/core.py", line 697, in main
    rv = self.invoke(ctx)
  File "/usr/local/lib/python3.6/site-packages/click/core.py", line 895, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/usr/local/lib/python3.6/site-packages/click/core.py", line 535, in invoke
    return callback(*args, **kwargs)
  File "main.py", line 100, in main
    scaled = cluster.loop(debug)
  File "/app/autoscaler/cluster.py", line 122, in loop
    return self.loop_logic()
  File "/app/autoscaler/cluster.py", line 137, in loop_logic
    if not pykube_nodes:
  File "/usr/local/lib/python3.6/site-packages/pykube/query.py", line 122, in __len__
    return len(self.query_cache["objects"])
  File "/usr/local/lib/python3.6/site-packages/pykube/query.py", line 115, in query_cache
    cache["response"] = self.execute().json()
  File "/usr/local/lib/python3.6/site-packages/pykube/query.py", line 99, in execute
    r = self.api.get(**kwargs)
  File "/usr/local/lib/python3.6/site-packages/pykube/http.py", line 127, in get
    return self.session.get(*args, **self.get_kwargs(**kwargs))
  File "/usr/local/lib/python3.6/site-packages/requests/sessions.py", line 521, in get
    return self.request('GET', url, **kwargs)
  File "/usr/local/lib/python3.6/site-packages/requests/sessions.py", line 508, in request
    resp = self.send(prep, **send_kwargs)
  File "/usr/local/lib/python3.6/site-packages/requests/sessions.py", line 618, in send
    r = adapter.send(request, **kwargs)
  File "/usr/local/lib/python3.6/site-packages/requests/adapters.py", line 508, in send
    raise ConnectionError(e, request=request)
requests.exceptions.ConnectionError: HTTPSConnectionPool(host='10.240.0.4', port=443): Max retries exceeded with url: /api/v1/nodes (Caused by NewConnectionError('<urllib3.connection.VerifiedHTTPSConnection object at 0x7fbec9522e10>: Failed to establish a new connection: [Errno 111] Connection refused',))

wbuchwalter commented 6 years ago

@diwakar-s-maurya these two keys are not needed, so you shouldn't specify them. Have you set rbac.install to true in the helm chart before deploying?

VeereshPatil commented 6 years ago

Hi @wbuchwalter, I'm facing the same issue even after setting rbac.install to true:

2018-06-08 05:33:02,314 - autoscaler.cluster - ERROR - Unexpected error: <class 'requests.exceptions.HTTPError'>, 403 Client Error: Forbidden for url: https://10.240.255.5:443/api/v1/nodes
2018-06-08 05:33:02,314 - autoscaler - WARNING - backoff: 60

I'm using k8s version 1.9.6 and acs-engine v0.14.0.

sashabaranov commented 6 years ago

@VeereshPatil setting rbac.install to true helped me.

VeereshPatil commented 6 years ago

Thank you @sashabaranov, it worked for me.