Closed TimBobkov closed 6 years ago
Hi,
"--acs-deployment","<deployment-name>"
Did you forget to replace this with your actual deployment name? Or did you just want to avoid displaying the deployment name on GitHub?
If the error happens consistently, you can start the autoscaler with the --debug flag; it will explicitly crash on the error and give you more info about what is going wrong.
Hi, yes, I have changed --acs-deployment in the deployment config. I ran the autoscaler with --debug and received this error: requests.exceptions.HTTPError: 403 Client Error: Forbidden for url: https://10.240.0.4:443/api/v1/nodes It seems like it's a different error...
Hi wbuchwalter, TimBobkov, I'm having the same issue with Unexpected error: <class 'requests.exceptions.HTTPError'>. I have installed the autoscaler using the helm chart.
Running Kubernetes on acs-engine
$ acs-engine version
Version: v0.11.0
GitCommit: cda96312
GitTreeState: clean
$ kubectl version
Client Version: version.Info{Major:"1", Minor:"8", GitVersion:"v1.8.2", GitCommit:"bdaeafa71f6c7c04636251031f93464384d54963", GitTreeState:"clean", BuildDate:"2017-10-24T19:48:57Z", GoVersion:"go1.8.3", Compiler:"gc", Platform:"darwin/amd64"}
Server Version: version.Info{Major:"1", Minor:"8", GitVersion:"v1.8.4", GitCommit:"9befc2b8928a9426501d3bf62f72849d5cbcd5a3", GitTreeState:"clean", BuildDate:"2017-11-20T05:17:43Z", GoVersion:"go1.8.3", Compiler:"gc", Platform:"linux/amd64"}
$ helm version
Client: &version.Version{SemVer:"v2.6.2", GitCommit:"be3ae4ea91b2960be98c07e8f73754e67e87963c", GitTreeState:"clean"}
Server: &version.Version{SemVer:"v2.6.2", GitCommit:"be3ae4ea91b2960be98c07e8f73754e67e87963c", GitTreeState:"clean"}
My acs-engine is using the calico networking engine (although with default settings currently, no additional network policies)
Single Master, 3 agents
I have a custom vnet, with separate subnets for master & agents
I have put the autoscaler into an autoscaler namespace
My values.yaml file has the following settings (have replaced the actual values with XXX)
acsenginecluster:
  resourcegroup: XXX
  azurespappid: XXX
  azurespsecret: XXX
  azuresptenantid: XXX
  kubeconfigprivatekey: XXX
  clientprivatekey: XXX
  caprivatekey: XXX
  acsdeployment: XXX
Using helm, I'm not sure of the right way to add the --debug flag
Errors in the autoscaler pod log
2018-01-25 09:25:02,630 - autoscaler.cluster - DEBUG - Using kube service account
2018-01-25 09:25:02,631 - autoscaler.cluster - INFO - ++++ Running Scaling Loop ++++++
2018-01-25 09:25:02,679 - autoscaler.cluster - WARNING - Unexpected error: <class 'requests.exceptions.HTTPError'>
2018-01-25 09:25:02,679 - autoscaler - WARNING - backoff: 60
2018-01-25 09:27:02,757 - autoscaler.cluster - INFO - ++++ Running Scaling Loop ++++++
2018-01-25 09:27:02,772 - autoscaler.cluster - WARNING - Unexpected error: <class 'requests.exceptions.HTTPError'>
2018-01-25 09:27:02,772 - autoscaler - WARNING - backoff: 120
2018-01-25 09:31:02,810 - autoscaler.cluster - INFO - ++++ Running Scaling Loop ++++++
2018-01-25 09:31:02,825 - autoscaler.cluster - WARNING - Unexpected error: <class 'requests.exceptions.HTTPError'>
2018-01-25 09:31:02,825 - autoscaler - WARNING - backoff: 240
2018-01-25 09:39:02,877 - autoscaler.cluster - INFO - ++++ Running Scaling Loop ++++++
2018-01-25 09:39:02,900 - autoscaler.cluster - WARNING - Unexpected error: <class 'requests.exceptions.HTTPError'>
2018-01-25 09:39:02,900 - autoscaler - WARNING - backoff: 480
2018-01-25 09:55:02,998 - autoscaler.cluster - INFO - ++++ Running Scaling Loop ++++++
2018-01-25 09:55:03,013 - autoscaler.cluster - WARNING - Unexpected error: <class 'requests.exceptions.HTTPError'>
2018-01-25 09:55:03,013 - autoscaler - WARNING - backoff: 960
2018-01-25 10:27:03,112 - autoscaler.cluster - INFO - ++++ Running Scaling Loop ++++++
2018-01-25 10:27:03,128 - autoscaler.cluster - WARNING - Unexpected error: <class 'requests.exceptions.HTTPError'>
2018-01-25 10:27:03,128 - autoscaler - WARNING - backoff: 1920
2018-01-25 11:31:03,229 - autoscaler.cluster - INFO - ++++ Running Scaling Loop ++++++
2018-01-25 11:31:03,244 - autoscaler.cluster - WARNING - Unexpected error: <class 'requests.exceptions.HTTPError'>
2018-01-25 11:31:03,244 - autoscaler - WARNING - backoff: 3840
2018-01-25 13:39:03,292 - autoscaler.cluster - INFO - ++++ Running Scaling Loop ++++++
2018-01-25 13:39:03,308 - autoscaler.cluster - WARNING - Unexpected error: <class 'requests.exceptions.HTTPError'>
2018-01-25 13:39:03,308 - autoscaler - WARNING - backoff: 7680
@TimBobkov - not sure if any of that matches with your environment?
Dave
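The backoff values in the log above double after each failed loop (60, 120, 240, ... 7680 seconds). A minimal Python sketch of that schedule, assuming a plain exponential policy; the function name and its parameters are illustrative and are not taken from the autoscaler's source:

```python
def backoff_schedule(initial=60, factor=2, attempts=8):
    """Return the backoff values (in seconds) for consecutive failed loops.

    Hypothetical helper: it only reproduces the doubling pattern seen in
    the log, not the autoscaler's actual implementation.
    """
    return [initial * factor ** i for i in range(attempts)]


print(backoff_schedule())
# → [60, 120, 240, 480, 960, 1920, 3840, 7680]
```

Because the delay keeps doubling, a persistent error such as a 403 makes the loop run less and less often, which is why the last entries in the log are hours apart.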
Don't think so... We use Kubernetes 1.9.2, deployed with acs-engine from the master branch (built by myself). I deployed the autoscaler as a Deployment. So the cases are completely different, but the result is the same :)
@TimBobkov Can you share the .json file you used to generate the ARM templates with acs-engine?
I would like to try to reproduce this.
@wbuchwalter hope this will help you :)
{
  "apiVersion": "vlabs",
  "properties": {
    "orchestratorProfile": {
      "orchestratorType": "Kubernetes",
      "orchestratorVersion": "1.9.2"
    },
    "masterProfile": {
      "count": 3,
      "dnsPrefix": "blockchain-prod",
      "vmSize": "Standard_D2_v2",
      "storageProfile": "ManagedDisks"
    },
    "agentPoolProfiles": [
      {
        "name": "agentpool1",
        "count": 3,
        "vmSize": "Standard_D2_v2",
        "osDiskSizeGB": 100,
        "availabilityProfile": "AvailabilitySet",
        "storageProfile": "ManagedDisks"
      }
    ],
    "linuxProfile": {
      "adminUsername": "azureuser",
      "ssh": {
        "publicKeys": [
          {
            "keyData": "<secret>"
          }
        ]
      }
    },
    "servicePrincipalProfile": {
      "clientId": "<secret>",
      "secret": "<secret>"
    }
  }
}
I have tried to use the autoscaler on two different clusters. Both run Kubernetes 1.9.x, but one was created with acs-engine 0.10.0 and the other with acs-engine from the master branch. Both have the same error: requests.exceptions.HTTPError: 403 Client Error: Forbidden for url
@TimBobkov acs-engine 0.10.0 doesn't support k8s versions above 1.8.4, not sure how you managed to bypass the validation.
I am not able to deploy a sane k8s cluster using the latest version of acs-engine either because of https://github.com/Azure/acs-engine/issues/2162 (I will try again when https://github.com/Azure/acs-engine/pull/2160 is merged).
Very strange... But in fact, with acs-engine from the master branch I deployed Kubernetes 1.9.2 without any error messages...
So this error was caused by RBAC, which is enabled by default in acs-engine >= 0.12.0. The autoscaler wasn't authorized to query k8s api since it wasn't authenticated.
I have tested and pushed a fix on master so you can try it out as well.
The README was updated with instructions, but here is a summary of what you need to do:
Clone this repo, and fill ./helm-chart/values.yaml.
You'll need to provide a subscriptionId as well now.
For clusters generated with acs-engine >= 0.12.0 you will also need to provide etcdclientprivatekey and etcdserverprivatekey and set rbac.install to true.
The chart will create a new service account for the autoscaler as well as all the necessary RBAC roles and bindings.
Let me know if this solves the issue on your side as well.
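To illustrate the kind of objects such a chart creates, here is a rough Python sketch that builds a ClusterRole and a ClusterRoleBinding for a service account. It is not the chart's actual template: the names `autoscaler` and `autoscaler_rbac`, the namespace, and the read-only rule set are assumptions chosen to match the failing `GET /api/v1/nodes` call; the real chart may well grant additional verbs and resources.

```python
import json


def autoscaler_rbac(name="autoscaler", namespace="default"):
    """Build a read-only ClusterRole and its binding for a service account.

    Hypothetical helper; names and rules are illustrative assumptions.
    """
    role = {
        "apiVersion": "rbac.authorization.k8s.io/v1",
        "kind": "ClusterRole",
        "metadata": {"name": name},
        "rules": [{
            "apiGroups": [""],  # "" is the core API group (nodes, pods)
            "resources": ["nodes", "pods"],
            "verbs": ["get", "list", "watch"],
        }],
    }
    binding = {
        "apiVersion": "rbac.authorization.k8s.io/v1",
        "kind": "ClusterRoleBinding",
        "metadata": {"name": name},
        "roleRef": {
            "apiGroup": "rbac.authorization.k8s.io",
            "kind": "ClusterRole",
            "name": name,
        },
        # The subject is the service account the autoscaler pod runs as.
        "subjects": [{
            "kind": "ServiceAccount",
            "name": name,
            "namespace": namespace,
        }],
    }
    return role, binding


if __name__ == "__main__":
    for manifest in autoscaler_rbac(namespace="autoscaler"):
        print(json.dumps(manifest, indent=2))
```

Without a binding like this, a pod using its service account token on an RBAC-enabled cluster is rejected with 403 Forbidden when it lists nodes, which matches the error reported above.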
@wbuchwalter - thank you for working this out, I will test the new master and update here with the results.
@davesykeselateral I'm not sure your issue is the same one since you created your cluster with acs-engine 0.11.0. So unless you manually enabled RBAC in your cluster it should be a different cause.
You can still try this out and let me know.
If you still have an error, try redeploying the autoscaler with the --debug flag, and then open another issue with the logs.
@wbuchwalter sorry for the delay in replying. Haven’t managed to get time to test yet, but yes, I did have RBAC enabled, so hopefully it is the same issue. Will update again when I’ve tested.
@wbuchwalter - have managed to test now, and this does resolve my issue now, thank you.
@wbuchwalter How can I provide etcdclientprivatekey and etcdServerPrivateKey? The full list of options in the README does not specify the flag through which they can be provided.
I have created a cluster using acs-engine v0.13.0 with RBAC enabled. I have provided --acs-deployment and $SUBSCRIPTION_ID. The error I am getting is:
2018-03-02 10:59:56,685 - autoscaler.cluster - DEBUG - Using kube service account
2018-03-02 10:59:56,686 - autoscaler.cluster - INFO - ++++ Running Scaling Loop ++++++
2018-03-02 10:59:56,686 - autoscaler.cluster - INFO - Debug mode is on
Traceback (most recent call last):
File "/usr/local/lib/python3.6/site-packages/urllib3/connection.py", line 141, in _new_conn
(self.host, self.port), self.timeout, **extra_kw)
File "/usr/local/lib/python3.6/site-packages/urllib3/util/connection.py", line 83, in create_connection
raise err
File "/usr/local/lib/python3.6/site-packages/urllib3/util/connection.py", line 73, in create_connection
sock.connect(sa)
ConnectionRefusedError: [Errno 111] Connection refused
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/usr/local/lib/python3.6/site-packages/urllib3/connectionpool.py", line 601, in urlopen
chunked=chunked)
File "/usr/local/lib/python3.6/site-packages/urllib3/connectionpool.py", line 346, in _make_request
self._validate_conn(conn)
File "/usr/local/lib/python3.6/site-packages/urllib3/connectionpool.py", line 850, in _validate_conn
conn.connect()
File "/usr/local/lib/python3.6/site-packages/urllib3/connection.py", line 284, in connect
conn = self._new_conn()
File "/usr/local/lib/python3.6/site-packages/urllib3/connection.py", line 150, in _new_conn
self, "Failed to establish a new connection: %s" % e)
urllib3.exceptions.NewConnectionError: <urllib3.connection.VerifiedHTTPSConnection object at 0x7fbec9522e10>: Failed to establish a new connection: [Errno 111] Connection refused
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/usr/local/lib/python3.6/site-packages/requests/adapters.py", line 440, in send
timeout=timeout
File "/usr/local/lib/python3.6/site-packages/urllib3/connectionpool.py", line 639, in urlopen
_stacktrace=sys.exc_info()[2])
File "/usr/local/lib/python3.6/site-packages/urllib3/util/retry.py", line 388, in increment
raise MaxRetryError(_pool, url, error or ResponseError(cause))
urllib3.exceptions.MaxRetryError: HTTPSConnectionPool(host='10.240.0.4', port=443): Max retries exceeded with url: /api/v1/nodes (Caused by NewConnectionError('<urllib3.connection.VerifiedHTTPSConnection object at 0x7fbec9522e10>: Failed to establish a new connection: [Errno 111] Connection refused',))
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "main.py", line 111, in <module>
main()
File "/usr/local/lib/python3.6/site-packages/click/core.py", line 722, in __call__
return self.main(*args, **kwargs)
File "/usr/local/lib/python3.6/site-packages/click/core.py", line 697, in main
rv = self.invoke(ctx)
File "/usr/local/lib/python3.6/site-packages/click/core.py", line 895, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "/usr/local/lib/python3.6/site-packages/click/core.py", line 535, in invoke
return callback(*args, **kwargs)
File "main.py", line 100, in main
scaled = cluster.loop(debug)
File "/app/autoscaler/cluster.py", line 122, in loop
return self.loop_logic()
File "/app/autoscaler/cluster.py", line 137, in loop_logic
if not pykube_nodes:
File "/usr/local/lib/python3.6/site-packages/pykube/query.py", line 122, in __len__
return len(self.query_cache["objects"])
File "/usr/local/lib/python3.6/site-packages/pykube/query.py", line 115, in query_cache
cache["response"] = self.execute().json()
File "/usr/local/lib/python3.6/site-packages/pykube/query.py", line 99, in execute
r = self.api.get(**kwargs)
File "/usr/local/lib/python3.6/site-packages/pykube/http.py", line 127, in get
return self.session.get(*args, **self.get_kwargs(**kwargs))
File "/usr/local/lib/python3.6/site-packages/requests/sessions.py", line 521, in get
return self.request('GET', url, **kwargs)
File "/usr/local/lib/python3.6/site-packages/requests/sessions.py", line 508, in request
resp = self.send(prep, **send_kwargs)
File "/usr/local/lib/python3.6/site-packages/requests/sessions.py", line 618, in send
r = adapter.send(request, **kwargs)
File "/usr/local/lib/python3.6/site-packages/requests/adapters.py", line 508, in send
raise ConnectionError(e, request=request)
requests.exceptions.ConnectionError: HTTPSConnectionPool(host='10.240.0.4', port=443): Max retries exceeded with url: /api/v1/nodes (Caused by NewConnectionError('<urllib3.connection.VerifiedHTTPSConnection object at 0x7fbec9522e10>: Failed to establish a new connection: [Errno 111] Connection refused',))
@diwakar-s-maurya these two keys are not needed, so you shouldn't specify them.
Have you set rbac.install to true in the helm chart before deploying?
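Note that the traceback above ends in "Connection refused" (errno 111), which is a different kind of failure from the 403 Forbidden reported earlier in the thread: the first never reaches the API server, while the second is the API server answering and rejecting the request. A small Python sketch of that distinction, using only the standard library; the `diagnose` helper and its messages are illustrative and not part of the autoscaler:

```python
import errno
from http import HTTPStatus


def diagnose(status_code=None, os_error=None):
    """Map a failed API call to a rough category (hypothetical helper)."""
    if os_error is not None and os_error.errno == errno.ECONNREFUSED:
        # No HTTP exchange happened: nothing accepted the TCP connection.
        return "network: connection refused, check API server address/port"
    if status_code == HTTPStatus.UNAUTHORIZED:
        return "authentication: credentials missing or invalid (401)"
    if status_code == HTTPStatus.FORBIDDEN:
        return "authorization: request rejected, check RBAC bindings (403)"
    return "unknown"


print(diagnose(status_code=403))
print(diagnose(os_error=ConnectionRefusedError(errno.ECONNREFUSED,
                                               "Connection refused")))
```

Under this reading, rbac.install addresses the 403 case; a refused connection would more likely point at the API server endpoint or networking, though the thread does not confirm the cause.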
Hi @wbuchwalter, I'm facing the same issue even after setting rbac.install to true:
2018-06-08 05:33:02,314 - autoscaler.cluster - ERROR - Unexpected error: <class 'requests.exceptions.HTTPError'>, 403 Client Error: Forbidden for url: https://10.240.255.5:443/api/v1/nodes
2018-06-08 05:33:02,314 - autoscaler - WARNING - backoff: 60
I'm using k8s version 1.9.6 and acs-engine v0.14.0.
@VeereshPatil setting rbac.install to true helped me.
Thank you @sashabaranov, it worked for me.
I'm deploying the autoscaler with this config:
and get this output in the logs:
2018-01-22 10:01:29,576 - autoscaler.cluster - DEBUG - Using kube service account
2018-01-22 10:01:29,577 - autoscaler.cluster - INFO - ++++ Running Scaling Loop ++++++
2018-01-22 10:01:29,635 - autoscaler.cluster - WARNING - Unexpected error: <class 'requests.exceptions.HTTPError'>
2018-01-22 10:01:29,635 - autoscaler - WARNING - backoff: 60
Is this normal behavior?