krkn-chaos / krkn

Chaos and resiliency testing tool for Kubernetes with a focus on improving performance under failure conditions. A CNCF sandbox project.
Apache License 2.0

Kraken Bug ROKS nodes #147

Open seanogor opened 3 years ago

seanogor commented 3 years ago

Kraken's find_kraken_node step fails as unauthorized after logging into the cluster on ROKS:

2021-09-03 10:22:43,943 [INFO] Initializing client to talk to the Kubernetes cluster
 _              _              
| | ___ __ __ _| | _____ _ __  
| |/ / '__/ _` | |/ / _ \ '_ \ 
|   <| | | (_| |   <  __/ | | |
|_|\_\_|  \__,_|_|\_\___|_| |_|

/usr/local/lib/python3.6/site-packages/urllib3/connectionpool.py:1020: InsecureRequestWarning: Unverified HTTPS request is being made to host 'c115-e.us-south.containers.cloud.ibm.com'. Adding certificate verification is strongly advised. See: https://urllib3.readthedocs.io/en/1.26.x/advanced-usage.html#ssl-warnings
  InsecureRequestWarning,
Traceback (most recent call last):
  File "run_kraken.py", line 231, in <module>
    main(options.cfg)
  File "run_kraken.py", line 69, in main
    kubecli.find_kraken_node()
  File "/root/kraken/kraken/kubernetes/client.py", line 236, in find_kraken_node
    pods = get_all_pods()
  File "/root/kraken/kraken/kubernetes/client.py", line 118, in get_all_pods
    ret = cli.list_pod_for_all_namespaces(pretty=True)
  File "/usr/local/lib/python3.6/site-packages/kubernetes/client/api/core_v1_api.py", line 16864, in list_pod_for_all_namespaces
    return self.list_pod_for_all_namespaces_with_http_info(**kwargs)  # noqa: E501
  File "/usr/local/lib/python3.6/site-packages/kubernetes/client/api/core_v1_api.py", line 16981, in list_pod_for_all_namespaces_with_http_info
    collection_formats=collection_formats)
  File "/usr/local/lib/python3.6/site-packages/kubernetes/client/api_client.py", line 353, in call_api
    _preload_content, _request_timeout, _host)
  File "/usr/local/lib/python3.6/site-packages/kubernetes/client/api_client.py", line 184, in __call_api
    _request_timeout=_request_timeout)
  File "/usr/local/lib/python3.6/site-packages/kubernetes/client/api_client.py", line 377, in request
    headers=headers)
  File "/usr/local/lib/python3.6/site-packages/kubernetes/client/rest.py", line 243, in GET
    query_params=query_params)
  File "/usr/local/lib/python3.6/site-packages/kubernetes/client/rest.py", line 233, in request
    raise ApiException(http_resp=r)
kubernetes.client.exceptions.ApiException: (401)
Reason: Unauthorized
HTTP response headers: HTTPHeaderDict({'Cache-Control': 'no-cache, private', 'Content-Type': 'application/json', 'Date': 'Fri, 03 Sep 2021 10:22:44 GMT', 'Content-Length': '165'})
HTTP response body: {
  "kind": "Status",
  "apiVersion": "v1",
  "metadata": {

  },
  "status": "Failure",
  "message": "Unauthorized",
  "reason": "Unauthorized",
  "code": 401
}
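
For triage, it can help to separate authentication failures from authorization failures. The sketch below parses the Status body from the traceback above and maps its code to a rough failure class. It assumes the usual Kubernetes convention that 401 means the credentials were not accepted and 403 means the identity was recognized but not permitted; `classify_status` is a hypothetical helper, not part of Kraken or the Kubernetes client.

```python
import json

# Status body copied from the traceback above.
STATUS_BODY = """{
  "kind": "Status",
  "apiVersion": "v1",
  "metadata": {},
  "status": "Failure",
  "message": "Unauthorized",
  "reason": "Unauthorized",
  "code": 401
}"""


def classify_status(body: str) -> str:
    """Hypothetical helper: map a Kubernetes Status body to a rough failure class."""
    status = json.loads(body)
    code = status.get("code")
    if code == 401:
        # Assumption: the API server did not accept the token or certificate.
        return "authentication"
    if code == 403:
        # Assumption: the identity is known but RBAC denied the request.
        return "authorization"
    return "other"


print(classify_status(STATUS_BODY))
```

If the class comes back as "authentication", checking the kubeconfig and token that Kraken is using inside the container is probably a reasonable first step.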

seanogor commented 3 years ago

Confirmed that this issue also occurs when podman is substituted as the container CLI, so it is most likely permission-based.

chaitanyaenr commented 3 years ago

Hi @seanogor, thanks for reporting the issue. Which node scenario is being run, and which Kubernetes version? I'm asking to make sure the scenario doesn't require talking to the ROKS cloud, which isn't supported as of now. Cloud access is needed for things like node shutdown/terminate, but not for kubelet/node crash, since those only involve talking to the kube API irrespective of the cloud.

From the logs, it looks like an issue with Kubernetes auth, as you mentioned. Can we try the following to see if it fixes it?

$ kubectl create clusterrolebinding serviceaccounts-cluster-admin \
    --clusterrole=cluster-admin \
    --group=system:serviceaccounts
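
To make it clearer what that command creates, here is a minimal sketch that builds the ClusterRoleBinding object as a plain Python dict and prints it as JSON. The function name `cluster_role_binding` is hypothetical; the field layout follows the `rbac.authorization.k8s.io/v1` API, and the values are the ones from the command above.

```python
import json


def cluster_role_binding(name: str, cluster_role: str, group: str) -> dict:
    """Hypothetical helper: build a ClusterRoleBinding that binds a group to a ClusterRole."""
    return {
        "apiVersion": "rbac.authorization.k8s.io/v1",
        "kind": "ClusterRoleBinding",
        "metadata": {"name": name},
        # The role being granted.
        "roleRef": {
            "apiGroup": "rbac.authorization.k8s.io",
            "kind": "ClusterRole",
            "name": cluster_role,
        },
        # Who receives the role: here, a group rather than a single account.
        "subjects": [
            {
                "apiGroup": "rbac.authorization.k8s.io",
                "kind": "Group",
                "name": group,
            }
        ],
    }


manifest = cluster_role_binding(
    "serviceaccounts-cluster-admin",
    "cluster-admin",
    "system:serviceaccounts",
)
print(json.dumps(manifest, indent=2))
```

The `system:serviceaccounts` group covers every service account in the cluster, so this binding gives all of them cluster-admin.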