krkn-chaos / cerberus

Guardian of Kubernetes clusters. Tool to monitor cluster health and signal/alert on failures.
Apache License 2.0

get_clusterversion_string function throws error for kubernetes distribution #181

Open Rajalakshmi-Girish opened 2 years ago

Rajalakshmi-Girish commented 2 years ago

Cerberus throws the following error when run against a Kubernetes distribution:

2022-09-08 05:14:57,677 [INFO] Fetching cluster info
               _
  ___ ___ _ __| |__   ___ _ __ _   _ ___
 / __/ _ \ '__| '_ \ / _ \ '__| | | / __|
| (_|  __/ |  | |_) |  __/ |  | |_| \__ \
 \___\___|_|  |_.__/ \___|_|   \__,_|___/

Traceback (most recent call last):
  File "start_cerberus.py", line 557, in <module>
    main(options.cfg)
  File "start_cerberus.py", line 126, in main
    cv = kubecli.get_clusterversion_string()
  File "/root/cerberus/cerberus/kubernetes/client.py", line 432, in get_clusterversion_string
    "clusterversions",
  File "/usr/local/lib/python3.6/site-packages/kubernetes/client/api/custom_objects_api.py", line 1942, in list_cluster_custom_object
    return self.list_cluster_custom_object_with_http_info(group, version, plural, **kwargs)  # noqa: E501
  File "/usr/local/lib/python3.6/site-packages/kubernetes/client/api/custom_objects_api.py", line 2087, in list_cluster_custom_object_with_http_info
    collection_formats=collection_formats)
  File "/usr/local/lib/python3.6/site-packages/kubernetes/client/api_client.py", line 353, in call_api
    _preload_content, _request_timeout, _host)
  File "/usr/local/lib/python3.6/site-packages/kubernetes/client/api_client.py", line 184, in __call_api
    _request_timeout=_request_timeout)
  File "/usr/local/lib/python3.6/site-packages/kubernetes/client/api_client.py", line 377, in request
    headers=headers)
  File "/usr/local/lib/python3.6/site-packages/kubernetes/client/rest.py", line 244, in GET
    query_params=query_params)
  File "/usr/local/lib/python3.6/site-packages/kubernetes/client/rest.py", line 234, in request
    raise ApiException(http_resp=r)
kubernetes.client.exceptions.ApiException: (403)
Reason: Forbidden
HTTP response headers: HTTPHeaderDict({'Audit-Id': '6ae456ee-000e-4c1e-a1ba-a4069c0eecfc', 'Cache-Control': 'no-cache, private', 'Content-Type': 'application/json', 'X-Content-Type-Options': 'nosniff', 'X-Kubernetes-Pf-Flowschema-Uid': 'a09e2577-2522-4246-a181-30b35af87575', 'X-Kubernetes-Pf-Prioritylevel-Uid': 'ec60f9f8-d58e-4db6-8a63-ae31c7d57d3c', 'Date': 'Thu, 08 Sep 2022 05:14:57 GMT', 'Content-Length': '374'})
HTTP response body: {"kind":"Status","apiVersion":"v1","metadata":{},"status":"Failure","message":"clusterversions.config.openshift.io is forbidden: User \"system:serviceaccount:prow:cerberus\" cannot list resource \"clusterversions\" in API group \"config.openshift.io\" at the cluster scope","reason":"Forbidden","details":{"group":"config.openshift.io","kind":"clusterversions"},"code":403}

This has been observed since the change in https://github.com/redhat-chaos/cerberus/pull/169.

Requiring list permission on the clusterversions resource in the config.openshift.io API group doesn't seem right when the distribution is Kubernetes.

https://github.com/redhat-chaos/cerberus/blob/0ce6f371d8577ea50d4ecd080f5b998884cf91d9/cerberus/kubernetes/client.py#L426 says the function should return an empty string for distributions other than OpenShift.

ghost commented 2 years ago

Hello @Rajalakshmi-Girish, this seems to be a permission issue:

User "system:serviceaccount:prow:cerberus" cannot list resource "clusterversions" in API group "config.openshift.io" at the cluster scope

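This means that your user has insufficient permissions to check if the clusterversions resource exists, which is why this check fails: the API server authorizes the request before it looks up the resource, so without that permission the call returns 403 Forbidden even on a cluster that does not serve clusterversions at all.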

Rajalakshmi-Girish commented 2 years ago

user has insufficient permissions to check if the clusterversions resource exists,

Yes, I understand. But why would a user on a Kubernetes distribution need permissions on a resource in the config.openshift.io API group? I am running Cerberus against a Kubernetes cluster.
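One alternative, not proposed in this thread, is to ask the API discovery endpoint (GET /apis) which API groups the server serves before touching clusterversions at all. Discovery is normally readable by any authenticated user through the default system:discovery role, so a plain Kubernetes cluster could be recognized without any permission in config.openshift.io. The sketch below assumes an object that behaves like `kubernetes.client.ApisApi`, whose `get_api_versions()` returns a `V1APIGroupList`; the helper names `served_api_groups` and `looks_like_openshift` are hypothetical and not part of Cerberus.

```python
# Hypothetical helpers; not part of Cerberus.
OPENSHIFT_CONFIG_GROUP = "config.openshift.io"


def served_api_groups(apis_api) -> set:
    """Return the names of the API groups advertised via discovery (GET /apis).

    `apis_api` is assumed to behave like kubernetes.client.ApisApi:
    get_api_versions() returns an object whose `.groups` items have `.name`.
    """
    return {group.name for group in apis_api.get_api_versions().groups}


def looks_like_openshift(apis_api) -> bool:
    # Discovery does not require any permission on clusterversions itself,
    # so a cluster without the config.openshift.io group is identified as
    # non-OpenShift without ever issuing the forbidden list request.
    return OPENSHIFT_CONFIG_GROUP in served_api_groups(apis_api)
```

With the real client this would be called as `looks_like_openshift(client.ApisApi())` after `config.load_kube_config()`. On an actual OpenShift cluster, reading the version string would still need list permission on clusterversions; discovery only avoids asking for it where the resource cannot exist.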

ghost commented 2 years ago

@redhat-chaos/developers I have no strong opinion on this; the 403 code can be added to the error handling. This would mean that if the user does not have permission to access the clusterversions resource, we assume the cluster is not OpenShift.
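A minimal sketch of that proposal, assuming the function lists clusterversions through `list_cluster_custom_object("config.openshift.io", "v1", "clusterversions")`, as the traceback and error message suggest, and that it already returns an empty string on 404. The listing call is injected as a hypothetical `list_clusterversions` callable so the logic can be exercised without a cluster, and the returned string (the message of the `Progressing` condition) is an assumption, not necessarily what Cerberus returns.

```python
from typing import Any, Callable, Dict

try:
    from kubernetes.client.exceptions import ApiException
except ImportError:
    # Stand-in so the sketch runs without the kubernetes package installed;
    # the real ApiException also exposes the HTTP status code as `.status`.
    class ApiException(Exception):
        def __init__(self, status=None, reason=None, http_resp=None):
            super().__init__(reason)
            self.status = status
            self.reason = reason

# 404: the cluster does not serve clusterversions, so it is not OpenShift.
# 403: we may not look; per the proposal, assume it is not OpenShift either.
NOT_OPENSHIFT_STATUSES = (403, 404)


def get_clusterversion_string(list_clusterversions: Callable[[], Dict[str, Any]]) -> str:
    """Return an OpenShift cluster-version string, or "" for other distributions.

    `list_clusterversions` is a hypothetical zero-argument callable; in Cerberus
    it would wrap something like CustomObjectsApi().list_cluster_custom_object(
    "config.openshift.io", "v1", "clusterversions").
    """
    try:
        cvs = list_clusterversions()
    except ApiException as e:
        if e.status in NOT_OPENSHIFT_STATUSES:
            return ""
        raise  # any other status (401, 500, ...) is still a real error
    # Assumed output: the message of the "Progressing" condition, if present.
    for cv in cvs.get("items", []):
        for condition in cv.get("status", {}).get("conditions", []):
            if condition.get("type") == "Progressing":
                return condition.get("message", "")
    return ""
```

The trade-off is the one named in the comment above: an OpenShift cluster whose service account lacks this permission is also reported as non-OpenShift, so the version string silently comes back empty. Logging a warning in the 403 branch would keep that case visible.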