timescale / tobs

tobs - The Observability Stack for Kubernetes. Easy install of a full observability stack into a k8s cluster with Helm charts.
Apache License 2.0

service accounts rights are missing for tobs-timescaledb role #646

Closed jgerry2002 closed 1 year ago

jgerry2002 commented 1 year ago

What happened? During the update from 17.15.0 to 17.17.0 there was an RBAC-related error with the service account rights for the tobs-timescaledb role.

Did you expect to see something different? The rights for the service account should be checked/confirmed. The error causes timescaledb to crashloop.

How to reproduce it (as minimally and precisely as possible): Install 17.15.0 via Helm, then upgrade to 17.17.0.

Environment: Kubernetes 1.21.14

Notes: I was able to proceed with the update after I manually modified the role so that the service account is allowed to alter services.
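For anyone trying the same manual workaround, here is a minimal sketch of the kind of rule that has to be added to the role. It only builds the patched manifest as a Python dict and prints it as JSON; it does not talk to the cluster. The helper name `grant_service_verbs`, the empty starting `rules` list, and the choice of granting only `create` are assumptions for illustration. The 403 later in this thread shows `create` on `services` being denied; the full set of verbs Patroni needs is not stated in this issue.

```python
import copy
import json


def grant_service_verbs(role, verbs=("create",)):
    """Return a copy of a Role manifest with an extra rule for core-group services.

    Hypothetical helper: which verbs are really required is an assumption.
    """
    patched = copy.deepcopy(role)  # leave the caller's manifest untouched
    patched.setdefault("rules", []).append(
        {
            "apiGroups": [""],  # "" is the core API group, where Service lives
            "resources": ["services"],
            "verbs": list(verbs),
        }
    )
    return patched


# Placeholder manifest: the name and namespace come from this issue,
# but the chart's real rules are not shown here, so the list starts empty.
role = {
    "apiVersion": "rbac.authorization.k8s.io/v1",
    "kind": "Role",
    "metadata": {"name": "tobs-timescaledb", "namespace": "tobs"},
    "rules": [],
}

patched = grant_service_verbs(role)
print(json.dumps(patched, indent=2))
```

In practice the new rule would be appended to the rules the chart already ships, not applied in place of them, since applying a Role with only this rule would drop the existing permissions.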

jgerry2002 commented 1 year ago

This error also happens during a new install of 17.17.0

2022-11-15 10:17:31,167 ERROR: create_config_service failed
Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/patroni/dcs/kubernetes.py", line 950, in _create_config_service
    if not self._api.create_namespaced_service(self._namespace, body):
  File "/usr/lib/python3/dist-packages/patroni/dcs/kubernetes.py", line 483, in wrapper
    return getattr(self._core_v1_api, func)(*args, **kwargs)
  File "/usr/lib/python3/dist-packages/patroni/dcs/kubernetes.py", line 419, in wrapper
    return self._api_client.call_api(method, path, headers, body, **kwargs)
  File "/usr/lib/python3/dist-packages/patroni/dcs/kubernetes.py", line 388, in call_api
    return self._handle_server_response(response, _preload_content)
  File "/usr/lib/python3/dist-packages/patroni/dcs/kubernetes.py", line 218, in _handle_server_response
    raise k8s_client.rest.ApiException(http_resp=response)
patroni.dcs.kubernetes.K8sClient.rest.ApiException: (403)
Reason: Forbidden
HTTP response headers: HTTPHeaderDict({'Audit-Id': '6d0823fa-d627-4ab7-9342-894587a0b624', 'Cache-Control': 'no-cache, private', 'Content-Type': 'application/json', 'X-Content-Type-Options': 'nosniff', 'X-Kubernetes-Pf-Flowschema-Uid': '43fde79c-ff65-437e-b03c-edbdbb29013b', 'X-Kubernetes-Pf-Prioritylevel-Uid': '73f0f59e-1d05-46ee-8872-1f406aa1b018', 'Date': 'Tue, 15 Nov 2022 10:17:31 GMT', 'Content-Length': '299'})
HTTP response body: b'{"kind":"Status","apiVersion":"v1","metadata":{},"status":"Failure","message":"services is forbidden: User \\"system:serviceaccount:tobs:tobs-timescaledb\\" cannot create resource \\"services\\" in API group \\"\\" in the namespace \\"tobs\\"","reason":"Forbidden","details":{"kind":"services"},"code":403}\n'
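
The response body above already says exactly which permission is missing. As a rough sketch, it can be decoded like this to pull out the service account, verb, and resource. The function name `parse_forbidden` and the regular expression are illustrative assumptions based on the wording of this one message, not a documented Kubernetes format guarantee.

```python
import json
import re

# Response body copied verbatim from the traceback above.
BODY = b'{"kind":"Status","apiVersion":"v1","metadata":{},"status":"Failure","message":"services is forbidden: User \\"system:serviceaccount:tobs:tobs-timescaledb\\" cannot create resource \\"services\\" in API group \\"\\" in the namespace \\"tobs\\"","reason":"Forbidden","details":{"kind":"services"},"code":403}\n'

# Pattern written against the message in this issue; other messages may differ.
FORBIDDEN_RE = re.compile(
    r'User "(?P<user>[^"]+)" cannot (?P<verb>\w+) resource "(?P<resource>[^"]+)" '
    r'in API group "(?P<group>[^"]*)" in the namespace "(?P<namespace>[^"]+)"'
)


def parse_forbidden(body):
    """Extract who was denied which verb on which resource from a 403 Status body."""
    status = json.loads(body.decode("utf-8"))
    match = FORBIDDEN_RE.search(status.get("message", ""))
    if match is None:
        return None  # message did not follow the expected wording
    details = match.groupdict()
    details["code"] = status.get("code")
    return details


denied = parse_forbidden(BODY)
print(denied)
```

Here the denied verb is `create` on `services` in the core API group (the empty string), for the `tobs-timescaledb` service account in the `tobs` namespace.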
alienninja commented 1 year ago

This issue is explained in some detail, with a workaround, at https://github.com/timescale/helm-charts/issues/405. I just re-deployed the latest tobs in the hope that this error was fixed, but I can confirm I'm still seeing the issue.

After applying the recommended fix, the error does go away, however I'm still not getting any data in Grafana...

nhudson commented 1 year ago

This is/was a bug in Patroni which was fixed with https://github.com/zalando/patroni/issues/1132 & https://github.com/zalando/patroni/pull/2390.

We have merged the change to add these fixes into the timescale-db container image https://github.com/timescale/timescaledb-docker-ha/pull/319

The image changes are currently in the process of being tested. I hope to have a release soon.

kodeine commented 1 year ago

I'm facing the same issue on the latest version.

nhudson commented 1 year ago

timescale/timescaledb-ha:pg14.6-ts2.8.1-p0 has been released with the Patroni fix.

alienninja commented 1 year ago

I performed a fresh tobs installation with a modified my_values.yaml file that specified the new timescaledb image (timescaledb-ha:pg14.6-ts2.8.1-p0), and I still see the 403 error in the tobs-timescaledb-0 pod logs. The only item I changed in the tobs values file was tag: pg14.6-ts2.8.1-p0. I also confirmed this is the image that was pulled for tobs-timescaledb-0.

nhudson commented 1 year ago

Yup, you are correct, sorry about that. Can you try again with this image and let me know? timescale/timescaledb:pg14.6-ts2.8.1-patroni-static-primary-p0

nhudson commented 1 year ago

I am currently no longer seeing the error logs when using that image.

alienninja commented 1 year ago

timescale/timescaledb:pg14.6-ts2.8.1-patroni-static-primary-p0

I also no longer see the error in the logs after re-deploying with this image. Thank you!

nhudson commented 1 year ago

Awesome. I am going to close this, please re-open if needed.