timescale / tobs

tobs - The Observability Stack for Kubernetes. Easy install of a full observability stack into a k8s cluster with Helm charts.
Apache License 2.0
563 stars, 60 forks

Error on installing Tobs on GKE 1.24 #581

Closed. umgbhalla closed this issue 1 year ago.

umgbhalla commented 2 years ago

What did you do?

helm install otel timescale/tobs -n tobs-otel --create-namespace --wait --timeout 25m

Did you expect to see something different?

Environment

Client Version: version.Info{Major:"1", Minor:"24", GitVersion:"v1.24.3", GitCommit:"aef86a93758dc3cb2c658dd9657ab4ad4afc21cb", GitTreeState:"clean", BuildDate:"2022-07-13T14:30:46Z", GoVersion:"go1.18.3", Compiler:"gc", Platform:"linux/amd64"}
Kustomize Version: v4.5.4
Server Version: version.Info{Major:"1", Minor:"24", GitVersion:"v1.24.3-gke.2100", GitCommit:"25d7334511e90d0b636707059c955baebce769cd", GitTreeState:"clean", BuildDate:"2022-08-16T09:24:54Z", GoVersion:"go1.18.3b7", Compiler:"gc", Platform:"linux/amd64"}

Anything else we need to know?

[Two screenshots were attached in the original issue; they are not preserved.]

onprem commented 2 years ago

Hey, after some digging, this looks like a bug in Kubernetes (kubernetes/kubernetes#67761). There is also an issue on the Helm side (helm/helm#9710) and an open PR that fixes it, but the PR has not been merged (helm/helm#9713).

Can you please add your Helm CLI version to the issue as well? That will help me reproduce this faster.

umgbhalla commented 2 years ago

Hi @onprem, thanks for the reply. Helm version:

version.BuildInfo{Version:"v3.9.1", GitCommit:"a7c043acb5ff905c261cfdc923a35776ba5e66e4", GitTreeState:"clean", GoVersion:"go1.17.5"}

The same happens with Helm version:

version.BuildInfo{Version:"v3.9.4", GitCommit:"dbc6d8e20fe1d58d50e6ed30f09a04a77e4c68db", GitTreeState:"clean", GoVersion:"go1.17.13"}

onprem commented 2 years ago

Going through the upstream issues, it looks like installing a big Helm chart into a namespace with resource quotas enabled is hit or miss. The problem occurs when Helm creates a lot of resources in a short amount of time (tobs creates many, since it bundles OTel and kube-prometheus along with other projects). Every pod or service creation triggers an update to the usage tracked in the ResourceQuota object's status, and these near-simultaneous updates can conflict with each other.
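The conflict comes from optimistic concurrency control: Kubernetes objects carry a `resourceVersion`, and an update based on an outdated version is rejected with a 409 Conflict. The toy model below is a sketch of that idea only, not Kubernetes code. `QuotaObject` and `charge` are hypothetical names, and the model ignores whatever batching or retrying the real API server does when it updates quota usage.

```python
from dataclasses import dataclass, field


class Conflict(Exception):
    """Stand-in for the HTTP 409 Conflict returned for a write based on a stale version."""


@dataclass
class QuotaObject:
    """Hypothetical, simplified ResourceQuota: usage counts plus a resourceVersion."""
    used: dict = field(default_factory=dict)
    resource_version: int = 0

    def read(self):
        # Snapshot the current usage together with the version it was read at.
        return dict(self.used), self.resource_version

    def write(self, new_used, expected_version):
        # Optimistic concurrency: the write only succeeds if nobody wrote since our read.
        if expected_version != self.resource_version:
            raise Conflict(
                f"stale resourceVersion {expected_version}, current is {self.resource_version}"
            )
        self.used = new_used
        self.resource_version += 1


def charge(snapshot, kind):
    """Return the usage after creating one object of `kind`, plus the version it was read at."""
    used, version = snapshot
    new_used = dict(used)
    new_used[kind] = new_used.get(kind, 0) + 1
    return new_used, version


quota = QuotaObject()

# A pod and a service are created at nearly the same time, so both read version 0.
pod_update = charge(quota.read(), "pods")
svc_update = charge(quota.read(), "services")

quota.write(*pod_update)  # first writer wins; version becomes 1
try:
    quota.write(*svc_update)  # still based on version 0, so it is rejected
    first_attempt = "ok"
except Conflict:
    first_attempt = "conflict"

# Retrying from a fresh read succeeds, mirroring why re-running the Helm operation can work.
quota.write(*charge(quota.read(), "services"))

print(first_attempt, quota.used, quota.resource_version)
# prints: conflict {'pods': 1, 'services': 1} 2
```

The more objects a chart creates at once, the more often two writers race like this, which matches the "hit or miss" behavior with large charts.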

Helm has not merged the retry patch, and the PR looks abandoned after going unreviewed for a long time.

The workaround I'd suggest is to roll out tobs incrementally.

Start with most of the components disabled, for example everything except TimescaleDB, Promscale, and kube-prometheus (you can also disable parts of kube-prometheus, such as Grafana). Then upgrade your Helm release with more components enabled (like OpenTelemetry). You might still hit the same error, but for now the only workaround is to retry the operation until it succeeds.
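A minimal sketch of that staged rollout with retries, written in Python around the Helm CLI. The values keys (`opentelemetry-operator.enabled`, `kube-prometheus-stack.grafana.enabled`) are assumptions about the tobs chart and may differ between chart versions, so check `helm show values timescale/tobs` before relying on them. The two-stage split, retry count, and delay are hypothetical choices; the release name, namespace, and timeout come from the original command. `helm upgrade --install` is used so the same command works for the first install and for later stages.

```python
import subprocess
import time

# ASSUMED tobs values keys; verify with `helm show values timescale/tobs`.
STAGES = [
    # Stage 1: core components only (OpenTelemetry and Grafana disabled).
    {
        "opentelemetry-operator.enabled": "false",
        "kube-prometheus-stack.grafana.enabled": "false",
    },
    # Stage 2: enable the remaining components.
    {
        "opentelemetry-operator.enabled": "true",
        "kube-prometheus-stack.grafana.enabled": "true",
    },
]


def helm_command(release, namespace, values):
    """Build a `helm upgrade --install` invocation with the given --set values."""
    cmd = ["helm", "upgrade", "--install", release, "timescale/tobs",
           "-n", namespace, "--create-namespace", "--wait", "--timeout", "25m"]
    for key, value in values.items():
        cmd += ["--set", f"{key}={value}"]
    return cmd


def run_with_retries(cmd, attempts=5, delay_s=30.0, run=subprocess.run, sleep=time.sleep):
    """Re-run `cmd` until it exits 0; return the successful attempt number."""
    for attempt in range(1, attempts + 1):
        result = run(cmd)
        if result.returncode == 0:
            return attempt
        if attempt < attempts:
            sleep(delay_s)  # give the quota controller time to settle before retrying
    raise RuntimeError(f"gave up after {attempts} attempts: {' '.join(cmd)}")


def staged_rollout(release="otel", namespace="tobs-otel", **retry_kwargs):
    """Apply each stage in order, retrying each one until it succeeds."""
    for values in STAGES:
        run_with_retries(helm_command(release, namespace, values), **retry_kwargs)


if __name__ == "__main__":
    staged_rollout()
```

Each stage passes its values explicitly, so if you customize other tobs values, pass them (for example with `-f my-values.yaml`) in every stage as well. The `run` and `sleep` parameters exist only so the retry loop can be exercised without a cluster.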

umgbhalla commented 2 years ago

OK, got it. Would using an older version of Helm work?

onprem commented 2 years ago

I don't think an older version of Helm would help. But if you are willing to, removing the resource quotas should do the trick here.

umgbhalla commented 2 years ago

Yeah, I tried removing the resource quotas, but they get added back to the namespace as soon as they are removed:

kubectl delete resourcequota gke-resource-quotas -n tobs-otel

onprem commented 2 years ago

Ah, it looks like they are immutable and cannot be removed: https://cloud.google.com/kubernetes-engine/quotas#resource_quotas.

github-actions[bot] commented 2 years ago

This issue went stale because it was not updated in a month. Please consider updating it to improve the quality of the project.

github-actions[bot] commented 1 year ago

This issue was closed because it has been stalled for 30 days with no activity.