Open Globegitter opened 5 years ago
We suspect this was caused by too many tracked releases and configmaps (over three thousand). We set the TILLER_HISTORY_MAX environment variable, but since tiller couldn't find the last successful release, this alone didn't fix it.
The eventual resolution was to delete all the previous configmaps for each release, and then rerun tiller using UpgradeForce (as per https://github.com/bitnami-labs/helm-crd/pull/34), which caused tiller to re-own or recreate resources. The downside is that external components, like our AWS ELB and EBS volumes, were destroyed and recreated, so we lost some historical metric data from Prometheus and had to change some DNS records for the new load balancer.
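For anyone repeating this cleanup, here is a minimal sketch of how the release records to delete could be picked out. It is not the exact procedure used above: `revisions_to_prune` is a hypothetical helper, and it assumes the release configmaps are named `<release>.v<revision>` (for example `nginx.v12`). With `keep=0` it selects every record, which corresponds to deleting all previous configmaps; a larger `keep` is a less drastic variant that retains the newest revisions.

```python
import re
from collections import defaultdict

# Assumption: release configmaps are named "<release>.v<revision>".
_NAME_RE = re.compile(r"^(?P<release>.+)\.v(?P<revision>\d+)$")


def revisions_to_prune(configmap_names, keep=10):
    """Return configmap names older than the newest `keep` revisions per release."""
    by_release = defaultdict(list)
    for name in configmap_names:
        match = _NAME_RE.match(name)
        if match is None:
            continue  # not a release record, so leave it alone
        by_release[match["release"]].append((int(match["revision"]), name))

    prune = []
    for release in sorted(by_release):
        revisions = sorted(by_release[release])  # oldest revision first
        cutoff = max(len(revisions) - keep, 0)
        prune.extend(name for _, name in revisions[:cutoff])
    return prune
```

The function only returns names; review the list before deleting anything with kubectl.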
Recently we started seeing the following error messages in the tiller controller, e.g.:
We seem to be getting this for almost everything we deploy using helm-crd. There is one service that gets the following error:
But strangely enough that service exists.
We have no idea at this stage why this is happening. It seemed to start from one day to the next, without any major changes we are aware of. We have been on version 0.4.1 of the controller and 2.9.1 of tiller for a few weeks now, and it had been working smoothly.
In tiller there are logs like:
Looking at the tiller configmaps, we do have configmaps for these services. For the nginx one, for example, the latest version shows:
and the version before that shows
STATUS: SUPERSEDED
We are still investigating, but is there any way to get more insight on what is going on and what we are getting back from tiller?
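One way to narrow this down without tiller's help is to check which releases have no revision marked as deployed. The sketch below is only an illustration: `releases_without_deployed` is a hypothetical helper, and it assumes each release configmap carries NAME, VERSION and STATUS labels, with DEPLOYED marking the live revision. The label dicts could be taken from something like `kubectl get configmap -n kube-system -l OWNER=TILLER -o json`, assuming tiller runs in kube-system.

```python
from collections import defaultdict


def releases_without_deployed(records):
    """Map each release with no DEPLOYED revision to (latest_revision, latest_status).

    `records` is an iterable of label dicts with NAME, VERSION and STATUS keys
    (assumed label names on the release configmaps).
    """
    by_release = defaultdict(list)
    for labels in records:
        by_release[labels["NAME"]].append((int(labels["VERSION"]), labels["STATUS"]))

    suspicious = {}
    for release, revisions in by_release.items():
        if any(status == "DEPLOYED" for _, status in revisions):
            continue  # this release has a live revision
        suspicious[release] = max(revisions)  # highest revision number
    return suspicious
```

A release that shows up here has a history in which no revision is currently deployed, which is worth comparing against the statuses shown in the configmaps above.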