helm / helm

The Kubernetes Package Manager
https://helm.sh
Apache License 2.0

helm template generation not consistent when manually defined charts/ are used #11816

Open dioguerra opened 1 year ago

dioguerra commented 1 year ago

I'm trying to investigate an issue where using manually edited charts to fix a metachart does not work. Upstream, dependencies can be managed manually via the charts/ directory, as documented here: https://docs.helm.sh/docs/topics/charts/#managing-dependencies-manually-via-the-charts-directory
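
For reference, a manually managed dependency is just an unpacked chart directory under charts/ instead of a packaged .tgz archive. A hypothetical layout for our case would look like:

metachart/
  Chart.yaml
  values.yaml
  charts/
    ingress-nginx/        # unpacked copy of the subchart, edited locally
      Chart.yaml
      values.yaml
      templates/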

So:

In our deployment, we have a GitLab CI template that packages and publishes the charts with the following commands:

helm repo add foo https://${REGISTRY}/chartrepo/foo
helm repo add stable https://${REGISTRY}/chartrepo/stable
helm repo add $(basename ${REGISTRY_PATH}) "https://${REGISTRY_PATH}" --force-update
helm repo update
helm dep update .
helm lint .
helm package .
helm push ...

The GitLab template uses: version.BuildInfo{Version:"v3.8.1", GitCommit:"5cb9af4b1b271d11d7a97a71df3ac337dd94ad37", GitTreeState:"clean", GoVersion:"go1.17.5"}

After this, the chart is pulled and used in our cluster to bring up the required apps, with the following commands:

curl -O ${repo}/-/archive/${branch}/metachart-${branch}.tar.gz
tar zxf metachart-${branch}.tar.gz
helm dep update metachart-${branch}

This step uses an older Helm version: 3.2.0

I tried to replicate all this on my local computer running a more recent helm version:

helm version
version.BuildInfo{Version:"v3.10.3", GitCommit:"835b7334cfe2e5e27870ab3ed4135f136eecc704", GitTreeState:"clean", GoVersion:"go1.18.9"}

The test was to manually replace the nginx templates (one of the subcharts) with nothing, to check whether our locally edited chart overrides the packaged one:

helm template metachart ../metachart -n kube-system | yq '.metadata.name' | grep nginx | wc -l

So, things to test:

  1. Full chart without managed subcharts - 18 (baseline)
  2. No ingress-nginx.tgz, helm pull magnum/ingress-nginx --version 4.0.6 --untar=true, no helm dep update - 18
  3. No ingress-nginx.tgz, helm pull magnum/ingress-nginx --version 4.0.6 --untar=true, rm -rf charts/ingress-nginx/templates/*, no helm dep update - 0
  4. No ingress-nginx.tgz, helm pull magnum/ingress-nginx --version 4.0.6 --untar=true, rm -rf charts/ingress-nginx/templates/*, with helm dep update - 0

In case 4 (so, with both the manually edited chart and the helm dep update command), I notice that sometimes the template generates the nginx manifests and sometimes it doesn't, seemingly at random...
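
To quantify how often each result appears, a tally loop works well (a sketch, reusing the metachart path and yq pipeline from the command above; the run count of 20 is arbitrary):

# Render 20 times, count nginx-named manifests per run, then tally the distribution
for i in $(seq 1 20); do
  helm template metachart ../metachart -n kube-system 2>/dev/null \
    | yq '.metadata.name' | grep -c nginx
done | sort | uniq -c    # e.g. "12 0" and "8 18" would show the split between runs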

$ helm dependency update
$ ls charts/ingress-nginx/templates/
$ cat charts/ingress-nginx-4.0.6.tgz | sha256sum
30cc58457945667224bee646e1c714ab4add9a4ac519783f873cf9804ff6f9bb  -
$ helm template metadata ../metadata -n kube-system | yq '.metadata.name' | grep nginx | wc -l
coalesce.go:175: warning: skipped value for metadata.openstack-cloud-controller-manager.nodeSelector: Not a table.
18
$ cat charts/ingress-nginx-4.0.6.tgz | sha256sum
30cc58457945667224bee646e1c714ab4add9a4ac519783f873cf9804ff6f9bb  -
$ helm template metadata ../metadata -n kube-system | yq '.metadata.name' | grep nginx | wc -l
coalesce.go:175: warning: skipped value for metadata.openstack-cloud-controller-manager.nodeSelector: Not a table.
18
$ cat charts/ingress-nginx-4.0.6.tgz | sha256sum
30cc58457945667224bee646e1c714ab4add9a4ac519783f873cf9804ff6f9bb  -
$ helm template metadata ../metadata -n kube-system | yq '.metadata.name' | grep nginx | wc -l
coalesce.go:175: warning: skipped value for metadata.openstack-cloud-controller-manager.nodeSelector: Not a table.
0
$ helm template metadata ../metadata -n kube-system | yq '.metadata.name' | grep nginx | wc -l
coalesce.go:175: warning: skipped value for metadata.openstack-cloud-controller-manager.nodeSelector: Not a table.
18
$ helm template metadata ../metadata -n kube-system | yq '.metadata.name' | grep nginx | wc -l
coalesce.go:175: warning: skipped value for metadata.openstack-cloud-controller-manager.nodeSelector: Not a table.
0
$ helm template metadata ../metadata -n kube-system | yq '.metadata.name' | grep nginx | wc -l
coalesce.go:175: warning: skipped value for metadata.openstack-cloud-controller-manager.nodeSelector: Not a table.
0
$ helm template metadata ../metadata -n kube-system | yq '.metadata.name' | grep nginx | wc -l
coalesce.go:175: warning: skipped value for metadata.openstack-cloud-controller-manager.nodeSelector: Not a table.
0
$ helm template metadata ../metadata -n kube-system | yq '.metadata.name' | grep nginx | wc -l
coalesce.go:175: warning: skipped value for metadata.openstack-cloud-controller-manager.nodeSelector: Not a table.
18
$ helm template metadata ../metadata -n kube-system | yq '.metadata.name' | grep nginx | wc -l
coalesce.go:175: warning: skipped value for metadata.openstack-cloud-controller-manager.nodeSelector: Not a table.
0
$ helm template metadata ../metadata -n kube-system | yq '.metadata.name' | grep nginx | wc -l
coalesce.go:175: warning: skipped value for metadata.openstack-cloud-controller-manager.nodeSelector: Not a table.
18
$ helm template metadata ../metadata -n kube-system | yq '.metadata.name' | grep nginx | wc -l
coalesce.go:175: warning: skipped value for metadata.openstack-cloud-controller-manager.nodeSelector: Not a table.
0

Clearly spooky action at a distance

I also tried dropping either ingress-nginx.tgz or ingress-nginx/ (the manually edited chart directory), and in both cases the results were consistent: 0 and 18 counts respectively.

Can someone help?

Output of helm version: version.BuildInfo{Version:"v3.10.3", GitCommit:"835b7334cfe2e5e27870ab3ed4135f136eecc704", GitTreeState:"clean", GoVersion:"go1.18.9"}

I also tried the following, while investigating whether it was a backwards-compatibility issue:

ls ~/.local/bin/ -lash | grep helm 
   0 lrwxrwxrwx 1 dtomasgu dtomasgu   11 Feb  9 17:56 helm -> helm-3.10.3
 44M -rwxr-xr-x 1 dtomasgu dtomasgu  44M Dec 14 16:34 helm-3.10.3
 39M -rwxr-xr-x 1 dtomasgu dtomasgu  39M Apr 22  2020 helm-3.2.0
 43M -rwxr-xr-x 1 dtomasgu dtomasgu  43M Mar  9  2022 helm-3.8.1
 13M -rw-rw-r-- 1 dtomasgu dtomasgu  13M Apr 22  2020 helm-v3.2.0-linux-amd64.tar.gz

Output of kubectl version: Not relevant

Cloud Provider/Platform (AKS, GKE, Minikube etc.):

EDIT: for some phrasing fixes

dioguerra commented 1 year ago

This is not a question @joejulian.

Unless I am missing something.

joejulian commented 1 year ago

It is, until it can be triaged and confirmed whether this is a bug in Helm, an unsupported use case, a feature request, a misunderstanding of how Helm works, etc.

If you can provide a script for someone to follow to duplicate your issue, that would go a long way. I've had this tab open since I labeled it trying to figure out what the steps are and what to ask, but I keep getting interrupted with $dayjob.

joejulian commented 1 year ago

My first guess is that the parent chart and a subchart both define a named template with the same name. Sometimes you get one, sometimes you get the other.
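
For illustration, here is a minimal sketch of that kind of collision (the chart names parent/child, the template name shared.label, and the release name demo are all hypothetical, not taken from the reporter's metachart). Named templates share one global namespace across the whole chart tree, so when two charts define the same name, only one definition survives:

# Build a tiny parent chart with a subchart under charts/
mkdir -p parent/templates parent/charts/child/templates

cat > parent/Chart.yaml <<'EOF'
apiVersion: v2
name: parent
version: 0.1.0
EOF

cat > parent/charts/child/Chart.yaml <<'EOF'
apiVersion: v2
name: child
version: 0.1.0
EOF

# Parent's definition of the shared template name
cat > parent/templates/_helpers.tpl <<'EOF'
{{- define "shared.label" -}}
from-parent
{{- end -}}
EOF

# Subchart's definition of the *same* name
cat > parent/charts/child/templates/_helpers.tpl <<'EOF'
{{- define "shared.label" -}}
from-child
{{- end -}}
EOF

# A manifest that consumes the template
cat > parent/templates/cm.yaml <<'EOF'
apiVersion: v1
kind: ConfigMap
metadata:
  name: demo
data:
  source: {{ include "shared.label" . | quote }}
EOF

# Render a few times; whichever definition was loaded last wins
for i in 1 2 3; do helm template demo ./parent | grep 'source:'; done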

dioguerra commented 1 year ago

How to reproduce:

First, download a chart with subcharts; let's use kube-prometheus-stack as an example:

mkdir /tmp/test && cd /tmp/test
helm pull stable/kube-prometheus-stack --version 39.5.0 --untar=true && cd kube-prometheus-stack

Originally, a chart comes either with no charts/ directory or with .tgz charts, since we update and package our charts. So if we do:

helm dep update
ls charts/
grafana  grafana-6.32.18.tgz  kube-state-metrics  kube-state-metrics-4.15.0.tgz  prometheus-node-exporter  prometheus-node-exporter-3.3.1.tgz

Now, let's run the tests described above:

while true; do helm template ../kube-prometheus-stack/ -n kube-system | yq '.metadata.name' | grep grafana | wc -l; done
16

We can run this multiple times and the number of generated manifests will be the same (because at this point, charts/grafana/ and the packaged grafana-6.32.18.tgz are identical).

If we change the grafana chart to be a manually managed chart and try again (NOTE: it took a while, but it still happens):

rm -rf charts/grafana/templates/*
while true; do helm template ../kube-prometheus-stack/ -n kube-system | yq '.metadata.name' | grep grafana | wc -l; done
2
16
2
2
2
16
2
16
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
2
16
2
2
16

And again, it stabilizes (at 2 references, indefinitely):

rm -rf charts/*.tgz
while true; do helm template ../kube-prometheus-stack/ -n kube-system | yq '.metadata.name' | grep grafana | wc -l; done
github-actions[bot] commented 1 year ago

This issue has been marked as stale because it has been open for 90 days with no activity. This thread will be automatically closed in 30 days if no further activity occurs.

dioguerra commented 1 year ago

This is still a problem.

ramalhais-sky commented 5 months ago

I think I know what's going on.

Helm randomly uses either the files under charts/ or the .tgz files it previously generated from helm dependency update.
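
A quick way to spot that duplicated state (a sketch, assuming it is run from the parent chart's root and that packaged dependencies follow the usual <name>-<version>.tgz naming) is to list dependencies that exist both as an unpacked directory and as an archive under charts/:

for d in charts/*/; do
  [ -d "$d" ] || continue            # skip if there are no unpacked directories
  name=$(basename "$d")
  if ls "charts/${name}"-*.tgz >/dev/null 2>&1; then
    echo "duplicate dependency: ${name} (directory and .tgz both present)"
  fi
done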

To work around this bug, always run helm dependency update before running helm template, install, or upgrade.
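
Applied to the kube-prometheus-stack repro above, that suggested workaround would look something like this (a sketch; paths follow the earlier example):

helm dependency update ./kube-prometheus-stack
helm template ./kube-prometheus-stack -n kube-system | yq '.metadata.name' | grep -c grafana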