kubernetes-sigs / kustomize

Customization of kubernetes YAML configurations
Apache License 2.0
11.09k stars 2.26k forks source link

Concurrency issues with Helm generator - archive already exists #5271

Open gaeljw opened 1 year ago

gaeljw commented 1 year ago

What happened?

Given the following file structure where the base has a helmChart reference:

base/kustomization.yaml <-- has a helmChart reference
overlays/overlay1/kustomization.yaml
overlays/overlay2/kustomization.yaml

If you run Kustomize on both overlays at the same time, there's a chance one of the 2 commands fail with following error:

Error: failed to untar:
a file or directory with the name /tmp/git@gitlab.com_xxxxxx_devops_apps/base/apps/data-query-gateway/charts/k8s-app-1.2.7.tgz already exists:
unable to run: 'helm pull --untar --untardir /tmp/git@gitlab.com_xxxxxx_devops_apps/base/apps/data-query-gateway/charts --repo https://argo:D64mNn3f6ZU77UGm2YWy@gitlab.com/api/v4/projects/29393074/packages/helm/stable k8s-app --version 1.2.7'
with env=[HELM_CONFIG_HOME=/tmp/kustomize-helm-837700893/helm HELM_CACHE_HOME=/tmp/kustomize-helm-837700893/helm/.cache HELM_DATA_HOME=/tmp/kustomize-helm-837700893/helm/.data]
(is 'helm' installed?)

See several users reporting the issue in the context of ArgoCD where it's a common setup to have multiple "applications" concurrently synchronizing and if they happen to share same Kustomize base, the error above happens.

What did you expect to happen?

Ideally, it should be possible to run Kustomize on both overlays without any concurrency issues.

How can we reproduce it (as minimally and precisely as possible)?

N/A

Expected output

N/A

Actual output

N/A

Kustomize version

5.0.3

Operating system

Linux

gaeljw commented 1 year ago

I haven't looked at the code yet. Is there any reason to raise this error in the 1st place?

I see several options:

Maybe the new behaviour could be enabled with a flag like --ignore-helm-concurrency-issue (just to give the idea, the name should be shorter of course!) if we're not confident it won't break existing usages.

gaeljw commented 1 year ago

Actually, looking now at the code, Kustomize is "only" calling Helm with a command similar to helm pull --untar --untardir ... --repo ... ... and the error comes from Helm.

https://github.com/kubernetes-sigs/kustomize/blob/911ddcda40a8c3591a1eee3929f775c8c8c57330/plugin/builtin/helmchartinflationgenerator/HelmChartInflationGenerator.go#L285-L296

It would be nice if Helm supported some kind of "overwrite" mode then I guess. But this might still bring concurrency issues? Maybe it's still best to handle it at Kustomize level?

Related issues on Helm:

be-hase commented 1 year ago

I am also encountering this problem.

DiptoChakrabarty commented 1 year ago

hi @gaeljw if you don't mind ,can you please share the kustomization file along with files you are using under the overlays directory so that we can also debug it once

greenkiwi commented 1 year ago

@DiptoChakrabarty I've created an example here:

https://github.com/greenkiwi/kustomize-helm-concurrency-issue

It can be run locally or it can be run via the Github Action there. It has our script that runs kustomize in parallel for the builds. Note, I've added a for loop, just to help ensure that the problem can be seen.

It also appears that if the chart is already there with the correct version, it doesn't have an issue.

gaeljw commented 1 year ago

@DiptoChakrabarty I can provide another minimal reproduction case as well if needed but it's quite similar to the one given by @greenkiwi (thanks!)

shakefu commented 1 year ago

Could a potential fix be to use os.MkdirTemp as the untar target and then copy the resulting filesystem into the target path, so helm won't fight itself?

natasha41575 commented 1 year ago

/assign @annasong20

for further investigation

gaeljw commented 1 year ago

Is there any progress on this issue? Do you think there's a chance it can be fixed?

annasong20 commented 12 months ago

/unassign Sorry @natasha41575, @gaeljw, I don't think I'll have time to investigate this issue.

annasong20 commented 11 months ago

Hi @gaeljw, sorry for the late response. I can reproduce your issue. Yes, as your comments point out, the problem is due to parallel calls to https://github.com/kubernetes-sigs/kustomize/blob/master/plugin/builtin/helmchartinflationgenerator/HelmChartInflationGenerator.go#L256-L261. Multiple threads see that the .tgz file doesn't exist and attempt to download it. After the 1st thread, successive downloads fail because the .tgz was downloaded by the 1st.

Our stance is that this is a helm issue, because Helm clearly doesn't support parallel runs of helm pull, as aforementioned. Pasting the terminal output here for clarity

helm pull --untar --untardir /tmp/charts/minecraft-3.1.3 --repo https://itzg.github.io/minecraft-server-charts minecraft --version 3.1.3

Error: failed to untar: a file or directory with the name /tmp/charts/minecraft-3.1.3/minecraft-3.1.3.tgz already exists

See Kustomize's stance on Helm support here.

However, if this issue is popular enough, I personally would be willing to accept a patch that either uses "locks" or unique download directory names (which will hurt performance). Would need to discuss with other Kustomize maintainers, especially @koba1t.

/area helm /triage under-consideration

greenkiwi commented 11 months ago

@annasong20 Should we also be raising this issue with helm?

gaeljw commented 11 months ago

Thanks for the feedback @annasong20

For now Helm considers this is not an issue with Helm but its usage in Kustomize. See the related issue: https://github.com/helm/helm/issues/12315

If anyone have strong arguments to present to Helm, please do :)

I'm more in the idea of fixing it in Kustomize but I don't have enough knowledge in Go to proceed with a PR.

annasong20 commented 11 months ago

@gaeljw Thanks for the context! I can empathize with Helm's argument.

Will discuss this issue with the team after the holidays, then we can update the triage status. /cc @greenkiwi @koba1t @natasha41575

natasha41575 commented 11 months ago

We have discussed this among the maintainers and we agree that this is a valid issue and we should try to fix it if we can. We are not sure that a locking mechanism will be possible because it is two different processes running, but we might be able to do something with the directory structure (e.g. adding a hash). If we modify the directory structure, we need to be really careful that we don't break anything with users' existing helm charts, whether those are local charts or remote charts that have already been pulled and cached by kustomize.

We are also open to other solutions if anyone has a feasible alternative.

/triage accepted

koba1t commented 11 months ago

I feel this opinion helps to resolve this problem. https://github.com/helm/helm/issues/12315#issuecomment-1688478222

Ignore this error from Helm.

This idea helps resolve this problem without hurting performance and breaking change. I believe this is a better solution, but it takes time to implement.

koba1t commented 10 months ago

I am waiting for this PR to be merged. https://github.com/helm/helm/pull/12725

imperfect-fourth commented 7 months ago

any updates on this issue? there's no activity on the above mentioned PR in the last 3 months.

Skaronator commented 5 months ago

I don't think https://github.com/helm/helm/pull/12725 get merged any time soon. It would be faster if we fix this in kustomize.

gaeljw commented 5 months ago

For the record, at my company, we changed our structure to workaround this issue.

Instead of having N Kustomize folders -> 1 Kustomize base -> 1 Helm chart, we integrated the generation of the N variants in the Helm chart itself using arrays/dict in the Helm values.

In our case, it makes sense and it's easier to maintain in the end, but it might not be for everyone for sure.

shoppingjaws commented 2 months ago

Adding retry-mechanism to solve this in kustomize not helm, right? Is there any idea?