gaeljw opened 1 year ago
I haven't looked at the code yet. Is there any reason to raise this error in the first place?
I see several options:
Maybe the new behaviour could be enabled with a flag like --ignore-helm-concurrency-issue (just to give the idea, the name should be shorter of course!) if we're not confident it won't break existing usages.
Actually, looking at the code now, Kustomize is "only" calling Helm with a command similar to helm pull --untar --untardir ... --repo ... ..., and the error comes from Helm.
It would be nice if Helm supported some kind of "overwrite" mode, I guess. But that might still bring concurrency issues? Maybe it's best to handle it at the Kustomize level?
Related issues on Helm:
I am also encountering this problem.
Hi @gaeljw, if you don't mind, can you please share the kustomization file along with the files you are using under the overlays directory, so that we can debug it as well?
@DiptoChakrabarty I've created an example here:
https://github.com/greenkiwi/kustomize-helm-concurrency-issue
It can be run locally or via the GitHub Action there. It has our script that runs kustomize in parallel for the builds. Note: I've added a for loop just to help ensure that the problem can be seen.
It also appears that if the chart is already there with the correct version, there is no issue.
@DiptoChakrabarty I can provide another minimal reproduction case as well if needed but it's quite similar to the one given by @greenkiwi (thanks!)
Could a potential fix be to use os.MkdirTemp as the untar target and then copy the resulting filesystem into the target path, so helm won't fight itself?
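For illustration, a rough sketch of that idea in Go, assuming a hypothetical pullChartInto helper that wraps the helm pull call (none of these names come from the actual kustomize code):

```go
package helmcache

import (
	"os"
	"os/exec"
	"path/filepath"
)

// pullChartInto is roughly what kustomize does today, but pointed at a
// directory only the current process knows about.
func pullChartInto(untarDir, repoURL, chart, version string) error {
	cmd := exec.Command("helm", "pull", chart,
		"--untar", "--untardir", untarDir,
		"--repo", repoURL, "--version", version)
	return cmd.Run()
}

// fetchChart untars into a process-private temp directory, then moves the
// result into the shared cache path, so concurrent runs never write to the
// same location.
func fetchChart(cacheDir, repoURL, chart, version string) error {
	finalDir := filepath.Join(cacheDir, chart)

	// Another run may already have published the chart.
	if _, err := os.Stat(finalDir); err == nil {
		return nil
	}

	tmpDir, err := os.MkdirTemp(cacheDir, chart+"-*")
	if err != nil {
		return err
	}
	defer os.RemoveAll(tmpDir)

	if err := pullChartInto(tmpDir, repoURL, chart, version); err != nil {
		return err
	}

	// Publish the result; if a concurrent run won the race, keep its copy.
	if err := os.Rename(filepath.Join(tmpDir, chart), finalDir); err != nil {
		if _, statErr := os.Stat(finalDir); statErr == nil {
			return nil
		}
		return err
	}
	return nil
}
```

Since the temp directory is created under the same cache root, the final os.Rename stays on one filesystem and is effectively atomic, so whichever process finishes first publishes the chart and later runs simply reuse it.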
/assign @annasong20
for further investigation
Is there any progress on this issue? Do you think there's a chance it can be fixed?
/unassign Sorry @natasha41575, @gaeljw, I don't think I'll have time to investigate this issue.
Hi @gaeljw, sorry for the late response. I can reproduce your issue. Yes, as your comments point out, the problem is due to parallel calls to https://github.com/kubernetes-sigs/kustomize/blob/master/plugin/builtin/helmchartinflationgenerator/HelmChartInflationGenerator.go#L256-L261. Multiple threads see that the .tgz file doesn't exist and attempt to download it. After the first thread, successive downloads fail because the .tgz has already been downloaded by the first.
Our stance is that this is a Helm issue, because Helm clearly doesn't support parallel runs of helm pull, as mentioned above. Pasting the terminal output here for clarity:
helm pull --untar --untardir /tmp/charts/minecraft-3.1.3 --repo https://itzg.github.io/minecraft-server-charts minecraft --version 3.1.3
Error: failed to untar: a file or directory with the name /tmp/charts/minecraft-3.1.3/minecraft-3.1.3.tgz already exists
See Kustomize's stance on Helm support here.
However, if this issue is popular enough, I personally would be willing to accept a patch that either uses "locks" or unique download directory names (which would hurt performance). I would need to discuss this with the other Kustomize maintainers, especially @koba1t.
/area helm /triage under-consideration
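For what it's worth, a cross-process "lock" could be as simple as an exclusive lock file next to the chart directory. A rough sketch in Go (not actual kustomize code; handling of stale locks left behind by a crashed run is deliberately elided):

```go
package helmcache

import (
	"fmt"
	"os"
	"time"
)

// withChartLock serializes access to a chart's cache directory across
// processes by creating a lock file with O_EXCL: whoever creates it first
// proceeds, everyone else waits and retries until the file disappears.
func withChartLock(chartDir string, do func() error) error {
	lockPath := chartDir + ".lock"
	deadline := time.Now().Add(2 * time.Minute)

	for {
		f, err := os.OpenFile(lockPath, os.O_CREATE|os.O_EXCL|os.O_WRONLY, 0o644)
		if err == nil {
			f.Close()
			break // we own the lock
		}
		if !os.IsExist(err) {
			return err
		}
		if time.Now().After(deadline) {
			return fmt.Errorf("timed out waiting for lock %s", lockPath)
		}
		time.Sleep(100 * time.Millisecond)
	}
	defer os.Remove(lockPath)

	return do()
}
```

Usage would be something like withChartLock(chartDir, func() error { ... }), where the callback re-checks whether the chart already exists before pulling it.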
@annasong20 Should we also be raising this issue with helm?
Thanks for the feedback @annasong20
For now, Helm considers this not an issue with Helm itself but with how Kustomize uses it. See the related issue: https://github.com/helm/helm/issues/12315
If anyone has strong arguments to present to Helm, please do :)
I'm more inclined to fix it in Kustomize, but I don't have enough knowledge of Go to proceed with a PR.
@gaeljw Thanks for the context! I can empathize with Helm's argument.
Will discuss this issue with the team after the holidays, then we can update the triage status. /cc @greenkiwi @koba1t @natasha41575
We have discussed this among the maintainers and we agree that this is a valid issue that we should try to fix if we can. We are not sure that a locking mechanism will be possible because there are two different processes running, but we might be able to do something with the directory structure (e.g. adding a hash). If we modify the directory structure, we need to be really careful that we don't break anything with users' existing helm charts, whether those are local charts or remote charts that have already been pulled and cached by kustomize.
We are also open to other solutions if anyone has a feasible alternative.
/triage accepted
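To make the backward-compatibility concern concrete, here is a rough sketch (in Go) of how a hashed directory layout could coexist with charts already cached under the current layout. The names are purely illustrative, and a hashed layout alone doesn't remove the race when two builds pull the exact same chart, so it would still need to be combined with something like the temp-directory or lock-file sketches above:

```go
package helmcache

import (
	"crypto/sha256"
	"fmt"
	"os"
	"path/filepath"
)

// chartDir returns the legacy cache location when a chart is already there,
// so existing local charts and previously pulled charts keep working, and
// otherwise a new location namespaced by a hash of the chart coordinates.
func chartDir(root, repoURL, chart, version string) string {
	legacy := filepath.Join(root, chart)
	if _, err := os.Stat(legacy); err == nil {
		return legacy // don't break charts cached under the old layout
	}
	sum := sha256.Sum256([]byte(repoURL + "\x00" + chart + "\x00" + version))
	return filepath.Join(root, fmt.Sprintf("%s-%s-%x", chart, version, sum[:8]))
}
```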
I feel this suggestion helps to resolve the problem: https://github.com/helm/helm/issues/12315#issuecomment-1688478222 (ignore this error from Helm).
This idea resolves the problem without hurting performance or introducing a breaking change. I believe it's the better solution, but it will take time to implement.
I am waiting for this PR to be merged. https://github.com/helm/helm/pull/12725
Any updates on this issue? There's been no activity on the above-mentioned PR in the last 3 months.
I don't think https://github.com/helm/helm/pull/12725 will get merged any time soon. It would be faster if we fixed this in kustomize.
For the record, at my company we changed our structure to work around this issue.
Instead of having N Kustomize folders -> 1 Kustomize base -> 1 Helm chart, we integrated the generation of the N variants into the Helm chart itself using arrays/dicts in the Helm values.
In our case it makes sense and is easier to maintain in the end, but it certainly might not suit everyone.
Adding a retry mechanism would solve this in kustomize rather than helm, right? Any ideas?
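A minimal sketch of the retry idea in Go, reusing the hypothetical pullChartInto helper from the earlier sketch: if helm pull fails but the chart has meanwhile appeared at the expected path because a concurrent build pulled it first, treat that as success; otherwise back off and try again:

```go
package helmcache

import (
	"os"
	"path/filepath"
	"time"
)

// pullWithRetry retries a failed pull a few times with a short backoff.
// If the chart shows up at the expected path between attempts (a concurrent
// build won the race), the earlier failure is harmless and we stop retrying.
func pullWithRetry(untarDir, repoURL, chart, version string) error {
	chartPath := filepath.Join(untarDir, chart)
	var lastErr error
	for attempt := 1; attempt <= 5; attempt++ {
		if lastErr = pullChartInto(untarDir, repoURL, chart, version); lastErr == nil {
			return nil
		}
		if _, err := os.Stat(filepath.Join(chartPath, "Chart.yaml")); err == nil {
			return nil // another process already untarred the chart
		}
		time.Sleep(time.Duration(attempt) * 200 * time.Millisecond)
	}
	return lastErr
}
```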
What happened?
Given the following file structure, where the base has a helmChart reference:
If you run Kustomize on both overlays at the same time, there's a chance one of the two commands fails with the following error:
Several users have reported this issue in the context of ArgoCD, where it's a common setup to have multiple "applications" synchronizing concurrently; if they happen to share the same Kustomize base, the error above occurs.
What did you expect to happen?
Ideally, it should be possible to run Kustomize on both overlays without any concurrency issues.
How can we reproduce it (as minimally and precisely as possible)?
N/A
Expected output
N/A
Actual output
N/A
Kustomize version
5.0.3
Operating system
Linux