Open Nuru opened 5 years ago
Previously helmfile sync
and helmfile diff
had this behavior, but upgrading from 0.60.x to the latest of 0.80.x this behavior seems to have reverted.
@Nuru @sstarcher Thanks for reporting! Wouldn't #828 fully fix this, am I correct?
@mumoshu I do not fully understand what #828 does.
different aliases being treated as 2 separate repos
Ah good point! You are correct.
The fix is rather hard though. We must fix it so that (1)we use the url of each repo rather than name
to skip redundant updates:
https://github.com/roboll/helmfile/blob/master/pkg/app/context.go#L22-L30
And we must enhance helmfile to save/restore the repository index cache instead of delegating it to helm:
$ ls ~/.helm/repository/cache/
brigade-index.yaml flagger-index.yaml flux-index.yaml fluxcd-index.yaml incubator-index.yaml istio.io-index.yaml local-index.yaml sp-index.yaml stable-index.yaml
It is not clear to me that it would mean then when a helmfile adds a new repo and causes an update, the existing repos would not also get updated, even though they were updated by an earlier helmfile.
So helmfile as of today triggers at-most-one helm repo up
per each helmfile.yaml
. If helmfile
triggers helm repo up
for two separate helmfile.yaml
files, the latter repo up
doesn't skip already updated ones, as helm repo up
doesn't support limiting repos to be updated...
So if we're going to support it in helmfile, helmfile must be enhanced to filter repos to be updated. Maybe helmfile
should run helm repo
in where $HELM_HOME
is pointed to a temporary directory set up by helmfile
and copies it into the ~/.helm
before running any helm command other than repo up
?
@mumoshu I think we have the same general ideas about how to fix this, but I have some different specific ideas. I have not had the time to fully learn and understand how helmfile
works, but I understand it and helm
a little better now, and with that I would attempt to solve this problem this way:
helmfile
no matter how many helmfiles are in helmfile.d
As far as I can tell, helm repo add
performs an update on the repo if it already exists, so I would leverage that behavior this way:
In a first pass through the helmfile.d
for any operation that is going to require repos to be updated:
local
as http://127.0.0.1:8879/charts
and another defines local
as https://my-chart-repo.my-co.io
?)helm repo add
for each repository "name" defined. So only one add
per name, but at this level we will still add the same repo URL twice if it is defined twice with different names.helm
updates to repos for the rest of the runIf I'm wrong about add
also doing update
, then after adding all the repos, call helm repo up
once.
Probably better to do this as a PR for helm
, because otherwise you are creating a fragile dependency of helmfile
on internals of helm
, and the fix would benefit helm
users and be easier to implement in helm
anyway. I understand the internals of helm
even less than I do the internals of helmfile
, but I hope that one or more of the helmfile
contributors understands helm
well enough to make this PR:
helm
uses from <name>.yaml
to <URL>.yaml
or <URL encoding of URL>.yaml
depending on how much you care about the URL having special characters in it.helm repo up
to update caches by unique cache key rather than by name.I think this will work without much disruption to the rest of helm
. Helm would continue to have a repository.yaml
entry for each repo name
, so any action on a repo name would work as before. Since this would speed up helm repo up
, I think the helm
maintainers should accept it.
Even without this "Part 2" solution, the "Part 1" will dramatically speed up my use case of having 20+ helmfiles.
As far as I can tell, helm repo add performs an update on the repo if it already exists, so I would leverage that behavior this way:
Wow, neat trick! I have not realized that helm repo add
had updated the repo until now, but yeah that would definitely work.
De-duplicate the list of repo definitions so you have only 1 definition per repo name. (By the way, what happens now if 2 different helmfiles give different URLs the same name? For example, one defines local as http://127.0.0.1:8879/charts and another defines local as https://my-chart-repo.my-co.io?)
I think we need to solve the part 2 in some way in order to address this de-duplication.
The easiest way came to my mind is to enhance Helmfile to internally translate the repo aliases defined and referenced within helmfile.yaml
into globally unique repo IDs.
An example implementation of URLStringToRepoKey
to illustrate it follows:
package main
import (
"os"
"regexp"
)
var (
r = regexp.MustCompile(`[^[:alnum:]]+`)
)
func URLStringToRepoKey(s string) string {
return r.ReplaceAllString(s, "-")
}
func main() {
k := URLStringToRepoKey(os.Args[1])
println(k)
}
$ go run main.go https://fluxcd.github.io/flux
https-fluxcd-github-io-flux
helmfile.yaml
like the below:
repositories:
- name: flagger
url: https://flagger.app
releases:
- name: podinfo
chart: flagger/podinfo
Will be internally transformed to:
repositories:
- name: https-flagger-app
url: https://flagger.app
releases:
- name: podinfo
chart: https-flagger-app/podinfo
WDYT?
Your idea of internally translating helm repo names solves the problem of needing to make changes to the helm
code, and it might be good enough, but I have 2 concerns.
First is that the names have to really be globally unique and avoid name collisions. We have some complicated repo URLs in our charts, for example git+https://github.com/stakater/Forecastle@deployments/kubernetes/chart?ref=8f36b82beaf2a1a42b364a3857bc83638c51e30b
, and we want to make sure we handle these weird cases properly. That is why I suggested URL-encoding them.
package main
import (
"net/url"
"os"
)
func URLStringToRepoKey(s string) string {
return url.QueryEscape(s)
}
func main() {
k := URLStringToRepoKey(os.Args[1])
println(k)
}
One downside of either of these renaming conventions is that the output from helm
will contain the renamed repos, which could be confusing to users.
"https-flagger-app" has been added to your repositories
"git%2Bhttps%3A%2F%2Fgithub.com%2Fstakater%2FForecastle%40deployments%2Fkubernetes%2Fchart%3Fref%3D8f36b82beaf2a1a42b364a3857bc83638c51e30b" has been added to your repositories
Another downside is that users using both helmfile
and helm
will have 2 sets of repos that are not necessarily in sync. I don't know the practical impact of that, maybe it is not significant.
As much as I'd like to avoid duplicating individual repos, that seems like a harder problem than only updating repos once per invocation of helmfile
, and I don't want the difficulty of fixing this harder problem to delay fixing the easier one, especially since the easier one is much more impactful.
Another downside is that users using both helmfile and helm will have 2 sets of repos that are not necessarily in sync.
I believe you are correct.
I don't know the practical impact of that, maybe it is not significant.
One of the potential impacts would be that helm
used outside of helmfile
suffers from longer helm repo up
run. However, I believe it can be solved simply by helmfile
setting an alternative HELM_HOME
:
$ helm home
/Users/mumoshu/.helm
$ HELM_HOME=~/.helmfile/ helm home
/Users/mumoshu/.helmfile/
I'd appreciate if you have other relevant cases but if that's the only problem I'm fine with this helmfile-specific HELM_HOME
option.
I don't want the difficulty of fixing this harder problem to delay fixing the easier one, especially since the easier one is much more impactful.
That's completely okay! If you could submit a pull request for the easier one, I will add improvements on top.
Normally I'm happy to submit PRs, but I'm still very new to Go, so I don't think I will be able to generate a PR I'm comfortable with for quite a while. Creating a new processing pass for helmfiles is simple in theory, but in practice, properly integrating it intohelmfile
requires a much deeper understanding of how it works than I have. And then I don't know how best to stop helm
from automatically updating repos in later stages.
I will give it a try, and maybe it will be easier than I expect, but please don't wait for me.
If we use a single
helmfile.yaml
that has ahelmfiles
section to refer to a bunch of other helmfiles, when we runhelmfile deps
it updates all the repos once for each referenced helmfile. This is a big waste of time considering that we have 20 helmfiles and 15 of them reference either thestable
orincubator
. Granted we have made the problem a little worse by using public charts that sometimes refer to https://kubernetes-charts-incubator.storage.googleapis.com askubernetes-incubator
and sometimes asincubator
, which then get treated like 2 separate repositories, but still it is a big pain and perhaps a source of several of the other feature requests suggesting ways to limitdeps
and repo updates.Currently, for every helmfile, every repos gets updated, so we get 20 * 2 = 40 updates just of
incubator
. It would probably take 10 minutes off ourhelm deps
execution time if, in a singlehelm deps
execution, each repo would only be updated once.