rancher / fleet

Deploy workloads from Git to large fleets of Kubernetes clusters
https://fleet.rancher.io/
Apache License 2.0
1.47k stars 216 forks source link

GitJob pod panics on wrongly formatted semver tag with metadata #2541

Closed puffitos closed 5 hours ago

puffitos commented 1 week ago

Is there an existing issue for this?

Current Behavior

When using an quay OCI registry to roll out a helm chart with fleet, and the helm chart has a semver tag, which also includes metadata (like 0.0.11-rc4+build3), using the exact version of the chart as saved in the OCI registry will result in a panic of the GitJob pod which runs the fleet apply command.

The chart itself is has the following Chart.yaml:

name: my-cool-chart
version: "0.0.11-rc4+build3"

And packaging the helm chart and pushing it to an OCI registry produces the following tag:

# in the chart's repo
❯ helm package chart
❯ helm push my-cool-chart-0.0.11-rc4+build3.tgz oci://quay.based.registry/charts/
❯ docker pull oci://quay.based.registry/charts/my-cool-chart:0.0.11-rc4_build3
0.0.11-rc4_build3: Pulling from quay.based.registry/charts/my-cool-chart
unsupported media type application/vnd.cncf.helm.config.v1+json
❯ docker pull quay.based.registry/charts/my-cool-chart:0.0.11-rc4+build3
invalid reference format

Since the image cannot be pulled using the + sign, one would think that the correct fleet.yaml to pull in the chart would be:

helm:
  releaseName: cool-release
  chart: oci://quay.based.registry/charts/my-cool-chart
  valuesFiles:
    - values.yaml
  version: 0.0.11-rc4_build3

since that's how the chart is saved in the registry. Alas, this creates the panic.

Replacing the _ with a +, does the trick:

helm:
  releaseName: cool-release
  chart: oci://quay.based.registry/charts/my-cool-chart
  valuesFiles:
    - values.yaml
  version: 0.0.11-rc4+build3

The only problem is, that the panic of the GitJob pod isn't helping anyone understand why this happens.

Expected Behavior

After using a wrong tag for a helm chart, a good error message would appear in the the output of the gitjob pod, explaining that the tag isn't present / malformed.

Steps To Reproduce

See current behavior. Simply put:

  1. Have a chart with a semver with metadata (v0.0.1-rc0+build9) get pushed to an OCI registry via helm package + helm push
  2. Create a manifest repo with a fleet.yaml to deploy the slightly changed helm chart version (v0.0.1-rc0_build9)
  3. Create a GitRepo to deploy the helmchart via the git repository
  4. Experience much panic in the gitjob pod

Environment

- Architecture: amd64
- Fleet Version: 0.9.x
- Cluster: 
  - Provider: rke
  - Options: irrelevant
  - Kubernetes Version: 1.27.11

Logs

_=/usr/bin/env
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x38 pc=0x1719eeb]

goroutine 13 [running]:
helm.sh/helm/v3/pkg/registry.(*Client).Tags(0x0, {0xc00026da06?, 0xc000d39450?})
    /go/pkg/mod/github.com/rancher/helm/v3@v3.12.3-fleet1/pkg/registry/client.go:628 +0x12b
helm.sh/helm/v3/pkg/downloader.(*ChartDownloader).getOciURI(0xc000d39940, {0xc00026da00, 0x3f}, {0xc00099b5c0, 0x11}, 0xc000124630)
    /go/pkg/mod/github.com/rancher/helm/v3@v3.12.3-fleet1/pkg/downloader/chart_downloader.go:154 +0x128
helm.sh/helm/v3/pkg/downloader.(*ChartDownloader).ResolveChartVersion(0xc000d39940, {0xc00026da00, 0x3f}, {0xc00099b5c0, 0x11})
    /go/pkg/mod/github.com/rancher/helm/v3@v3.12.3-fleet1/pkg/downloader/chart_downloader.go:199 +0x1036
helm.sh/helm/v3/pkg/downloader.(*ChartDownloader).DownloadTo(0xc000d39940, {0xc00026da00, 0x3f}, {0xc00099b5c0?, 0xc000d9239a?}, {0xc000d92390, 0x14})
    /go/pkg/mod/github.com/rancher/helm/v3@v3.12.3-fleet1/pkg/downloader/chart_downloader.go:90 +0x4f
github.com/rancher/fleet/internal/bundlereader.downloadOCIChart({0xc00026da00, 0x3f}, {0xc00099b5c0, 0x11}, {0xc000d92390, 0x14}, {{0x0, 0x0}, {0x0, 0x0}, ...})
    /go/src/github.com/rancher/fleet/internal/bundlereader/loaddirectory.go:341 +0x4aa
github.com/rancher/fleet/internal/bundlereader.GetContent({0x2a6e690, 0xc000834140}, {0x7ffc5a695cf9, 0x2}, {0xc00026da00, 0x3f}, {0xc00099b5c0, 0x11}, {{0x0, 0x0}, ...})
    /go/src/github.com/rancher/fleet/internal/bundlereader/loaddirectory.go:197 +0x1aa
github.com/rancher/fleet/internal/bundlereader.loadDirectory({0x2a6e690?, 0xc000834140?}, 0x0, {0xc00005c8c0, 0x47}, {0x7ffc5a695cf9?, 0x0?}, {0xc00026da00?, 0x0?}, {0xc00099b5c0, ...}, ...)
    /go/src/github.com/rancher/fleet/internal/bundlereader/loaddirectory.go:157 +0x94
github.com/rancher/fleet/internal/bundlereader.loadDirectories.func1()
    /go/src/github.com/rancher/fleet/internal/bundlereader/resources.go:219 +0x105
golang.org/x/sync/errgroup.(*Group).Go.func1()
    /go/pkg/mod/golang.org/x/sync@v0.5.0/errgroup/errgroup.go:75 +0x56
created by golang.org/x/sync/errgroup.(*Group).Go in goroutine 1
    /go/pkg/mod/golang.org/x/sync@v0.5.0/errgroup/errgroup.go:72 +0x96
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x38 pc=0x1719eeb]

goroutine 11 [running]:
helm.sh/helm/v3/pkg/registry.(*Client).Tags(0x0, {0xc00026da06?, 0xc000d35450?})
    /go/pkg/mod/github.com/rancher/helm/v3@v3.12.3-fleet1/pkg/registry/client.go:628 +0x12b
helm.sh/helm/v3/pkg/downloader.(*ChartDownloader).getOciURI(0xc000d35940, {0xc00026da00, 0x3f}, {0xc00099b5c0, 0x11}, 0xc0005c0090)
    /go/pkg/mod/github.com/rancher/helm/v3@v3.12.3-fleet1/pkg/downloader/chart_downloader.go:154 +0x128
helm.sh/helm/v3/pkg/downloader.(*ChartDownloader).ResolveChartVersion(0xc000d35940, {0xc00026da00, 0x3f}, {0xc00099b5c0, 0x11})
    /go/pkg/mod/github.com/rancher/helm/v3@v3.12.3-fleet1/pkg/downloader/chart_downloader.go:199 +0x1036
helm.sh/helm/v3/pkg/downloader.(*ChartDownloader).DownloadTo(0xc000d35940, {0xc00026da00, 0x3f}, {0xc00099b5c0?, 0xc000500800?}, {0xc000bc2018, 0x14})
    /go/pkg/mod/github.com/rancher/helm/v3@v3.12.3-fleet1/pkg/downloader/chart_downloader.go:90 +0x4f
github.com/rancher/fleet/internal/bundlereader.downloadOCIChart({0xc00026da00, 0x3f}, {0xc00099b5c0, 0x11}, {0xc000bc2018, 0x14}, {{0x0, 0x0}, {0x0, 0x0}, ...})
    /go/src/github.com/rancher/fleet/internal/bundlereader/loaddirectory.go:341 +0x4aa
github.com/rancher/fleet/internal/bundlereader.GetContent({0x2a6e690, 0xc000834140}, {0x7ffc5a695cf9, 0x2}, {0xc00026da00, 0x3f}, {0xc00099b5c0, 0x11}, {{0x0, 0x0}, ...})
    /go/src/github.com/rancher/fleet/internal/bundlereader/loaddirectory.go:197 +0x1aa
github.com/rancher/fleet/internal/bundlereader.loadDirectory({0x2a6e690?, 0xc000834140?}, 0x0, {0xc00005c4b0, 0x47}, {0x7ffc5a695cf9?, 0x0?}, {0xc00026da00?, 0x0?}, {0xc00099b5c0, ...}, ...)
    /go/src/github.com/rancher/fleet/internal/bundlereader/loaddirectory.go:157 +0x94
github.com/rancher/fleet/internal/bundlereader.loadDirectories.func1()
    /go/src/github.com/rancher/fleet/internal/bundlereader/resources.go:219 +0x105
golang.org/x/sync/errgroup.(*Group).Go.func1()
    /go/pkg/mod/golang.org/x/sync@v0.5.0/errgroup/errgroup.go:75 +0x56
created by golang.org/x/sync/errgroup.(*Group).Go in goroutine 1
    /go/pkg/mod/golang.org/x/sync@v0.5.0/errgroup/errgroup.go:72 +0x96
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x38 pc=0x1719eeb]

goroutine 12 [running]:
helm.sh/helm/v3/pkg/registry.(*Client).Tags(0x0, {0xc00026da06?, 0xc000147450?})
    /go/pkg/mod/github.com/rancher/helm/v3@v3.12.3-fleet1/pkg/registry/client.go:628 +0x12b
helm.sh/helm/v3/pkg/downloader.(*ChartDownloader).getOciURI(0xc000147940, {0xc00026da00, 0x3f}, {0xc00099b5c0, 0x11}, 0xc0004cb4d0)
    /go/pkg/mod/github.com/rancher/helm/v3@v3.12.3-fleet1/pkg/downloader/chart_downloader.go:154 +0x128
helm.sh/helm/v3/pkg/downloader.(*ChartDownloader).ResolveChartVersion(0xc000147940, {0xc00026da00, 0x3f}, {0xc00099b5c0, 0x11})
    /go/pkg/mod/github.com/rancher/helm/v3@v3.12.3-fleet1/pkg/downloader/chart_downloader.go:199 +0x1036
helm.sh/helm/v3/pkg/downloader.(*ChartDownloader).DownloadTo(0xc000147940, {0xc00026da00, 0x3f}, {0xc00099b5c0?, 0xc00007bc00?}, {0xc000a0e048, 0x14})
    /go/pkg/mod/github.com/rancher/helm/v3@v3.12.3-fleet1/pkg/downloader/chart_downloader.go:90 +0x4f
github.com/rancher/fleet/internal/bundlereader.downloadOCIChart({0xc00026da00, 0x3f}, {0xc00099b5c0, 0x11}, {0xc000a0e048, 0x14}, {{0x0, 0x0}, {0x0, 0x0}, ...})
    /go/src/github.com/rancher/fleet/internal/bundlereader/loaddirectory.go:341 +0x4aa
github.com/rancher/fleet/internal/bundlereader.GetContent({0x2a6e690, 0xc000834140}, {0x7ffc5a695cf9, 0x2}, {0xc00026da00, 0x3f}, {0xc00099b5c0, 0x11}, {{0x0, 0x0}, ...})
    /go/src/github.com/rancher/fleet/internal/bundlereader/loaddirectory.go:197 +0x1aa
github.com/rancher/fleet/internal/bundlereader.loadDirectory({0x2a6e690?, 0xc000834140?}, 0x0, {0xc00005c730, 0x47}, {0x7ffc5a695cf9?, 0x0?}, {0xc00026da00?, 0x0?}, {0xc00099b5c0, ...}, ...)
    /go/src/github.com/rancher/fleet/internal/bundlereader/loaddirectory.go:157 +0x94
github.com/rancher/fleet/internal/bundlereader.loadDirectories.func1()
    /go/src/github.com/rancher/fleet/internal/bundlereader/resources.go:219 +0x105
golang.org/x/sync/errgroup.(*Group).Go.func1()
    /go/pkg/mod/golang.org/x/sync@v0.5.0/errgroup/errgroup.go:75 +0x56
created by golang.org/x/sync/errgroup.(*Group).Go in goroutine 1
    /go/pkg/mod/golang.org/x/sync@v0.5.0/errgroup/errgroup.go:72 +0x96


### Anything else?

The error occurs because of faulty user input,  but fleet should inform the user of their wrongdoing and most of all, not panic :) Hopefully, the error can be caught before the final call of the helm module, which probably created the panic.
0xavi0 commented 1 day ago

I can't recreate this one with the latest from main. fleet's version of helm was updated to v3.14.4-fleet1 (the one used when reporting the bug was v3.12.3-fleet1)

I can confirm I get the following error instead of the previous panic:

 fleet time="2024-07-01T13:36:01Z" level=fatal msg="helm chart download: improper constraint: 0.0.11-rc4_build3"

In fact, when pushing the chart to the OCI registry helm now shows a hint about what's going on:

OCI artifact references (e.g. tags) do not support the plus sign (+). To support
storing semantic versions, Helm adopts the convention of changing plus (+) to
an underscore (_) in chart version tags when pushing to a registry and back to
a plus (+) when pulling from a registry.

Basically the + character is not supported by OCI, but helm (and fleet) works with semantic versions (which don't support _), so helm changes the _ characters to + when pulling and the other way around when pushing.

I agree it's not intuitive to see the manifest as 0.0.11-rc4_build3 uploaded in the registry and having to specify 0.0.11-rc4+build3 in the fleet.yaml file. But this is something done by the helm library. As the _ version is not a valid semantic version it is rejected. And the + version is stored as _ because + is not a valid character for OCI tags and helm swaps the character automatically.

I hope OCI specs are updated to accept + because this is really confusing.

We could , maybe, be more explicit in the documentation about semantic versioning and OCI charts. I won't close the issue yet, until we decide about this docs topic.

0xavi0 commented 5 hours ago

Closing this as the docs update has been merged.