kloudwrangler opened this issue 8 months ago (status: Open)
@kloudwrangler
Looking at the log message:
time="2024-03-20T13:58:18Z" level=info msg="Deleting failed job to trigger retry fleet-default/logging-operator-a494d due to: time=\"2024-03-20T13:58:15Z\" level=fatal msg=\"Helm chart download: failed to do request: Head \\\"https://ghcr.io/v2/kube-logging/helm-charts/logging-operator/manifests/4.5.6\\\": dial tcp 4.208.26.196:443: i/o timeout\"\n"
While the chart URL is oci://, the API requests are still made via HTTP, and the URL looks correct.
I note the dial tcp 4.208.26.196:443: i/o timeout part of the log; is there any reason your Fleet installation wouldn't be able to reach ghcr.io?
Thanks for this info @bigkevmcd, I can now concentrate on the real issue. I do have an HTTP proxy that it might be ignoring. I would like to keep this ticket open for a few days to make sure this is in fact operator error.
Okay, I have more information. As I mentioned before, I have an HTTP proxy, and I also have a Harbor instance from which I can pull the OCI Helm chart manually. I can even install this chart as normal through helm install using the Harbor OCI address, which is "oci://<harbor-site>/ghcr-proxy/kube-logging/helm-charts/logging-operator".
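For reference, the manual install I mean looks roughly like this (the release name and namespace are illustrative, <harbor-site> is a placeholder, and the version is taken from the log quoted earlier):

```sh
# Sketch of the manual install described above; release name and namespace
# are arbitrary, <harbor-site> is a placeholder, version from the earlier log.
helm install logging-operator \
  oci://<harbor-site>/ghcr-proxy/kube-logging/helm-charts/logging-operator \
  --version 4.5.6 \
  --namespace logging --create-namespace
```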
However, when I try to do this with Fleet by changing the repo to chart: "oci://<harbor-site>/ghcr-proxy/kube-logging/helm-charts/logging-operator", it gives me the following:
time="2024-03-21T14:00:07Z" level=fatal msg="Helm chart download: failed to copy: httpReadSeeker: failed open: failed to do request: Get \"https://rancher-harbor.s3.eu-west-1.amazonaws.com/docker/registry/v2/blobs/sha256/00/008a6a5af64de4d20014eeb03e613109246cb8fb60943365469ebc96ef4934d6/data?X-Amz-Algorithm=AWS4xxxxxxx": dial tcp 52.218.30.128:443: i/o timeout"
My guess is that it is changing the address to some kind of Rancher default address. I am wondering whether this is normal, and whether I can add my Harbor instance as a repo or something.
It's not clear to me how this works; I'm guessing that your proxy intercepts the /ghcr-proxy/ request?
Have you configured Fleet for use behind a proxy? https://ranchermanager.docs.rancher.com/v2.8/integrations-in-rancher/fleet/use-fleet-behind-a-proxy
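If not, the relevant chart values look roughly like the following (the exact field names are an assumption on my part, so check the linked documentation and your chart's values.yaml):

```yaml
# Assumed values for the fleet Helm chart; verify the field names against
# the linked proxy documentation for your Rancher/Fleet version.
proxy: "http://proxy.example.com:3128"
noProxy: "127.0.0.0/8,10.0.0.0/8,172.16.0.0/12,192.168.0.0/16,.svc,.cluster.local"
```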
The URL that it's requesting looks like the standard OCI API endpoint for getting an artifact.
https://github.com/opencontainers/distribution-spec/blob/main/spec.md#pulling-blobs
Fleet uses the Helm packages to fetch the artifacts, so it should work just as Helm does, which is why I suspect a proxy misconfiguration somewhere.
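One way to narrow this down is to run the same download with plain Helm from an environment that has the same proxy settings as the gitjob pod, for example:

```sh
# Same chart and version as in the log above; if this also times out, the
# problem is in the environment rather than in Fleet itself.
helm pull oci://ghcr.io/kube-logging/helm-charts/logging-operator --version 4.5.6
```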
@bigkevmcd
Hey, we have a much simpler setup, but probably hit the same problem:
All the other Helm charts (non-OCI repositories) are working fine; only two, the Grafana operator and the Kafka operator, which use OCI, don't work:
fleet time="2024-04-08T12:45:07Z" level=fatal msg="Helm chart download: failed to do request: Head \"https://ghcr.io/v2/grafana/helm-charts/grafana-operator/manifests/v5.7.0\": dial tcp 140.82.121.33:443: i/o timeout"
As can be seen, the gitjob pod is trying a direct connection instead of going through the proxy, even though the HTTP_PROXY, HTTPS_PROXY, and NO_PROXY environment variables are set.
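For completeness, this is roughly how we checked what the running container actually sees (the pod name is a placeholder, and the namespace may differ in your setup):

```sh
# Placeholder pod name; prints the proxy-related variables present in the
# running gitjob container.
kubectl -n cattle-fleet-system exec <gitjob-pod> -- env | grep -i proxy
```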
Strangely, with Rancher 2.7.9 / Fleet 0.8.x, the OCI charts were installed correctly behind the proxy.
Proxy support was fixed for 2.8.3. https://github.com/rancher/fleet/issues/2000
It seems the OCI downloader is separate. I think the following upstream issue applies: https://github.com/helm/helm/issues/12770
You probably can't change the proxy env vars to lowercase, since Fleet sets them.
This could be fixed by bumping the Helm SDK to 3.14.4 tomorrow.
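For reference, that dependency bump in Fleet's go.mod would look roughly like this (helm.sh/helm/v3 is the upstream Helm Go module):

```sh
# Bump the Helm Go SDK to the release carrying the OCI proxy fix, then tidy.
go get helm.sh/helm/v3@v3.14.4
go mod tidy
```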
Although not mentioned in the original comment, we created a copy of the gitjob pod with the lowercase proxy environment variables present, but that didn't help.
This seems like a regression, as with Fleet 0.8.x the OCI charts work flawlessly through the proxy (with no direct connectivity).
Is there an existing issue for this?
Current Behavior
I want to install the upstream logging operator using the following fleet.yaml.
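A minimal sketch of that fleet.yaml, with the chart reference and version taken from the error log quoted earlier in this thread (the namespace is illustrative):

```yaml
# Sketch only: chart and version taken from the error log; namespace is illustrative.
defaultNamespace: logging
helm:
  chart: oci://ghcr.io/kube-logging/helm-charts/logging-operator
  version: "4.5.6"
```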
I then create a GitRepo with only this definition.
Fleet fails to create a bundle from this, and the gitjob logs show the following error.
As you can see from the log, it's ignoring the oci:// scheme and treating the address as a regular Helm repo.
Expected Behavior
I would expect the oci:// address to be kept the same in the log, and for Fleet to actually try to download the OCI chart.
Steps To Reproduce
Install the upstream logging operator using the fleet.yaml shown under Current Behavior, then create a GitRepo with only this definition.
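A minimal sketch of that GitRepo (the repository URL, branch, and path are placeholders; the namespace matches the one seen in the log message):

```yaml
# Placeholders for repo URL, branch, and path; namespace as in the log (fleet-default).
apiVersion: fleet.cattle.io/v1alpha1
kind: GitRepo
metadata:
  name: logging-operator
  namespace: fleet-default
spec:
  repo: https://example.com/my-org/fleet-config
  branch: main
  paths:
    - logging-operator
```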
Environment
Logs
No response
Anything else?
No response