pulumi / pulumi-kubernetes-operator

A Kubernetes Operator that automates the deployment of Pulumi Stacks
Apache License 2.0

Pulumi Operator times out grabbing GitRepository from FluxSource #469

Open kellervater opened 1 year ago

kellervater commented 1 year ago

What happened?

I tried to integrate a Pulumi program written in Go via a Flux source, as described here: https://www.pulumi.com/blog/pulumi-kubernetes-new-2022/#integration-with-flux-sources

Unfortunately, this fails with a timeout. The Pulumi operator times out while fetching the GitRepository artifact from the Flux source-controller:

{"level":"error","ts":"2023-07-15T11:46:44.275Z","logger":"controller_stack","msg":"Failed to setup Pulumi workdir","Request.Namespace":"pulumi-operator","Request.Name":"asgard-tst","Stack.Name":"tst","error":"failed to get artifact from source: failed to download archive, error: GET http://source-controller.flux-system.svc.cluster.local./gitrepository/pulumi-operator/iac-asgard/044c892e78dd4f774bd349c31db790269ecc8f32.tar.gz giving up after 2 attempt(s): Get \"http://source-controller.flux-system.svc.cluster.local./gitrepository/pulumi-operator/iac-asgard/044c892e78dd4f774bd349c31db790269ecc8f32.tar.gz\": dial tcp 10.43.196.204:80: i/o timeout","stacktrace":"sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler\n\t/home/runner/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.9.0/pkg/internal/controller/controller.go:298\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/home/runner/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.9.0/pkg/internal/controller/controller.go:253\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2\n\t/home/runner/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.9.0/pkg/internal/controller/controller.go:214"}
{"level":"error","ts":"2023-07-15T11:46:44.275Z","logger":"controller_stack","msg":"Failed to update Stack","Request.Namespace":"pulumi-operator","Request.Name":"asgard-tst","Stack.Name":"tst","error":"failed to get artifact from source: failed to download archive, error: GET http://source-controller.flux-system.svc.cluster.local./gitrepository/pulumi-operator/iac-asgard/044c892e78dd4f774bd349c31db790269ecc8f32.tar.gz giving up after 2 attempt(s): Get \"http://source-controller.flux-system.svc.cluster.local./gitrepository/pulumi-operator/iac-asgard/044c892e78dd4f774bd349c31db790269ecc8f32.tar.gz\": dial tcp 10.43.196.204:80: i/o timeout","stacktrace":"github.com/pulumi/pulumi-kubernetes-operator/pkg/controller/stack.(*ReconcileStack).Reconcile\n\t/home/runner/work/pulumi-kubernetes-operator/pulumi-kubernetes-operator/pkg/controller/stack/stack_controller.go:687\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler\n\t/home/runner/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.9.0/pkg/internal/controller/controller.go:298\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/home/runner/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.9.0/pkg/internal/controller/controller.go:253\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2\n\t/home/runner/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.9.0/pkg/internal/controller/controller.go:214"}

I was able to work around this issue by adding the following network policy for the Flux source-controller:

---
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-pulumi-operator-source-grabbing
  namespace: flux-system
spec:
  podSelector:
    matchLabels:
      app: source-controller
  ingress:
    - ports:
        - protocol: TCP
          port: http
      from:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: pulumi-operator
        - podSelector:
            matchLabels:
              name: pulumi-kubernetes-operator
  policyTypes:
    - Ingress
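
A note on the policy above, offered as a caveat rather than a correction: the two entries under `from` are separate list items, so Kubernetes treats them as alternatives. Traffic is allowed from any pod in the `pulumi-operator` namespace, or from pods labeled `name: pulumi-kubernetes-operator` in the policy's own namespace (`flux-system`). A stricter variant could merge both selectors into a single entry so that both must match. The sketch below assumes the operator pod really carries the `name: pulumi-kubernetes-operator` label and runs in the `pulumi-operator` namespace; the policy name is hypothetical.

```yaml
---
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  # Hypothetical name, chosen for this sketch only.
  name: allow-pulumi-operator-source-grabbing-strict
  namespace: flux-system
spec:
  podSelector:
    matchLabels:
      app: source-controller
  ingress:
    - ports:
        - protocol: TCP
          port: http
      from:
        # A single list item holding both selectors:
        # the namespace AND the pod labels must match.
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: pulumi-operator
          podSelector:
            matchLabels:
              name: pulumi-kubernetes-operator
  policyTypes:
    - Ingress
```

The only difference from the original is the missing `-` in front of `podSelector`, which turns two alternative peers into one combined peer.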

I'm not sure whether my "outdated" Flux or Pulumi versions are the reason for this behavior, but they are as follows:

Pulumi version in cluster: v3.68.0
Flux version: v0.41.2 (current: v2.0.1)

Expected Behavior

A section in the documentation describing how to add the above network policy, or a built-in mechanism that does it for you.

Steps to reproduce

Follow the documentation at https://www.pulumi.com/blog/pulumi-kubernetes-new-2022/#integration-with-flux-sources with the Pulumi and Flux versions mentioned above.

Output of pulumi about

From within the pulumi-operator pod: [image]

Additional context

No response

Contributing

Vote on this issue by adding a 👍 reaction. To contribute a fix for this issue, leave a comment (and link to your pull request, if you've opened one already).

rquitales commented 1 year ago

@kellervater Thanks for reporting this issue and providing the fix for this. You are indeed correct that a NetworkPolicy to enable ingress to Flux is required for the Pulumi Operator to interact with Flux's source-controller.

Looking at the examples we have in our Operator codebase, we explicitly enable ingress to Flux components when we install Flux using the Pulumi Flux provider. The default installation of Flux does not allow ingress, which likely explains the initial error you experienced.

I'll ensure that this requirement is documented somewhere. Thanks for bringing this up once again!