pulumi / pulumi-kubernetes

A Pulumi resource provider for Kubernetes to manage API resources and workloads in running clusters
https://www.pulumi.com/docs/reference/clouds/kubernetes/
Apache License 2.0
398 stars 113 forks source link

Server-side apply deletes resource if provider changes #2370

Open tusharshahrs opened 1 year ago

tusharshahrs commented 1 year ago

I've got a simpler reproduction and explanation, but I'll leave the original issue text below for historical context.

What happened?

Resource replacement erroneously deletes the resource when the following conditions are true:

  1. Server-side apply mode is enabled
  2. The resource provider changes to another provider pointing at the same cluster (e.g., a default provider changes to an explicit provider)

This does not happen in Client-side apply mode because the resource creation fails during replacement due to the resource already existing.

In Server-side apply mode, the create succeeds even if the resource existed previously. This behavior was intended to support "upsert" behavior. Unfortunately, the "delete" portion of the resource replacement now removes the resource, causing it to be deleted from the cluster. (deleteBeforeReplace still works as expected, but is not an option for CustomResourceDefinition resources, since this will also delete related CustomResource resources on the cluster.)

Further complicating this problem, the engine, rather than the provider, plans the resource replacement based on the provider changing. Resource aliases do not currently support aliasing providers. It is possible to manually drop the resource from state and then import it with the new provider, but this requires changes outside of the program code, so it may not be an option for some users.

Expected Behavior

The provider should not delete the resource that it is expected to replace.

Steps to reproduce

  1. pulumi config set kubernetes:enableServerSideApply true
  2. pulumi up the following program
    
    import * as k8s from "@pulumi/kubernetes";

const provider = new k8s.Provider("k8s", { enableServerSideApply: true, }); new k8s.core.v1.ConfigMap("test", { metadata: {name: "test"}, data: {foo: "bar"} }, { provider, });

3. Comment out the `provider` option on the `ConfigMap`
4. `pulumi up` shows that the resource will be replaced, but it will actually delete the resource
5. Running `pulumi up --refresh` detects that the `ConfigMap` is missing, and recreates it

### Output of `pulumi about`

Dependencies: NAME VERSION @pulumi/pulumi 3.61.0 @pulumi/kubernetes 3.23.0


Related https://github.com/pulumi/pulumi/issues/9925
Related https://github.com/pulumi/pulumi-aws/issues/1923#issuecomment-1184890624

---

### What happened?

We pass in `{ provider: k8sProvider}` as part of a `crd` in the options part(last line of code): ...`const foobarcrd  = new k8s.apiextensions.v1.CustomResourceDefinition(foobarcrd)`.  Then if we remove the `{ provider: k8sProvider}`, pulumi preview shows:

View in Browser (Ctrl+O): https://app.pulumi.com/tushar-pulumi-corp/aws-classic-ts-eks-spot-mg/dev/previews/259d842b-1a54-410c-83f8-7a0420e07b58

 Type                                                            Name                            Plan        Info
 pulumi:pulumi:Stack                                             aws-classic-ts-eks-spot-mg-dev              

+- └─ kubernetes:apiextensions.k8s.io/v1:CustomResourceDefinition shaht-foobarcrd replace [diff: ~provider]

Resources: +-1 to replace 60 unchanged

because of the `diff` in the provider

The `diff` shows this:

++kubernetes:apiextensions.k8s.io/v1:CustomResourceDefinition: (create-replacement) [id=foobars.stable.example.com] [urn=urn:pulumi:dev::aws-classic-ts-eks-spot-mg::kubernetes:apiextensions.k8s.io/v1:CustomResourceDefinition::shaht-foobarcrd] [provider: urn:pulumi:dev::aws-classic-ts-eks-spot-mg::pulumi:providers:kubernetes::shaht-k8sprovider::acfa6c62-f920-40a3-9587-61e4ab3b1c33 => urn:pulumi:dev::aws-classic-ts-eks-spot-mg::pulumi:providers:kubernetes::default_3_25_0::output]

+-kubernetes:apiextensions.k8s.io/v1:CustomResourceDefinition: (replace) [id=foobars.stable.example.com] [urn=urn:pulumi:dev::aws-classic-ts-eks-spot-mg::kubernetes:apiextensions.k8s.io/v1:CustomResourceDefinition::shaht-foobarcrd]


and if we select `yes` we get the following error:

View in Browser (Ctrl+O): https://app.pulumi.com/tushar-pulumi-corp/aws-classic-ts-eks-spot-mg/dev/updates/84

 Type                                                            Name                            Status                   Info
 pulumi:pulumi:Stack                                             aws-classic-ts-eks-spot-mg-dev  **failed**               1 error
 ├─ eks:index:Cluster                                            shaht-eks                                                
 │  └─ aws:eks:Cluster                                           shaht-eks-eksCluster                                     

+- └─ kubernetes:apiextensions.k8s.io/v1:CustomResourceDefinition shaht-foobarcrd replacing failed [diff: ~provider]; 1 error

Diagnostics: pulumi:pulumi:Stack (aws-classic-ts-eks-spot-mg-dev): error: update failed

kubernetes:apiextensions.k8s.io/v1:CustomResourceDefinition (shaht-foobarcrd): error: resource foobars.stable.example.com was not successfully created by the Kubernetes API server : customresourcedefinitions.apiextensions.k8s.io "foobars.stable.example.com" already exists

Outputs:

Expected Behavior

The crd should be replaced even if it is across different providers.

Steps to reproduce

  1. Clone this repro: git@github.com:tusharshahrs/pulumi-home.git
  2. cd aws-classic-ts-eks-spot-mg
  3. pulumi stack init dev
  4. npm install
  5. pulumi config set aws:region us-east-2 # any valid aws region
  6. pulumi up -y
  7. validate that the crd is there: pulumi stack output my_crd expected output: foobars.stable.example.com
  8. Comment out the provider on line 105: https://github.com/tusharshahrs/pulumi-home/blob/identityprofile/aws-classic-ts-eks-spot-mg/index.ts#L105 like this: // { provider: k8sProvider, dependsOn: [namespace, mycluster]});
  9. Uncomment out the line below for the default provider to get picked up: https://github.com/tusharshahrs/pulumi-home/blob/identityprofile/aws-classic-ts-eks-spot-mg/index.ts#L105 like this: {dependsOn: [namespace, mycluster]});
  10. pulumi up and see that the preview will show that the crd is being replaced

    
    View in Browser (Ctrl+O): https://app.pulumi.com/tushar-pulumi-corp/aws-classic-ts-eks-spot-mg/dev/previews/d642dc85-49ad-4e41-9149-b65f964790c6
    
     Type                                                            Name                            Plan        Info
     pulumi:pulumi:Stack                                             aws-classic-ts-eks-spot-mg-dev              
    +-  └─ kubernetes:apiextensions.k8s.io/v1:CustomResourceDefinition  shaht-foobarcrd                 replace     [diff: ~provider]

Resources: +-1 to replace 60 unchanged

Do you want to perform this update? details pulumi:pulumi:Stack: (same) [urn=urn:pulumi:dev::aws-classic-ts-eks-spot-mg::pulumi:pulumi:Stack::aws-classic-ts-eks-spot-mg-dev] ++kubernetes:apiextensions.k8s.io/v1:CustomResourceDefinition: (create-replacement) [id=foobars.stable.example.com] [urn=urn:pulumi:dev::aws-classic-ts-eks-spot-mg::kubernetes:apiextensions.k8s.io/v1:CustomResourceDefinition::shaht-foobarcrd] [provider: urn:pulumi:dev::aws-classic-ts-eks-spot-mg::pulumi:providers:kubernetes::shaht-k8sprovider::acfa6c62-f920-40a3-9587-61e4ab3b1c33 => urn:pulumi:dev::aws-classic-ts-eks-spot-mg::pulumi:providers:kubernetes::default_3_25_0::output] apiVersion: "apiextensions.k8s.io/v1" kind : "CustomResourceDefinition" metadata : { labels: { app.kubernetes.io/managed-by: "pulumi" } name : "foobars.stable.example.com" } spec : { group : "stable.example.com" names : { kind : "FooBar" plural : "foobars" shortNames: [

            ]
            singular  : "foobar"
        }
        scope   : "Namespaced"
        versions: [
            [0]: {
                name   : "v1"
                schema : {
                    openAPIV3Schema: {
                        type: "object"
                    }
                }
                served : true
                storage: true
            }
        ]
    }
+-kubernetes:apiextensions.k8s.io/v1:CustomResourceDefinition: (replace)
    [id=foobars.stable.example.com]
    [urn=urn:pulumi:dev::aws-classic-ts-eks-spot-mg::kubernetes:apiextensions.k8s.io/v1:CustomResourceDefinition::shaht-foobarcrd]
    [provider: urn:pulumi:dev::aws-classic-ts-eks-spot-mg::pulumi:providers:kubernetes::shaht-k8sprovider::acfa6c62-f920-40a3-9587-61e4ab3b1c33 => urn:pulumi:dev::aws-classic-ts-eks-spot-mg::pulumi:providers:kubernetes::default_3_25_0::output<string>]
  ~ metadata  : {
      - creationTimestamp: "2023-04-27T12:27:25Z"
      - generation       : 1
      - managedFields    : [
      -     [0]: {
              - apiVersion: "apiextensions.k8s.io/v1"
              - fieldsType: "FieldsV1"
              - fieldsV1  : {
                  - f:status: {
                      - f:acceptedNames: {
                          - f:kind      : {}
                          - f:listKind  : {}
                          - f:plural    : {}
                          - f:shortNames: {}
                          - f:singular  : {}
                        }
                      - f:conditions   : {
                          - k:{"type":"Established"}  : {
                              - .                   : {}
                              - f:lastTransitionTime: {}
                              - f:message           : {}
                              - f:reason            : {}
                              - f:status            : {}
                              - f:type              : {}
                            }
                          - k:{"type":"NamesAccepted"}: {
                              - .                   : {}
                              - f:lastTransitionTime: {}
                              - f:message           : {}
                              - f:reason            : {}
                              - f:status            : {}
                              - f:type              : {}
                            }
                        }
                    }
                }
              - manager   : "kube-apiserver"
              - operation : "Update"
              - time      : "2023-04-27T12:27:25Z"
            }
      -     [1]: {
              - apiVersion: "apiextensions.k8s.io/v1"
              - fieldsType: "FieldsV1"
              - fieldsV1  : {
                  - f:metadata: {
                      - f:labels: {
                          - .                             : {}
                          - f:app.kubernetes.io/managed-by: {}
                        }
                    }
                  - f:spec    : {
                      - f:conversion: {
                          - .         : {}
                          - f:strategy: {}
                        }
                      - f:group     : {}
                      - f:names     : {
                          - f:kind      : {}
                          - f:listKind  : {}
                          - f:plural    : {}
                          - f:shortNames: {}
                          - f:singular  : {}
                        }
                      - f:scope     : {}
                      - f:versions  : {}
                    }
                }
              - manager   : "pulumi-kubernetes"
              - operation : "Update"
              - time      : "2023-04-27T12:27:25Z"
            }
        ]
      - resourceVersion  : "1085"
      - uid              : "3d55882f-239e-418c-a2ec-416f4b407b70"
    }
  ~ spec      : {
      - conversion: {
          - strategy: "None"
        }
      ~ names     : {
          - listKind  : "FooBarList"
        }
    }
  - status    : {
      - acceptedNames : {
          - kind      : "FooBar"
          - listKind  : "FooBarList"
          - plural    : "foobars"
          - shortNames: [
          -     [0]: "fb"
            ]
          - singular  : "foobar"
        }
      - conditions    : [
      -     [0]: {
              - lastTransitionTime: "2023-04-27T12:27:25Z"
              - message           : "no conflicts found"
              - reason            : "NoConflicts"
              - status            : "True"
              - type              : "NamesAccepted"
            }
      -     [1]: {
              - lastTransitionTime: "2023-04-27T12:27:25Z"
              - message           : "the initial names have been accepted"
              - reason            : "InitialNamesAccepted"
              - status            : "True"
              - type              : "Established"
            }
        ]
      - storedVersions: [
      -     [0]: "v1"
        ]
    }
--kubernetes:apiextensions.k8s.io/v1:CustomResourceDefinition: (delete-replaced)
    [id=foobars.stable.example.com]
    [urn=urn:pulumi:dev::aws-classic-ts-eks-spot-mg::kubernetes:apiextensions.k8s.io/v1:CustomResourceDefinition::shaht-foobarcrd]
    [provider=urn:pulumi:dev::aws-classic-ts-eks-spot-mg::pulumi:providers:kubernetes::shaht-k8sprovider::acfa6c62-f920-40a3-9587-61e4ab3b1c33]
15. Select `yes` and the following error will show up:

+- └─ kubernetes:apiextensions.k8s.io/v1:CustomResourceDefinition shaht-foobarcrd replacing failed [diff: ~provider]; 1 error

Diagnostics: pulumi:pulumi:Stack (aws-classic-ts-eks-spot-mg-dev): error: update failed

kubernetes:apiextensions.k8s.io/v1:CustomResourceDefinition (shaht-foobarcrd): error: resource foobars.stable.example.com was not successfully created by the Kubernetes API server : customresourcedefinitions.apiextensions.k8s.io "foobars.stable.example.com" already exists

Outputs:

Resources: 60 unchanged

Duration: 9s



### Output of `pulumi about`

❯ pulumi about
CLI          
Version      3.65.1
Go Version   go1.20.3
Go Compiler  gc

Plugins
NAME        VERSION
aws         5.36.1
aws         5.16.2
awsx        1.0.2
docker      3.6.1
eks         1.0.1
kubernetes  3.25.0
nodejs      unknown

Host     
OS       darwin
Version  11.7.6
Arch     x86_64

Found no pending operations associated with dev

Backend        
Name           pulumi.com
URL            https://app.pulumi.com/tushar-..-..
User           tushar-..-..

Dependencies:
NAME                VERSION
@pulumi/aws         5.36.1
@pulumi/awsx        1.0.2
@pulumi/eks         1.0.1
@pulumi/kubernetes  3.25.0
@pulumi/pulumi      3.64.0
@types/node         16.18.24

### Additional context

_No response_

### Contributing

Vote on this issue by adding a 👍 reaction. 
To contribute a fix for this issue, leave a comment (and link to your pull request, if you've opened one already). 
rquitales commented 1 year ago

The issue looks like it's due to our pulumi-kubernetes thinking that a new cluster is being targeted. Hence, it will attempt to create the resources again. When it does so, it uses a imperative create method, that will fail since the cluster already has the resources applied earlier. We should be doing something like "create or update" in events like these, since the failure on creation shouldn't be important if it's due to the resource already existing.

Note that issue not only affects CRDs, but any Kubernetes resources.

We can workaround this issue by using server side apply instead, and it will not error out since a PATCH request is sent instead in SSA mode.

Relevant code chunks: In client side mode, we do a client.Create: https://github.com/pulumi/pulumi-kubernetes/blob/dae363925ac5d7b613afeb2fca1ffb1999c7ee83/provider/pkg/await/await.go#L221

Whereas SSA does a PATCH request: https://github.com/pulumi/pulumi-kubernetes/blob/dae363925ac5d7b613afeb2fca1ffb1999c7ee83/provider/pkg/await/await.go#L206

For a fix to happen/next steps, we would need to implement logic to determine when a provider is being switched and use a patch request instead of a create request.

lblackstone commented 1 year ago

I updated the issue description with new information about the problem.

lblackstone commented 1 year ago

After further investigation, there does not appear to be an easy solution to this bug. The same issue affecting SSA-managed resources also impacts a few other resources.

There are a few possible solutions that require further investigation:

  1. Add support for provider aliasing to tell the engine that two providers target the same thing
  2. Coordinate between a provider to determine if the targeted cloud/cluster is the same, and the engine, which would eliminate the need for an explicit alias
  3. https://github.com/pulumi/pulumi/issues/2625

In the meantime, our recommendation is the following:

  1. A resource's provider should not be changed as part of an update
  2. If the provider needs to be changed, the resource should be dropped from Pulumi state, and then reimported under the new provider.
blampe commented 2 months ago

Related https://github.com/pulumi/pulumi-kubernetes/issues/2948

kwohlfahrt commented 2 months ago

I've also run into this issue following a provider update, due to certificates in the kubeconfig being renewed. I was able to work around it with some stack surgery (manually editing the stack to contain the new kubeconfig, so the change isn't detected).

Unfortunately, I don't think either of recommendations is feasible for us:

A resource's provider should not be changed as part of an update

We need to change the kubeconfig, because the old one is no longer valid (expired certificate).

If the provider needs to be changed, the resource should be dropped from Pulumi state, and then reimported under the new provider.

We have several copies of this stack, each with several hundred resources. Most of the resources have names auto-generated by Pulumi, so we'd also have to find the correct name to import for each stack (possible, but not trivial).

eplightning commented 2 months ago

I think implementing https://github.com/pulumi/pulumi-kubernetes/issues/2745 could help since it would provide a way to stop provider from being replaced

EronWright commented 1 month ago

I'm reassigning to Bryce since he's working on #2745 and that should resolve this one.

bhurlow commented 5 days ago

we're hitting this as well, thanks for the helpful explanations above