pulumi / pulumi-rancher2

A Rancher2 Pulumi resource package, providing multi-language access to Rancher2
Apache License 2.0

rancher2 cluster delete order causing errors #89

Open · David-VTUK opened this issue 3 years ago

David-VTUK commented 3 years ago

Hi,

Firstly - thanks a lot for this provider, awesome stuff.

I'm testing with provisioning an RKE cluster on EC2 instances via the use of the following objects:

rancher2.NewCloudCredential, rancher2.NewNodeTemplate, rancher2.NewCluster, rancher2.NewNodePool

as in the following snippet:


        // Create a Node Template
        nodetemplate, err := rancher2.NewNodeTemplate(ctx, "davidh-pulumi-ec2-medium", &rancher2.NodeTemplateArgs{
            CloudCredentialId: cloudcredential.ID(),
            Description:       pulumi.String("node template for ec2"),
            Name:              pulumi.String("davidh-pulumi-nodetemplate"),
            EngineInstallUrl:  pulumi.String("https://releases.rancher.com/install-docker/19.03.sh"),

            Amazonec2Config: &rancher2.NodeTemplateAmazonec2ConfigArgs{
                Ami:                pulumi.String("ami-0ff4c8fb495a5a50d"),
                IamInstanceProfile: pulumi.String("k8snodes"),
                InstanceType:       pulumi.String("t2.medium"),
                VpcId:              vpc.ID(),
                Tags:               pulumi.String("kubernetes.io/cluster/davidh-cluster:owned"),
                SecurityGroups:     pulumi.StringArray{sg.Name},
                Region:             pulumi.String("eu-west-2"),
            },
        })

        if err != nil {
            return err
        }

        cluster, err := rancher2.NewCluster(ctx, "davidh-pulumi-cluster", &rancher2.ClusterArgs{
            Description: pulumi.String("Cluster created by Pulumi"),
            Driver:      pulumi.String("rancherKubernetesEngine"),
            Name:        pulumi.String("davidh-pulumi-ec2"),
            RkeConfig: &rancher2.ClusterRkeConfigArgs{
                CloudProvider: &rancher2.ClusterRkeConfigCloudProviderArgs{
                    Name: pulumi.String("aws"),
                },
            },
        })

        if err != nil {
            return err
        }

        _, err = rancher2.NewNodePool(ctx, "davidh-pulumi-nodepool", &rancher2.NodePoolArgs{
            ClusterId:      cluster.ID(),
            ControlPlane:   pulumi.Bool(true),
            Etcd:           pulumi.Bool(true),
            HostnamePrefix: pulumi.String("davidh-pulumi-aio-"),
            Name:           pulumi.String("davidh-pulumi-pool"),
            Quantity:       pulumi.Int(3),
            Worker:         pulumi.Bool(true),
            NodeTemplateId: nodetemplate.ID(),
        })

        if err != nil {
            return err
        }

        return nil
    })

Provisioning the cluster works flawlessly; deleting it, however, fails:

     Type                            Name                      Status                  Info
     pulumi:pulumi:Stack             pulumi-rancher-aws-demo   **failed**              1 error
 -   ├─ rancher2:index:NodePool      davidh-pulumi-nodepool    deleted                 
 -   ├─ aws:ec2:DefaultRouteTable    default                   deleted                 
 -   └─ rancher2:index:NodeTemplate  davidh-pulumi-ec2-medium  **deleting failed**     1 error
Diagnostics:
  pulumi:pulumi:Stack (pulumi-rancher-aws-demo):
    error: update failed

  rancher2:index:NodeTemplate (davidh-pulumi-ec2-medium):
    error: deleting urn:pulumi:demo::pulumi-rancher-aws::rancher2:index/nodeTemplate:NodeTemplate::davidh-pulumi-ec2-medium: Error removing Node Template: Bad response statusCode [405]. Status [405 Method Not Allowed]. Body: [baseType=error, code=MethodNotAllow, message=Template is in use by a node.] from [https://demo-hosted.rancher.cloud/v3/nodeTemplates/cattle-global-nt:nt-tfjqb]

From what I can see, it tries to delete the NodeTemplate object before the related node pool has been deleted, which in turn happens before the entire cluster is deleted.

I think a cleaner approach would be to delete the cluster object first, since that also removes its node pools.

Pulumi version: v2.19.0, Rancher version: 2.5.5

David-VTUK commented 3 years ago

Been performing some further testing. I had hoped https://registry.terraform.io/providers/rancher/rancher2/latest/docs/resources/cluster_sync would help.

Implemented in the above example as:

_, err = rancher2.NewClusterSync(ctx, "davidh-clustersync", &rancher2.ClusterSyncArgs{
    ClusterId:   cluster.ID(),
    NodePoolIds: pulumi.StringArray{nodepool.ID()},
})

On creation, it's the last resource created, and it doesn't complete until the entire cluster is up (previously, Pulumi would simply report the cluster resource as created before all the nodes had come up).

 +   pulumi:pulumi:Stack                pulumi-rancher-aws-demo   create     
 +   ├─ aws:ec2:Vpc                     david-pulumi-vpc          create     
 +   ├─ aws:ec2:InternetGateway         gw                        create     
 +   ├─ aws:ec2:Subnet                  Subnet-0                  create     
 +   ├─ aws:ec2:Subnet                  Subnet-1                  create     
 +   ├─ aws:ec2:Subnet                  Subnet-2                  create     
 +   ├─ aws:ec2:DefaultRouteTable       default                   create     
 +   ├─ rancher2:index:Cluster          davidh-pulumi-cluster     create     
 +   ├─ rancher2:index:CloudCredential  davidh-pulumi-aws         create     
 +   ├─ rancher2:index:NodeTemplate     davidh-pulumi-ec2-medium  create     
 +   ├─ rancher2:index:NodePool         davidh-pulumi-nodepool    create     
 +   └─ rancher2:index:ClusterSync      davidh-clustersync        create   

Destroy operation:


Do you want to perform this destroy? yes
Destroying (demo)

View Live: https://app.pulumi.com/DH-Rancher/pulumi-rancher-aws/demo/updates/63

     Type                            Name                      Status                  Info
     pulumi:pulumi:Stack             pulumi-rancher-aws-demo   **failed**              1 error
 -   ├─ rancher2:index:ClusterSync   davidh-clustersync        deleted                 
 -   ├─ rancher2:index:NodePool      davidh-pulumi-nodepool    deleted                 
 -   ├─ aws:ec2:DefaultRouteTable    default                   deleted                 
 -   └─ rancher2:index:NodeTemplate  davidh-pulumi-ec2-medium  **deleting failed**     1 error

Diagnostics:
  pulumi:pulumi:Stack (pulumi-rancher-aws-demo):
    error: update failed

  rancher2:index:NodeTemplate (davidh-pulumi-ec2-medium):
    error: deleting urn:pulumi:demo::pulumi-rancher-aws::rancher2:index/nodeTemplate:NodeTemplate::davidh-pulumi-ec2-medium: Error removing Node Template: Bad response statusCode [405]. Status [405 Method Not Allowed]. Body: [message=Template is in use by a node., baseType=error, code=MethodNotAllow] from [https://rancherurl.com/v3/nodeTemplates/cattle-global-nt:nt-bn2fx]

David-VTUK commented 3 years ago

Workaround: define an explicit dependency between the cluster and the node template:

        cluster, err := rancher2.NewCluster(ctx, "davidh-pulumi-cluster", &rancher2.ClusterArgs{
...
        }, pulumi.DependsOn([]pulumi.Resource{nodetemplate}))

jleni commented 2 years ago

Same issue here; the workaround was adding a DependsOn. The code looks a bit different, as the API has changed since David-VTUK suggested this.

    // Create cluster
    const cluster = new rancher2.Cluster(this.name, cluster_args, {
      provider: localProvider,
      dependsOn: Array.from(nodeTemplates.values()),
    });

Frassle commented 2 years ago

Given that the NodePool takes nodetemplate.ID() as the input value for NodeTemplateId, the engine should be creating a dependency anyway, and the DependsOn call should be superfluous. That suggests a tfbridge or core platform issue; I can't see why this would be specific to the rancher2 provider.