radius-project / radius

Radius is a cloud-native, portable application platform that makes app development easier for teams building cloud-native apps.
https://radapp.io
Apache License 2.0

Deleting application does not delete resources in some cases #7052

Open tdeheurles opened 10 months ago

tdeheurles commented 10 months ago

Steps to reproduce

ℹ️ This issue continues a Discord forum discussion with @vishwahiremat.

I created an environment, an application, and a container in three separate files, then applied them one by one. Later I decided I no longer wanted the application and container, so I ran rad application delete myapp. The application appears to be removed, but the container that was part of it is not.

Here are some of the files:

import radius as radius

resource environment 'Applications.Core/environments@2023-10-01-preview' = {
  name: 'demo5env'
  properties: {
    compute: {
      kind: 'kubernetes'
      namespace: 'demo5env'
    }
  }
}

import radius as radius

resource myapp 'Applications.Core/applications@2023-10-01-preview' = {
  name: 'myapp'
  properties: {
    environment: '/planes/radius/local/resourcegroups/demo5group/providers/Applications.Core/environments/demo5env'
  }
}

import radius as radius

import kubernetes as kubernetes {
  namespace: 'default-application'
  context: '...'
  kubeConfig: '...'
}

resource backend_container 'Applications.Core/containers@2023-10-01-preview' = {
  name: 'backend'
  properties: {
    application: '/planes/radius/local/resourcegroups/demo5group/providers/Applications.Core/applications/myapp'
    container: {
      ...
    }
  }
}

Observed behavior

The application is deleted while the k8s.deployment is not.

Desired behavior

We should never have orphan resources.

Workaround

To me, the problem seems to appear when the application and the resource are deployed with separate commands.

But @vishwahiremat thinks that's not the case; in his view, the issue is linked to the failing container deployment.

I'm surprised because, if I understand correctly, when the application and the container are deployed together, the container still fails, yet deleting the application successfully deletes the service.

@vishwahiremat, do you mean the "link" is not done as the "deployment of the resource" is failing when deployed separately ?

rad Version

❯ rad version
RELEASE   VERSION   BICEP     COMMIT
edge      16011a5   0.29.0    16011a5da8e7e531da9d59527954fc5336aac044

Operating system

WSL2

Additional context

No response

Would you like to support us?

AB#10954

radius-triage-bot[bot] commented 10 months ago

:wave: @tdeheurles Thanks for filing this bug report.

A project maintainer will review this report and get back to you soon. If you'd like immediate help troubleshooting, please visit our Discord server.

For more information on our triage process please visit our triage overview

tdeheurles commented 10 months ago

⚠️ Here is procedure A, which fails. ⚠️

Note that:

Application Deployment

❯ rad deploy application.bicep
Building application.bicep...
Deploying template 'application.bicep' into environment 'demo5env' from workspace 'demo5workspace'...

Deployment In Progress...

Deployment Complete

Resources:
    myapp           Applications.Core/applications

❯ k get ns
NAME                  STATUS   AGE
...
demo5env-myapp        Active   26s
...

Container Deployment

❯ k get all -n demo5env-myapp
No resources found in demo5env-myapp namespace.

❯ rad deploy backend.bicep
Building backend.bicep...
Deploying template 'backend.bicep' into environment 'demo5env' from workspace 'demo5workspace'...

Deployment In Progress...

..                   backend         Applications.Core/containers
Error: {
  "code": "DeploymentFailed",
  "message": "At least one resource deployment operation failed. Please see the details for the specific operation that failed.",
  "target": "/planes/radius/local/resourceGroups/demo5group/providers/Microsoft.Resources/deployments/rad-deploy-314d5b2b-5aa1-43b6-b223-728e3d437f3c",
  "details": [
    {
      "code": "ResourceDeploymentFailure",
      "message": "Failed",
      "target": "/planes/radius/local/resourceGroups/demo5group/providers/Applications.Core/containers/backend",
      "details": [
        {
          "code": "Internal",
          "message": "Container state is 'Terminated' Reason: Error, Message: "
        }
      ]
    }
  ]
}

TraceId:  3254e16c30a69614eb65de701d793ded

--> The deployment fails because the container code is failing.
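For anyone triaging a similar failure, here is a minimal Python sketch that pulls the failing resources and their innermost messages out of the rad deploy error text. It assumes the output has the same shape as above ("Error:" followed by a JSON object, then the TraceId line); the function name failed_resources is hypothetical and the sample is a trimmed copy of the error above.

```python
import json


def failed_resources(cli_output: str) -> list[tuple[str, str]]:
    """Return (resource name, message) for every non-OK entry in the error."""
    # Assumption: the JSON payload starts at the first '{' after "Error:".
    start = cli_output.index("{")
    # raw_decode parses one JSON value and ignores trailing text (the TraceId line).
    payload, _ = json.JSONDecoder().raw_decode(cli_output[start:])
    failures = []
    for detail in payload.get("details", []):
        if detail.get("code") == "OK":
            continue
        name = detail["target"].rsplit("/", 1)[-1]
        inner = [d.get("message", "") for d in detail.get("details", [])]
        failures.append((name, "; ".join(inner) or detail.get("message", "")))
    return failures


# Trimmed copy of the error shown above (top-level "target" omitted).
sample_output = r'''Error: {
  "code": "DeploymentFailed",
  "message": "At least one resource deployment operation failed. Please see the details for the specific operation that failed.",
  "details": [
    {
      "code": "ResourceDeploymentFailure",
      "message": "Failed",
      "target": "/planes/radius/local/resourceGroups/demo5group/providers/Applications.Core/containers/backend",
      "details": [
        {
          "code": "Internal",
          "message": "Container state is 'Terminated' Reason: Error, Message: "
        }
      ]
    }
  ]
}

TraceId:  3254e16c30a69614eb65de701d793ded
'''

failures = failed_resources(sample_output)
for name, message in failures:
    print(f"{name}: {message}")
```
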

State

❯ rad app connections myapp
Displaying application: myapp

Name: backend (Applications.Core/containers)
Connections: (none)
Resources: (none)

❯ k get all -n demo5env-myapp
NAME                          READY   STATUS             RESTARTS      AGE
pod/backend-c565568bc-5s7gc   0/1     CrashLoopBackOff   4 (12s ago)   92s

NAME                      READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/backend   0/1     1            0           92s

NAME                                DESIRED   CURRENT   READY   AGE
replicaset.apps/backend-c565568bc   1         1         0       92s

Delete procedure

❯ rad app delete myapp
Application myapp deleted

❯ k get ns
NAME                  STATUS   AGE
default               Active   5d23h
default-application   Active   45h
default-rad           Active   45h
demo5env-myapp        Active   3m3s
demo6env              Active   20h
kube-node-lease       Active   5d23h
kube-public           Active   5d23h
kube-system           Active   5d23h
radius-system         Active   4d22h

❯ k get all -n demo5env-myapp
NAME                          READY   STATUS             RESTARTS      AGE
pod/backend-c565568bc-5s7gc   0/1     CrashLoopBackOff   4 (51s ago)   2m11s

NAME                      READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/backend   0/1     1            0           2m11s

NAME                                DESIRED   CURRENT   READY   AGE
replicaset.apps/backend-c565568bc   1         1         0       2m11s

tdeheurles commented 10 months ago

Here is procedure B, which succeeds.

I moved the container resource into the same bicep file as the application.

Note:

Before deployment state

❯ k delete ns demo5env-myapp
namespace "demo5env-myapp" deleted

❯ rad app show myapp -o json
The application "myapp" was not found or has been deleted.

Deployment

❯ rad deploy application.bicep
Building application.bicep...
Deploying template 'application.bicep' into environment 'demo5env' from workspace 'demo5workspace'...

Deployment In Progress...

Completed            myapp           Applications.Core/applications
..                   backend         Applications.Core/containers

Deployment Complete

Resources:
    myapp           Applications.Core/applications
    backend         Applications.Core/containers

State

❯ rad app connections myapp
Displaying application: myapp

Name: backend (Applications.Core/containers)
Connections: (none)
Resources:
  backend (apps/Deployment)
  backend (core/Service)
  backend (core/ServiceAccount)
  backend (rbac.authorization.k8s.io/Role)
  backend (rbac.authorization.k8s.io/RoleBinding)

❯ k get all -n demo5env-myapp
NAME                          READY   STATUS   RESTARTS      AGE
pod/backend-c565568bc-2fmv4   0/1     Error    3 (29s ago)   47s

NAME              TYPE        CLUSTER-IP     EXTERNAL-IP   PORT(S)    AGE
service/backend   ClusterIP   10.100.88.16   <none>        8080/TCP   46s

NAME                      READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/backend   0/1     1            0           47s

NAME                                DESIRED   CURRENT   READY   AGE
replicaset.apps/backend-c565568bc   1         1         0       47s

Delete

❯ rad app delete myapp
Application myapp deleted

❯ k get ns
NAME                  STATUS   AGE
...
demo5env-myapp        Active   102s
...

❯ k get all -n demo5env-myapp
No resources found in demo5env-myapp namespace.

radius-triage-bot[bot] commented 10 months ago

:+1: We've reviewed this issue and have agreed to add it to our backlog. Please subscribe to this issue for notifications, we'll provide updates when we pick it up.

We also welcome community contributions! If you would like to pick this item up sooner and submit a pull request, please visit our contribution guidelines and assign this to yourself by commenting "/assign" on this issue.

For more information on our triage process please visit our triage overview

vishwahiremat commented 10 months ago

@vishwahiremat, do you mean the "link" is not done as the "deployment of the resource" is failing when deployed separately ?

Yes, and I have seen this issue when the application and the container are deployed together as well. I believe it's not related to deploying the resources separately.

nithyatsu commented 10 months ago

I deployed a Bicep file with an invalid image for the container. The deployment fails as expected.

nithya@Nithyas-MacBook-Pro ~ % rad deploy ~/Desktop/a.bicep --parameters magpieimage=ghcr.io/radius-project/magpiegoo:latest

Building /Users/nithya/Desktop/a.bicep...
Deploying template '/Users/nithya/Desktop/a.bicep' for application 'nithya' and environment 'default' from workspace 'default'...

Deployment In Progress... 

Completed            corerp-resources-gateway Applications.Core/applications
Completed            http-gtwy-back-rte Applications.Core/httpRoutes
...                  http-gtwy-front-ctnr Applications.Core/containers
...                  http-gtwy-back-ctnr Applications.Core/containers
Error: {
  "code": "DeploymentFailed",
  "message": "At least one resource deployment operation failed. Please see the details for the specific operation that failed.",
  "target": "/planes/radius/local/resourceGroups/default/providers/Microsoft.Resources/deployments/rad-deploy-534a0c1d-8638-41e1-94d1-5657c0bae2b8",
  "details": [
    {
      "code": "ResourceDeploymentFailure",
      "message": "Failed",
      "target": "/planes/radius/local/resourceGroups/default/providers/Applications.Core/containers/http-gtwy-front-ctnr",
      "details": [
        {
          "code": "Internal",
          "message": "Container state is 'Waiting' Reason: ErrImagePull, Message: rpc error: code = Unknown desc = failed to pull and unpack image \"ghcr.io/radius-project/magpiegoo:latest\": failed to resolve reference \"ghcr.io/radius-project/magpiegoo:latest\": failed to authorize: failed to fetch anonymous token: unexpected status: 403 Forbidden"
        }
      ]
    },
    {
      "code": "OK",
      "message": "",
      "target": "/planes/radius/local/resourceGroups/default/providers/Applications.Core/applications/corerp-resources-gateway"
    },
    {
      "code": "OK",
      "message": "",
      "target": "/planes/radius/local/resourceGroups/default/providers/Applications.Core/httpRoutes/http-gtwy-back-rte"
    },
    {
      "code": "ResourceDeploymentFailure",
      "message": "Failed",
      "target": "/planes/radius/local/resourceGroups/default/providers/Applications.Core/containers/http-gtwy-back-ctnr",
      "details": [
        {
          "code": "Internal",
          "message": "Container state is 'Waiting' Reason: ErrImagePull, Message: rpc error: code = Unknown desc = failed to pull and unpack image \"ghcr.io/radius-project/magpiegoo:latest\": failed to resolve reference \"ghcr.io/radius-project/magpiegoo:latest\": failed to authorize: failed to fetch anonymous token: unexpected status: 403 Forbidden"
        }
      ]
    }
  ]
}

TraceId:  3fdfe67f6f20198aff86743f7993239f
nithya@Nithyas-MacBook-Pro ~ % k get all -n default-corerp-resources-gateway
NAME                                        READY   STATUS             RESTARTS       AGE
pod/http-gtwy-back-ctnr-86b6cf5b8c-fmnp2    1/1     Running            1 (4d8h ago)   13d
pod/http-gtwy-back-ctnr-c498f6457-l69wr     0/1     ImagePullBackOff   0              3m51s
pod/http-gtwy-front-ctnr-67c8bfcb56-vrwmk   0/1     ImagePullBackOff   0              3m51s
pod/http-gtwy-front-ctnr-d84df4977-srzm5    1/1     Running            1 (4d8h ago)   13d
nithya@Nithyas-MacBook-Pro ~ % rad app delete corerp-resources-gateway
Application corerp-resources-gateway deleted
nithya@Nithyas-MacBook-Pro ~ % k get all -n default-corerp-resources-gateway
No resources found in default-corerp-resources-gateway namespace.

I cannot reproduce the issue when the resources are deployed together. My CLI version is:

nithya@Nithyas-MacBook-Pro ~ % rad version
RELEASE   VERSION   BICEP     COMMIT
0.29.0    v0.29.0   0.29.0    6abd7bfc3de0e748a2c34b721d95097afb6a2bba

I will try deploying resources separately and get back with an update.

nithyatsu commented 10 months ago

I think this is related to the deployment failure rather than to the app definition spanning multiple files.

I am sharing the logs of interest, comparing the case where I deploy an app with a container pointing to an invalid image against an app with a valid container definition.

Logs from Application RP during rad app delete on an application whose deployment failed (due to an invalid image in the container spec):

2024-01-25T15:22:12.048-0800    INFO    radius.radiusasyncworker    worker/worker.go:252    Start processing operation. {"serviceName": "radius", "version": "edge", "hostName": "Nithyas-MacBook-Pro.local", "resourceId": "/planes/radius/local/resourcegroups/randomgroup/providers/Applications.Core/containers/allinonecontainer", "operationId": "f4d4e41b-790b-47c0-972e-0721939b1582", "operationType": "APPLICATIONS.CORE/CONTAINERS|DELETE", "dequeueCount": 1, "traceId": "1e9e7db9db2c0263a8690f7d8be5eb76", "spanId": "742538235db79936"}
2024-01-25T15:22:12.065-0800    INFO    radius.radiusasyncworker    worker/worker.go:260    Operation returned  {"serviceName": "radius", "version": "edge", "hostName": "Nithyas-MacBook-Pro.local", "resourceId": "/planes/radius/local/resourcegroups/randomgroup/providers/Applications.Core/containers/allinonecontainer", "operationId": "f4d4e41b-790b-47c0-972e-0721939b1582", "operationType": "APPLICATIONS.CORE/CONTAINERS|DELETE", "dequeueCount": 1, "traceId": "1e9e7db9db2c0263a8690f7d8be5eb76", "spanId": "742538235db79936", "success": true, "code": "", "provisioningState": "Succeeded", "err": null}
2024-01-25T15:22:12.070-0800    INFO    radius.radiusasyncworker    worker/worker.go:360    failed to update the provisioningState in resource because it no longer exists. {"serviceName": "radius", "version": "edge", "hostName": "Nithyas-MacBook-Pro.local", "resourceId": "/planes/radius/local/resourcegroups/randomgroup/providers/Applications.Core/containers/allinonecontainer", "operationId": "f4d4e41b-790b-47c0-972e-0721939b1582", "operationType": "APPLICATIONS.CORE/CONTAINERS|DELETE", "dequeueCount": 1, "traceId": "1e9e7db9db2c0263a8690f7d8be5eb76", "spanId": "742538235db79936"}

Logs from Application RP during rad app delete on an application whose deployment was successful:

2024-01-25T15:30:08.301-0800    INFO    radius.radiusasyncworker    worker/worker.go:252    Start processing operation. {"serviceName": "radius", "version": "edge", "hostName": "Nithyas-MacBook-Pro.local", "resourceId": "/planes/radius/local/resourcegroups/randomgroup/providers/Applications.Core/containers/allinonecontainer", "operationId": "bfe96eaf-be67-4e90-b42b-6ed303658a98", "operationType": "APPLICATIONS.CORE/CONTAINERS|DELETE", "dequeueCount": 1, "traceId": "929cdac227dad85290c14b398028ada2", "spanId": "926b421d1b3df760"}
2024-01-25T15:30:08.306-0800    INFO    radius.radiusasyncworker    deployment/deploymentprocessor.go:341   Deleting output resource: LocalID: Service, resource type: "Provider: kubernetes, Type: core/Service"
    {"serviceName": "radius", "version": "edge", "hostName": "Nithyas-MacBook-Pro.local", "resourceId": "/planes/radius/local/resourcegroups/randomgroup/providers/Applications.Core/containers/allinonecontainer", "operationId": "bfe96eaf-be67-4e90-b42b-6ed303658a98", "operationType": "APPLICATIONS.CORE/CONTAINERS|DELETE", "dequeueCount": 1, "traceId": "929cdac227dad85290c14b398028ada2", "spanId": "926b421d1b3df760"}
2024-01-25T15:30:08.341-0800    INFO    radius.radiusasyncworker    deployment/deploymentprocessor.go:341   Deleting output resource: LocalID: Deployment, resource type: "Provider: kubernetes, Type: apps/Deployment"
    {"serviceName": "radius", "version": "edge", "hostName": "Nithyas-MacBook-Pro.local", "resourceId": "/planes/radius/local/resourcegroups/randomgroup/providers/Applications.Core/containers/allinonecontainer", "operationId": "bfe96eaf-be67-4e90-b42b-6ed303658a98", "operationType": "APPLICATIONS.CORE/CONTAINERS|DELETE", "dequeueCount": 1, "traceId": "929cdac227dad85290c14b398028ada2", "spanId": "926b421d1b3df760"}
2024-01-25T15:30:08.357-0800    INFO    radius.radiusapi    rest/results.go:75  responding with status code: 200    {"serviceName": "radius", "version": "edge", "hostName": "Nithyas-MacBook-Pro.local", "resourceId": "/planes/radius/local/resourcegroups/randomgroup/providers/applications.core/containers/allinonecontainer", "traceId": "929cdac227dad85290c14b398028ada2", "spanId": "e1011fc68f9c554a", "statusCode": 200}
2024-01-25T15:30:08.376-0800    INFO    radius.radiusasyncworker    deployment/deploymentprocessor.go:341   Deleting output resource: LocalID: KubernetesRoleBinding, resource type: "Provider: kubernetes, Type: rbac.authorization.k8s.io/RoleBinding"
    {"serviceName": "radius", "version": "edge", "hostName": "Nithyas-MacBook-Pro.local", "resourceId": "/planes/radius/local/resourcegroups/randomgroup/providers/Applications.Core/containers/allinonecontainer", "operationId": "bfe96eaf-be67-4e90-b42b-6ed303658a98", "operationType": "APPLICATIONS.CORE/CONTAINERS|DELETE", "dequeueCount": 1, "traceId": "929cdac227dad85290c14b398028ada2", "spanId": "926b421d1b3df760"}
2024-01-25T15:30:08.409-0800    INFO    radius.radiusasyncworker    deployment/deploymentprocessor.go:341   Deleting output resource: LocalID: KubernetesRole, resource type: "Provider: kubernetes, Type: rbac.authorization.k8s.io/Role"
    {"serviceName": "radius", "version": "edge", "hostName": "Nithyas-MacBook-Pro.local", "resourceId": "/planes/radius/local/resourcegroups/randomgroup/providers/Applications.Core/containers/allinonecontainer", "operationId": "bfe96eaf-be67-4e90-b42b-6ed303658a98", "operationType": "APPLICATIONS.CORE/CONTAINERS|DELETE", "dequeueCount": 1, "traceId": "929cdac227dad85290c14b398028ada2", "spanId": "926b421d1b3df760"}
2024-01-25T15:30:08.448-0800    INFO    radius.radiusasyncworker    deployment/deploymentprocessor.go:341   Deleting output resource: LocalID: ServiceAccount, resource type: "Provider: kubernetes, Type: core/ServiceAccount"
    {"serviceName": "radius", "version": "edge", "hostName": "Nithyas-MacBook-Pro.local", "resourceId": "/planes/radius/local/resourcegroups/randomgroup/providers/Applications.Core/containers/allinonecontainer", "operationId": "bfe96eaf-be67-4e90-b42b-6ed303658a98", "operationType": "APPLICATIONS.CORE/CONTAINERS|DELETE", "dequeueCount": 1, "traceId": "929cdac227dad85290c14b398028ada2", "spanId": "926b421d1b3df760"}
2024-01-25T15:30:08.507-0800    INFO    radius.radiusasyncworker    worker/worker.go:260    Operation returned  {"serviceName": "radius", "version": "edge", "hostName": "Nithyas-MacBook-Pro.local", "resourceId": "/planes/radius/local/resourcegroups/randomgroup/providers/Applications.Core/containers/allinonecontainer", "operationId": "bfe96eaf-be67-4e90-b42b-6ed303658a98", "operationType": "APPLICATIONS.CORE/CONTAINERS|DELETE", "dequeueCount": 1, "traceId": "929cdac227dad85290c14b398028ada2", "spanId": "926b421d1b3df760", "success": true, "code": "", "provisioningState": "Succeeded", "err": null}
2024-01-25T15:30:08.512-0800    INFO    radius.radiusasyncworker    worker/worker.go:360    failed to update the provisioningState in resource because it no longer exists. {"serviceName": "radius", "version": "edge", "hostName": "Nithyas-MacBook-Pro.local", "resourceId": "/planes/radius/local/resourcegroups/randomgroup/providers/Applications.Core/containers/allinonecontainer", "operationId": "bfe96eaf-be67-4e90-b42b-6ed303658a98", "operationType": "APPLICATIONS.CORE/CONTAINERS|DELETE", "dequeueCount": 1, "traceId": "929cdac227dad85290c14b398028ada2", "spanId": "926b421d1b3df760"}

The group of "Deleting output resource" entries (the middle block) in the successful case is missing from the case where the deployment failed.
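To make that comparison repeatable, here is a rough Python sketch that counts "Deleting output resource" entries per delete operation. It assumes the entries for one operation sit between its "Start processing operation." and "Operation returned" lines and are not interleaved with other operations. The sample lines are heavily abbreviated from the logs above (only the message and operationId are kept), and count_output_deletes is a hypothetical helper name.

```python
import re

OPERATION_ID = re.compile(r'"operationId":\s*"([^"]+)"')


def count_output_deletes(log_text: str) -> dict[str, int]:
    """Map each operationId to the number of output-resource deletions logged."""
    counts: dict[str, int] = {}
    current = None  # operationId of the operation currently being processed
    for line in log_text.splitlines():
        if "Start processing operation" in line:
            match = OPERATION_ID.search(line)
            current = match.group(1) if match else None
            if current is not None:
                counts.setdefault(current, 0)
        elif "Deleting output resource" in line and current is not None:
            counts[current] += 1
        elif "Operation returned" in line:
            current = None
    return counts


# Abbreviated from the two log excerpts above.
failed_case = (
    'Start processing operation. {"operationId": "f4d4e41b-790b-47c0-972e-0721939b1582"}\n'
    'Operation returned {"operationId": "f4d4e41b-790b-47c0-972e-0721939b1582"}\n'
)
successful_case = (
    'Start processing operation. {"operationId": "bfe96eaf-be67-4e90-b42b-6ed303658a98"}\n'
    'Deleting output resource: LocalID: Service\n'
    'Deleting output resource: LocalID: Deployment\n'
    'Deleting output resource: LocalID: KubernetesRoleBinding\n'
    'Deleting output resource: LocalID: KubernetesRole\n'
    'Deleting output resource: LocalID: ServiceAccount\n'
    'Operation returned {"operationId": "bfe96eaf-be67-4e90-b42b-6ed303658a98"}\n'
)

counts = count_output_deletes(failed_case + successful_case)
for operation_id, deleted in counts.items():
    print(operation_id, deleted)
```

A count of zero for a CONTAINERS|DELETE operation is the signal of interest here: the operation reports success without deleting any output resource.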

I think we populate Radius outputResources only when the deployment has succeeded (https://github.com/project-radius/radius/blob/68c47eaac742be051a59ead00683bb48426d345f/pkg/corerp/backend/deployment/deploymentprocessor.go) and are therefore unable to clean up outputResources when the deployment has failed. I will look further into the code.
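To illustrate the suspected mechanism, here is a toy Python model. It is not Radius code: ToyProcessor, ContainerRecord, and the resource strings are all hypothetical, and it only encodes the hypothesis that output resources are recorded on success while delete iterates over what was recorded.

```python
from dataclasses import dataclass, field


@dataclass
class ContainerRecord:
    name: str
    provisioning_state: str = "Provisioning"
    output_resources: list[str] = field(default_factory=list)


class ToyProcessor:
    """Toy stand-in for a deployment processor (hypothetical, not the real one)."""

    def __init__(self) -> None:
        self.cluster: set[str] = set()  # objects that exist in Kubernetes
        self.records: dict[str, ContainerRecord] = {}

    def deploy(self, name: str, healthy: bool) -> None:
        created = [f"apps/Deployment/{name}", f"core/Service/{name}"]
        # Objects are created before the container's health is known.
        self.cluster.update(created)
        record = ContainerRecord(name)
        if healthy:
            record.provisioning_state = "Succeeded"
            record.output_resources = created  # recorded only on success
        else:
            record.provisioning_state = "Failed"  # nothing recorded
        self.records[name] = record

    def delete(self, name: str) -> None:
        record = self.records.pop(name)
        # Delete only what was recorded; a failed deployment recorded nothing.
        for resource in record.output_resources:
            self.cluster.discard(resource)


processor = ToyProcessor()
processor.deploy("good", healthy=True)
processor.deploy("bad", healthy=False)
processor.delete("good")
processor.delete("bad")

orphans = sorted(processor.cluster)
print(orphans)
```

In this model both deletes complete without error, yet the objects created for the failed container stay in the cluster, which matches the symptom reported in this issue. Whether the real code path behaves this way still needs to be confirmed.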

Deployment failed:

nithya@Nithyas-MacBook-Pro bug7052 % rad resource list containers -a allinoneapp -o json
[
  {
    "id": "/planes/radius/local/resourcegroups/randomgroup/providers/Applications.Core/containers/allinonecontainer",
    "location": "global",
    "name": "allinonecontainer",
    "properties": {
      "application": "/planes/radius/local/resourcegroups/randomgroup/providers/Applications.Core/applications/allinoneapp",
      "connections": {},
      "container": {
        "args": [],
        "command": [],
        "env": {},
        "image": "ghcr.io/radius-project/magpiegoo:latest",
        "ports": {
          "web": {
            "containerPort": 3000,
            "protocol": "TCP",
            "provides": ""
          }
        },
        "workingDir": ""
      },
      "provisioningState": "Failed",
      "resourceProvisioning": "internal",
      "resources": [],
      "status": {}
    },
    "systemData": {
      "createdAt": "0001-01-01T00:00:00Z",
      "createdBy": "",
      "createdByType": "",
      "lastModifiedAt": "0001-01-01T00:00:00Z",
      "lastModifiedBy": "",
      "lastModifiedByType": ""
    },
    "tags": {},
    "type": "Applications.Core/containers"
  }
]

Deployment succeeded:

nithya@Nithyas-MacBook-Pro bug7052 % rad resource list containers -a allinoneapp -o json                
[
  {
    "id": "/planes/radius/local/resourcegroups/randomgroup/providers/Applications.Core/containers/allinonecontainer",
    "location": "global",
    "name": "allinonecontainer",
    "properties": {
      "application": "/planes/radius/local/resourcegroups/randomgroup/providers/Applications.Core/applications/allinoneapp",
      "connections": {},
      "container": {
        "args": [],
        "command": [],
        "env": {},
        "image": "ghcr.io/radius-project/magpiego:latest",
        "ports": {
          "web": {
            "containerPort": 3000,
            "port": 3000,
            "protocol": "TCP",
            "provides": ""
          }
        },
        "workingDir": ""
      },
      "provisioningState": "Succeeded",
      "resourceProvisioning": "internal",
      "resources": [],
      "status": {
        "outputResources": [
          {
            "id": "/planes/kubernetes/local/namespaces/allinoneenv-allinoneapp/providers/core/ServiceAccount/allinonecontainer",
            "localId": "ServiceAccount"
          },
          {
            "id": "/planes/kubernetes/local/namespaces/allinoneenv-allinoneapp/providers/rbac.authorization.k8s.io/Role/allinonecontainer",
            "localId": "KubernetesRole"
          },
          {
            "id": "/planes/kubernetes/local/namespaces/allinoneenv-allinoneapp/providers/rbac.authorization.k8s.io/RoleBinding/allinonecontainer",
            "localId": "KubernetesRoleBinding"
          },
          {
            "id": "/planes/kubernetes/local/namespaces/allinoneenv-allinoneapp/providers/apps/Deployment/allinonecontainer",
            "localId": "Deployment"
          },
          {
            "id": "/planes/kubernetes/local/namespaces/allinoneenv-allinoneapp/providers/core/Service/allinonecontainer",
            "localId": "Service"
          }
        ]
      }
    },
    "systemData": {
      "createdAt": "0001-01-01T00:00:00Z",
      "createdBy": "",
      "createdByType": "",
      "lastModifiedAt": "0001-01-01T00:00:00Z",
      "lastModifiedBy": "",
      "lastModifiedByType": ""
    },
    "tags": {},
    "type": "Applications.Core/containers"
  }
]
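Based on the two dumps above, a container looks at risk of leaving orphans when its provisioningState is "Failed" and status.outputResources is missing or empty. Here is a small Python sketch of that check against the JSON printed by rad resource list containers -a allinoneapp -o json. The helper name at_risk_containers is hypothetical, the samples are trimmed copies of the dumps above, and the rule itself is an assumption drawn from this thread rather than documented Radius behavior.

```python
import json


def at_risk_containers(rad_json: str) -> list[str]:
    """Names of containers that failed to deploy and tracked no output resources."""
    risky = []
    for resource in json.loads(rad_json):
        props = resource.get("properties", {})
        outputs = props.get("status", {}).get("outputResources", [])
        if props.get("provisioningState") == "Failed" and not outputs:
            risky.append(resource["name"])
    return risky


# Trimmed from the "Deployment failed" dump above.
failed_dump = json.dumps([
    {
        "name": "allinonecontainer",
        "properties": {"provisioningState": "Failed", "status": {}},
    }
])

# Trimmed from the "Deployment succeeded" dump above (one output resource kept).
succeeded_dump = json.dumps([
    {
        "name": "allinonecontainer",
        "properties": {
            "provisioningState": "Succeeded",
            "status": {
                "outputResources": [
                    {
                        "id": "/planes/kubernetes/local/namespaces/allinoneenv-allinoneapp/providers/apps/Deployment/allinonecontainer",
                        "localId": "Deployment",
                    }
                ]
            },
        },
    }
])

risky_failed = at_risk_containers(failed_dump)
risky_succeeded = at_risk_containers(succeeded_dump)
print(risky_failed, risky_succeeded)
```

Running this before rad app delete would show which containers may need manual cleanup in the application's Kubernetes namespace afterwards.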