terraform-ibm-modules / stack-retrieval-augmented-generation

A deployable architecture that automates the deployment of a sample gen AI Pattern on IBM Cloud, including all underlying IBM Cloud and WatsonX infrastructure.
Apache License 2.0

RAG DA does not uninstall sometimes #165

Open gmendel opened 3 months ago

gmendel commented 3 months ago

There are many reasons why an Uninstall will not work:

It does not matter what the reason is: the bar (expectation) is that the DA can ALWAYS uninstall and clean up. This potentially implies running pre-uninstall scripts and syncing up the state.

ocofaigh commented 3 months ago

@gmendel FYI, there is a step here that says:

  1. Delete Resources Created by the CI toolchain
     Those resources are not destroyed automatically as part of undeploying the stack in Project:
  • Code Engine Project: Delete the code engine project created for the sample application.
  • Container Registry Namespace: Delete the container registry namespace created by the CI toolchain.

Actions are being taken to address those (such as using a standalone Code Engine DA so that the code engine project is in the terraform state and can be destroyed).

But interesting point on the SecretMgr free plan lapsing. We may need to find out how to handle that use case.

vburckhardt commented 2 months ago

Example of error when undeploying the RAG DA if the secret manager instance has been destroyed before running the undeploy.

 2024/09/26 09:54:19 Terraform refresh | 
 2024/09/26 09:54:19 Terraform refresh | Error: GetSecretWithContext failed Get "https://5c724fcf-5e0c-47a1-b8a7-71d6301657f4.private.eu-de.secrets-manager.appdomain.cloud/api/v2/secrets/cddf891b-7f05-190b-728f-e8e401d153bd": dial tcp: lookup 5c724fcf-5e0c-47a1-b8a7-71d6301657f4.private.eu-de.secrets-manager.appdomain.cloud on 172.21.0.10:53: no such host
 2024/09/26 09:54:19 Terraform refresh | null
 2024/09/26 09:54:19 Terraform refresh | 
 2024/09/26 09:54:19 Terraform refresh | 
 2024/09/26 09:54:19 Terraform refresh |   with module.secrets_manager_secret_ibm_iam[0].ibm_sm_arbitrary_secret.arbitrary_secret[0],
 2024/09/26 09:54:19 Terraform refresh |   on .terraform/modules/secrets_manager_secret_ibm_iam/main.tf line 37, in resource "ibm_sm_arbitrary_secret" "arbitrary_secret":
 2024/09/26 09:54:19 Terraform refresh |   37: resource "ibm_sm_arbitrary_secret" "arbitrary_secret" {
 2024/09/26 09:54:19 Terraform refresh | 
 2024/09/26 09:54:19 Terraform refresh | 
 2024/09/26 09:54:19 Terraform refresh | Error: GetSecretWithContext failed Get "https://5c724fcf-5e0c-47a1-b8a7-71d6301657f4.private.eu-de.secrets-manager.appdomain.cloud/api/v2/secrets/4a630cd7-ccfc-75e6-a0ed-1f7eb652e28e": dial tcp: lookup 5c724fcf-5e0c-47a1-b8a7-71d6301657f4.private.eu-de.secrets-manager.appdomain.cloud on 172.21.0.10:53: no such host
 2024/09/26 09:54:19 Terraform refresh | null
 2024/09/26 09:54:19 Terraform refresh | 
 2024/09/26 09:54:19 Terraform refresh | 
 2024/09/26 09:54:19 Terraform refresh |   with module.secrets_manager_secret_signing_key[0].ibm_sm_arbitrary_secret.arbitrary_secret[0],
 2024/09/26 09:54:19 Terraform refresh |   on .terraform/modules/secrets_manager_secret_signing_key/main.tf line 37, in resource "ibm_sm_arbitrary_secret" "arbitrary_secret":
 2024/09/26 09:54:19 Terraform refresh |   37: resource "ibm_sm_arbitrary_secret" "arbitrary_secret" {
 2024/09/26 09:54:19 Terraform refresh | 
 2024/09/26 09:54:19 Terraform REFRESH error: Terraform REFRESH errorexit status 1
 2024/09/26 09:54:19 Could not execute job: Error : Terraform REFRESH errorexit status 1
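
One way to "sync up the state" in this situation is sketched below in Python. It is not part of the DA: it is a minimal, hypothetical helper that scans a refresh log like the one above, collects the resource addresses whose error ends in "no such host", and prints the matching `terraform state rm` commands. The function names are invented for illustration, and the parsing assumes the log layout shown above (an "Error:" line followed by a "with <address>," line).

```python
import re

# Assumed log layout: "<timestamp> Terraform refresh | Error: ..." followed
# a few lines later by "<timestamp> Terraform refresh |   with <address>,".
ERROR_LINE = re.compile(r"\|\s+Error:\s+(?P<msg>.*)$")
WITH_LINE = re.compile(r"\|\s+with\s+(?P<addr>\S+),\s*$")


def unreachable_resources(log_text):
    """Return resource addresses whose error mentions 'no such host', in log order."""
    addresses = []
    pending = False  # True while inside an error block caused by a DNS lookup failure
    for line in log_text.splitlines():
        err = ERROR_LINE.search(line)
        if err:
            pending = "no such host" in err.group("msg")
            continue
        hit = WITH_LINE.search(line)
        if hit and pending:
            if hit.group("addr") not in addresses:
                addresses.append(hit.group("addr"))
            pending = False
    return addresses


def state_rm_commands(addresses):
    """Build one 'terraform state rm' command per address."""
    # Single quotes keep the shell from interpreting the [0] index brackets.
    return [f"terraform state rm '{addr}'" for addr in addresses]


if __name__ == "__main__":
    import sys

    # Usage (hypothetical file name): python orphans.py < workspace-logs.txt
    for command in state_rm_commands(unreachable_resources(sys.stdin.read())):
        print(command)
```

Note that `terraform state rm` only drops the entry from the state; it does not delete anything in the cloud account. It is therefore only appropriate once it is confirmed that the Secrets Manager instance is really gone. How the command would be run against a workspace managed by a Project is left open here.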
ocofaigh commented 2 months ago

@vburckhardt what can we do to solve this? I guess the instance would remain in reclamation for a period of time, and the user would have to request for it to be recovered, and at the same time purchase a standard plan?

hmagph commented 1 month ago

A somewhat similar error for me when trying to redeploy the SM DA 1.18.7 after its SM instance was (accidentally) discarded. In my case, it came from the DevSecOps ALM stack. https://cloud.ibm.com/projects/71ebf194-18f9-45c2-bc22-9d7c62e4ef54/configurations/49710c65-3a60-4ddf-8a6c-7b226ddec42f/edit

 2024/10/09 15:28:21 Terraform plan | Changes to Outputs:
 2024/10/09 15:28:21 Terraform plan |   ~ secrets_manager_crn    = "crn:v1:bluemix:public:secrets-manager:eu-de:a/ab0571c606236c08ccd5471e264911a2:d1e1f9b9-b359-4bf4-8964-92eac0cca836::" -> (known after apply)
 2024/10/09 15:28:21 Terraform plan |   ~ secrets_manager_guid   = "d1e1f9b9-b359-4bf4-8964-92eac0cca836" -> (known after apply)
 2024/10/09 15:28:21 Terraform plan |   ~ secrets_manager_id     = "crn:v1:bluemix:public:secrets-manager:eu-de:a/ab0571c606236c08ccd5471e264911a2:d1e1f9b9-b359-4bf4-8964-92eac0cca836::" -> (known after apply)
 2024/10/09 15:28:21 Terraform plan | 
 2024/10/09 15:28:21 Terraform plan | Warning: Argument is deprecated
 2024/10/09 15:28:21 Terraform plan | 
 2024/10/09 15:28:21 Terraform plan |   with module.kms[0].module.kms_key_rings["devsecops-sm-cos-key-ring"].ibm_kms_key_rings.key_ring,
 2024/10/09 15:28:21 Terraform plan |   on .terraform/modules/kms.kms_key_rings/main.tf line 9, in resource "ibm_kms_key_rings" "key_ring":
 2024/10/09 15:28:21 Terraform plan |    9:   force_delete  = var.force_delete
 2024/10/09 15:28:21 Terraform plan | 
 2024/10/09 15:28:21 Terraform plan | force_delete is now deprecated. Please remove all references to this field.
 2024/10/09 15:28:21 Terraform plan | 
 2024/10/09 15:28:21 Terraform plan | Error: GetNotificationsRegistrationWithContext failed Get "https://d1e1f9b9-b359-4bf4-8964-92eac0cca836.private.eu-de.secrets-manager.appdomain.cloud/api/v2/notifications/registration": dial tcp: lookup d1e1f9b9-b359-4bf4-8964-92eac0cca836.private.eu-de.secrets-manager.appdomain.cloud on 172.21.0.10:53: no such host
 2024/10/09 15:28:21 Terraform plan | null
 2024/10/09 15:28:21 Terraform plan | 
 2024/10/09 15:28:21 Terraform plan | 
 2024/10/09 15:28:21 Terraform plan |   with module.secrets_manager.ibm_sm_en_registration.sm_en_registration[0],
 2024/10/09 15:28:21 Terraform plan |   on ../../main.tf line 139, in resource "ibm_sm_en_registration" "sm_en_registration":
 2024/10/09 15:28:21 Terraform plan |  139: resource "ibm_sm_en_registration" "sm_en_registration" {
 2024/10/09 15:28:21 Terraform plan | 
hiltol commented 3 weeks ago

I am also encountering errors when trying to undeploy the stack.

[Two screenshots attached: 2024-10-29 at 4:13:54 PM and 2024-10-29 at 4:18:51 PM]

Workspace Logs: workspace-logs.txt

ocofaigh commented 2 weeks ago

@hiltol Your errors seem to be related to permissions: Error: DeleteTektonPipelinePropertyWithContext failed Forbidden (cc @padraic-edwards @huayuenh)

huayuenh commented 2 weeks ago

@ocofaigh A continuous delivery service is a hard requirement for the ALM. It must have been in place when the ALM was stood up, but was deleted before the attempt to remove the ALM. Was the CD service stood up using the ALM?