terraform-ibm-modules / terraform-ibm-observability-da

A deployable architecture solution to deploy Observability instances and agents.
Apache License 2.0

Cloud logs instance provisioning times out #172

Open in-1911 opened 1 day ago

in-1911 commented 1 day ago

The Cloud Logs instance takes longer than 10 minutes to provision, and the deployment fails with a timeout:

2024/09/30 15:59:20 Terraform apply | module.observability_instance.module.cloud_logs[0].ibm_resource_instance.cloud_logs: Still creating... [10m0s elapsed]
 2024/09/30 15:59:22 Terraform apply | 
 2024/09/30 15:59:22 Terraform apply | Warning: Argument is deprecated
 2024/09/30 15:59:22 Terraform apply | 
 2024/09/30 15:59:22 Terraform apply |   with module.kms[0].module.kms_key_rings["rag-rok-42-observability-cos-key-ring"].ibm_kms_key_rings.key_ring,
 2024/09/30 15:59:22 Terraform apply |   on .terraform/modules/kms.kms_key_rings/main.tf line 9, in resource "ibm_kms_key_rings" "key_ring":
 2024/09/30 15:59:22 Terraform apply |    9:   force_delete  = var.force_delete
 2024/09/30 15:59:22 Terraform apply | 
 2024/09/30 15:59:22 Terraform apply | force_delete is now deprecated. Please remove all references to this field.
 2024/09/30 15:59:22 Terraform apply | 
 2024/09/30 15:59:22 Terraform apply | (and one more similar warning elsewhere)
 2024/09/30 15:59:22 Terraform apply | 
 2024/09/30 15:59:22 Terraform apply | Error: [ERROR] Error waiting for create resource instance (crn:v1:bluemix:public:logs:us-south:a/2e9****1f:feb****63::) to be succeeded: timeout while waiting for state to become 'active' (last state: 'provisioning', timeout: 10m0s)
 2024/09/30 15:59:22 Terraform apply | 
 2024/09/30 15:59:22 Terraform apply |   with module.observability_instance.module.cloud_logs[0].ibm_resource_instance.cloud_logs,
 2024/09/30 15:59:22 Terraform apply |   on .terraform/modules/observability_instance/modules/cloud_logs/main.tf line 7, in resource "ibm_resource_instance" "cloud_logs":
 2024/09/30 15:59:22 Terraform apply |    7: resource "ibm_resource_instance" "cloud_logs" {
 2024/09/30 15:59:22 Terraform apply | 
 2024/09/30 15:59:22 Terraform apply | ---
 2024/09/30 15:59:22 Terraform apply | id: terraform-446cc00a
 2024/09/30 15:59:22 Terraform apply | summary: '[ERROR] Error waiting for create resource instance
 2024/09/30 15:59:22 Terraform apply | (crn:v1:bluemix:public:logs:us-south:a/2e9***1f:feb***63::)
 2024/09/30 15:59:22 Terraform apply |   to be succeeded: timeout while waiting for state to become ''active'' (last state:
 2024/09/30 15:59:22 Terraform apply |   ''provisioning'', timeout: 10m0s)'

Affected modules

*

Terraform CLI and Terraform provider versions

Terraform output

Debug output

Expected behavior

Actual behavior

Steps to reproduce (including links and screen captures)

  1. Run terraform apply

Anything else



in-1911 commented 1 day ago

It seems that the instance got stuck in "Provisioning" status, so increasing the timeout may not help much. I am working with Cloud support to figure out how to fix the instance first.

ocofaigh commented 22 hours ago

@in-1911 I have seen this a few times and reported it to the Cloud Logs team. I think they verified that they have a provisioning bug.

in-1911 commented 21 hours ago

Maybe there is a timing issue with the s2s auth, and a time_wait would actually help.

ocofaigh commented 6 hours ago

The service team has confirmed there is a fix coming for this issue. Meanwhile, I think we need to document how users can get themselves out of this mess, because once the timeout occurs, you cannot do a subsequent apply or destroy, as you get:

[ERROR] Error deleting resource instance: An operation 'create' is in progress, please try again once the operation is complete

The only way I managed to fix this was by running: ibmcloud schematics workspace state rm --id "${WORKSPACE_ID}" --address "module.observability_instance.module.cloud_logs[0].ibm_resource_instance.cloud_logs" and then either re-applying or destroying.
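
As a rough sketch, the workaround above could be wrapped in a small shell function so the long resource address does not have to be retyped. The function name `remove_stuck_cloud_logs` and the variable `CLOUD_LOGS_ADDRESS` are hypothetical names used only for illustration; the `ibmcloud schematics workspace state rm` command and the address are the ones quoted in this thread. The sketch assumes the `ibmcloud` CLI with the Schematics plugin is installed and logged in.

```shell
# Resource address copied from the Terraform error output in this issue.
CLOUD_LOGS_ADDRESS='module.observability_instance.module.cloud_logs[0].ibm_resource_instance.cloud_logs'

# Hypothetical helper: removes the stuck Cloud Logs instance from the
# Schematics workspace state so that a later apply or destroy is not
# blocked by the in-progress 'create' operation.
remove_stuck_cloud_logs() {
  local workspace_id="$1"
  if [ -z "${workspace_id}" ]; then
    echo "usage: remove_stuck_cloud_logs WORKSPACE_ID" >&2
    return 2
  fi
  # Same command as quoted in the comment above.
  ibmcloud schematics workspace state rm \
    --id "${workspace_id}" \
    --address "${CLOUD_LOGS_ADDRESS}"
}
```

It would be called as `remove_stuck_cloud_logs "${WORKSPACE_ID}"`, followed by a re-apply or destroy. Note that removing a resource from state only makes Terraform stop tracking it; the stuck instance itself is not deleted by this step and may still need to be cleaned up separately.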

I'll get this into a doc

ocofaigh commented 6 hours ago

FYI, I created a PR to add it to the troubleshooting section of the docs: https://github.com/terraform-ibm-modules/stack-retrieval-augmented-generation/pull/199