rodrigc closed this issue 10 months ago.
Tried again and got this:
╷
│ Error: Work Request error
│ Provider version: 5.1.0, released on 2023-06-13. This provider is 1 Update(s) behind to current.
│ Service: Containerengine Node Pool
│ Error Message: work request did not succeed, workId: ocid1.clustersworkrequest.oc1.iad.aaaaaaaao5hm33jjm7bxdvvrzogqkzveg5hvmjswoadb6cfdqwlafc36c4xa, entity: nodepool, action: CREATED. Message: 1 nodes(s) register timeout. First, confirm that network prerequisites have been met. If network prerequisites have been met, troubleshoot the problem by running the Node Doctor script on the node(s) experiencing the issue, using either SSH or the Run Command feature. If you cannot resolve the issue using the troubleshooting output from the Node Doctor script, open a Service Request with My Oracle Support and upload the support bundle (a .tar file) to the support ticket. For more information, see https://docs.oracle.com/en-us/iaas/Content/ContEng/Concepts/contengnetworkconfig.htm and https://docs.oracle.com/en-us/iaas/Content/ContEng/Tasks/contengtroubleshooting_topic-node_troubleshooting.htm
│ Resource OCID: ocid1.nodepool.oc1.iad.aaaaaaaa44vawsysnzgwkgzhmyksnpx4v6izetwxxe4ql62pqnibqw7ruawa
│ Suggestion: Please retry or contact support for help with service: Containerengine Node Pool
│
│
│ with module.oke.module.workers[0].oci_containerengine_node_pool.workers["elastic1"],
│ on .terraform/modules/oke/modules/workers/nodepools.tf line 5, in resource "oci_containerengine_node_pool" "workers":
│ 5: resource "oci_containerengine_node_pool" "workers" {
│
╵
I tried to look at the clustersworkrequest but got this:
oci disaster-recovery work-request-error list --work-request-id ocid1.clustersworkrequest.oc1.iad.aaaaaaaao5hm33jjm7bxdvvrzogqkzveg5hvmjswoadb6cfdqwlafc36c4xa --all
ServiceError:
{
"client_version": "Oracle-PythonSDK/2.104.3, Oracle-PythonCLI/3.29.1",
"code": "NotAuthorizedOrNotFound",
"logging_tips": "Please run the OCI CLI command using --debug flag to find more debug information.",
"message": "Authorization failed or requested resource not found.",
"opc-request-id": "EFA36A62792D43C19A00EF23E6D2E146/CCC3FB030DF35AEB4116BD560A3D82E7/A835F36F6A12786B3F452F8AE7F26BD7",
"operation_name": "list_work_request_errors",
"request_endpoint": "GET https://disaster-recovery.us-ashburn-1.oci.oraclecloud.com/20220125/workRequests/ocid1.clustersworkrequest.oc1.iad.aaaaaaaao5hm33jjm7bxdvvrzogqkzveg5hvmjswoadb6cfdqwlafc36c4xa/errors",
"status": 404,
"target_service": "disaster_recovery",
"timestamp": "2023-06-22T04:54:39.522041+00:00",
"troubleshooting_tips": "See [https://docs.oracle.com/iaas/Content/API/References/apierrors.htm] for more information about resolving this error. If you are unable to resolve this issue, run this CLI command with --debug option and contact Oracle support and provide them the full error message."
}
It looks like the cluster for the referenced nodepool was deleted. Taking a look at your latest output now.
Here's the OCI CLI call for OKE work request errors:
oci ce work-request-error list --compartment-id ... --work-request-id ...
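The 404 above came from sending a Container Engine (OKE) work-request OCID to `oci disaster-recovery`: the `request_endpoint` field shows the call went to the Disaster Recovery service, so the error most likely reflects the wrong service rather than a missing resource. As a rough sketch, assuming the documented OCID layout `ocid1.<resource-type>.<realm>.<region>[.<future-use>].<unique-id>`, the resource-type field can hint at which CLI service group to call. The `CLI_SERVICE_BY_RESOURCE_TYPE` table and the `parse_ocid` / `cli_service_for` helpers are hypothetical and only cover the resource types that appear in this thread.

```python
from typing import NamedTuple

# Hypothetical lookup from OCID resource type to OCI CLI service group.
# Only covers the resource types seen in this thread.
CLI_SERVICE_BY_RESOURCE_TYPE = {
    "cluster": "ce",
    "nodepool": "ce",
    "clustersworkrequest": "ce",
    "instance": "compute",
    "compartment": "iam",
}


class Ocid(NamedTuple):
    resource_type: str
    realm: str
    region: str  # empty for global resources such as compartments
    unique_id: str


def parse_ocid(ocid: str) -> Ocid:
    """Split an OCID of the form ocid1.<type>.<realm>.<region>[.<future>].<id>."""
    parts = ocid.split(".")
    if len(parts) < 5 or parts[0] != "ocid1":
        raise ValueError(f"not an ocid1 identifier: {ocid!r}")
    # The unique ID is always the last field; any extra middle fields are
    # the reserved "future use" segment and are ignored here.
    return Ocid(parts[1], parts[2], parts[3], parts[-1])


def cli_service_for(ocid: str) -> str:
    """Return the OCI CLI service group (e.g. 'ce') for an OCID, if known."""
    resource_type = parse_ocid(ocid).resource_type
    try:
        return CLI_SERVICE_BY_RESOURCE_TYPE[resource_type]
    except KeyError:
        raise KeyError(f"no CLI mapping for resource type {resource_type!r}") from None


if __name__ == "__main__":
    wr = "ocid1.clustersworkrequest.oc1.iad.aaaaaaaao5hm33jjm7bxdvvrzogqkzveg5hvmjswoadb6cfdqwlafc36c4xa"
    print(cli_service_for(wr))  # ce, i.e. query it with "oci ce ..."
```

Keying on the resource type rather than the region or realm is deliberate: the same OKE work request would need `oci ce` in any region.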
"Node ocid1.instance.oc1.iad.... register timeout",
This is typically related to the NSG configuration that should allow worker <-> control plane communication. The defaults should permit this traffic; can you check whether there's a worker NSG on the created instances?
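To make "worker <-> control plane communication" concrete: the OKE network configuration page linked in the error lists TCP 6443 (Kubernetes API) and TCP 12250 (worker to control plane) as ports that worker nodes must reach on the Kubernetes API endpoint. Below is a minimal sketch that checks whether a set of ingress rules admits a worker CIDR on those ports. It assumes a simplified, hypothetical `IngressRule` model rather than the real OCI security rule schema, and the CIDR in the example is made up.

```python
import ipaddress
from dataclasses import dataclass

# Ports worker nodes must reach on the Kubernetes API endpoint, per the OKE
# network configuration docs: 6443 (Kubernetes API) and 12250
# (worker <-> control plane communication).
REQUIRED_TCP_PORTS = (6443, 12250)


@dataclass(frozen=True)
class IngressRule:
    """Hypothetical simplified ingress rule; not the OCI SecurityRule schema."""
    source_cidr: str
    protocol: str  # "tcp", "udp", "icmp" or "all"
    port_min: int = 1
    port_max: int = 65535


def rule_allows(rule: IngressRule, src: ipaddress.IPv4Network, port: int) -> bool:
    """True if the rule admits TCP traffic from every address in src to port (IPv4 only)."""
    if rule.protocol not in ("tcp", "all"):
        return False
    if not src.subnet_of(ipaddress.ip_network(rule.source_cidr)):
        return False
    # An "all" rule covers every port, so its port range is not consulted.
    return rule.protocol == "all" or rule.port_min <= port <= rule.port_max


def missing_worker_ports(rules, worker_cidr: str) -> list[int]:
    """Return the required TCP ports that no rule opens to the worker CIDR."""
    src = ipaddress.ip_network(worker_cidr)
    return [
        port for port in REQUIRED_TCP_PORTS
        if not any(rule_allows(r, src, port) for r in rules)
    ]


if __name__ == "__main__":
    # Hypothetical rules: only the Kubernetes API port is opened to workers.
    rules = [IngressRule("10.0.144.0/20", "tcp", 6443, 6443)]
    print(missing_worker_ports(rules, "10.0.144.0/20"))  # [12250]
```

A rule set like the one in the example would let the kubelet reach the API server but still block port 12250, which is the kind of partial opening that could plausibly surface as a node "register timeout".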
OK, I ran this:
oci ce work-request-error list --work-request-id ocid1.clustersworkrequest.oc1.iad.aaaaaaaao5hm33jjm7bxdvvrzogqkzveg5hvmjswoadb6cfdqwlafc36c4xa --compartment-id ocid1.compartment.oc1..aaaaaaaat5p4apgxiol5piajviglcfgozlvpbe4d6v2prlbme66zrv7k7gtq
and got this:
{
"data": [
{
"code": "GetWorkRequestGeneric",
"message": "1 nodes(s) register timeout. First, confirm that network prerequisites have been met. If network prerequisites have been met, troubleshoot the problem by running the Node Doctor script on the node(s) experiencing the issue, using either SSH or the Run Command feature. If you cannot resolve the issue using the troubleshooting output from the Node Doctor script, open a Service Request with My Oracle Support and upload the support bundle (a .tar file) to the support ticket. For more information, see https://docs.oracle.com/en-us/iaas/Content/ContEng/Concepts/contengnetworkconfig.htm and https://docs.oracle.com/en-us/iaas/Content/ContEng/Tasks/contengtroubleshooting_topic-node_troubleshooting.htm",
"timestamp": "2023-06-22T04:30:06+00:00"
}
]
}
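The error list is plain JSON, so the number of nodes that failed to register can be pulled out programmatically, for example when scripting retries. Here is a minimal sketch that parses output shaped like the response above (a `data` array of objects with `code`, `message` and `timestamp`). The `register_timeouts` helper is hypothetical, and its regular expression is an assumption based only on the single message seen in this thread.

```python
import json
import re

# Matches the "N nodes(s) register timeout" wording seen in this thread
# (and "N node(s) ..."); other OKE error messages may be phrased differently.
REGISTER_TIMEOUT = re.compile(r"(\d+) nodes?\(s\) register timeout")


def register_timeouts(cli_output: str) -> list[tuple[str, int]]:
    """Return (timestamp, node_count) for each register-timeout error.

    Expects the JSON printed by `oci ce work-request-error list`:
    {"data": [{"code": ..., "message": ..., "timestamp": ...}, ...]}
    """
    found = []
    for err in json.loads(cli_output).get("data", []):
        match = REGISTER_TIMEOUT.search(err.get("message", ""))
        if match:
            found.append((err.get("timestamp", ""), int(match.group(1))))
    return found


if __name__ == "__main__":
    # Trimmed copy of the output above.
    sample = """{"data": [{"code": "GetWorkRequestGeneric",
      "message": "1 nodes(s) register timeout. First, confirm that network prerequisites have been met.",
      "timestamp": "2023-06-22T04:30:06+00:00"}]}"""
    print(register_timeouts(sample))  # [('2023-06-22T04:30:06+00:00', 1)]
```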
I'm not sure how to diagnose that...
How do I list the NSG configuration?
The cluster id is:
ocid1.cluster.oc1.iad.aaaaaaaawhgcv3pt6iae5quyapym2ym4wtfelcrf75nk2soufcczs3p5pvma
@devoncrouse any idea about this? Would it be better for me to file a ticket at https://cloud.oracle.com/support rather than keep this issue open on GitHub?
Hi @rodrigc, you can try:
$ oci network nsg list -c ocid1.compartment...
Or it may be easier to look at the instance in the console UI, e.g. https://cloud.oracle.com/compute/instances.
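To check from the CLI rather than the console whether a worker NSG is attached to the created instances, one option is `oci compute instance list-vnics --instance-id <instance-ocid>`. Its VNIC entries are assumed here to carry an `nsg-ids` list. The sketch below assumes that output shape and reports VNICs with no NSG attached. `vnics_without_nsgs` is a hypothetical helper, and the VNIC OCID in the example is made up.

```python
import json


def vnics_without_nsgs(list_vnics_output: str) -> list[str]:
    """Return the IDs of VNICs that have no network security group attached.

    Assumes JSON shaped like `oci compute instance list-vnics` output:
    {"data": [{"id": ..., "nsg-ids": [...]}, ...]}
    """
    vnics = json.loads(list_vnics_output).get("data", [])
    # A missing or empty "nsg-ids" list both mean no NSG is attached.
    return [vnic.get("id", "<unknown>") for vnic in vnics if not vnic.get("nsg-ids")]


if __name__ == "__main__":
    # Hypothetical output for one worker instance with a single VNIC.
    sample = '{"data": [{"id": "ocid1.vnic.oc1.iad.example", "nsg-ids": []}]}'
    print(vnics_without_nsgs(sample))  # ['ocid1.vnic.oc1.iad.example']
```

An empty `nsg-ids` list on a worker's VNIC would be consistent with the missing worker NSG the maintainer asked about, although subnet security lists can also carry the required rules.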
oci network nsg list --compartment-id XXX
shows me nothing
The error is not reproducible with the latest 5.x branch of this module. It was most likely fixed by https://github.com/oracle-terraform-modules/terraform-oci-oke/pull/764
Summary
When I use the following Terraform configuration against the 5.x branch to create a cluster, I get an error when creating the node pool.
Any idea how to resolve this?
Terraform Version and Provider Version
Terraform Configuration Files
Debug Output
Steps to Reproduce
terraform apply