Closed hamzy closed 1 month ago
@hamzy thanks for reporting the issue. Can you please share more information, such as a complete dump of the IBMPowerVSCluster resource?
@Karthik-K-N are we setting the right state for the cluster when an error happens? This needs discussion on how to fail fast when things go wrong. At the least, we need some condition or design for how many times we really want to retry when something fails to create.
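To make the "how many times do we retry" question concrete, here is a minimal sketch of a bounded retry that also stops early on errors that retrying cannot fix. It is not the provider's actual reconcile logic; `retryBounded`, `ErrTerminal`, and `ErrTransient` are hypothetical names introduced only for illustration, and treating a quota error as terminal is an assumption.

```go
package main

import (
	"errors"
	"fmt"
)

// Hypothetical error classes, not part of any real provider API.
var (
	// ErrTerminal marks failures that retrying alone cannot fix (e.g. a quota limit).
	ErrTerminal = errors.New("terminal error")
	// ErrTransient marks failures that may succeed on a later attempt.
	ErrTransient = errors.New("transient error")
)

// retryBounded calls op at most maxAttempts times. It stops early when op
// succeeds or returns an error wrapping ErrTerminal. It returns the number
// of attempts made and the last error seen.
func retryBounded(maxAttempts int, op func() error) (int, error) {
	if maxAttempts < 1 {
		return 0, errors.New("maxAttempts must be at least 1")
	}
	var err error
	for attempt := 1; attempt <= maxAttempts; attempt++ {
		err = op()
		if err == nil {
			return attempt, nil
		}
		if errors.Is(err, ErrTerminal) {
			// Fail fast: more attempts will not help.
			return attempt, err
		}
	}
	return maxAttempts, err
}

func main() {
	// Assumption: the quota message from this issue is classified as terminal.
	quota := fmt.Errorf("cannot add more than 5 gateways to the selected region: %w", ErrTerminal)
	attempts, err := retryBounded(10, func() error { return quota })
	fmt.Printf("attempts=%d err=%v\n", attempts, err)
}
```

Whether a given cloud error belongs in the terminal class is the real design decision; the loop itself is the easy part.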
[inner hamzy@li-3d08e84c-2e1c-11b2-a85c-e2db7bb078fc hamzy-installer]$ oc get ibmpowervscluster -n openshift-cluster-api-guests -o yaml
apiVersion: v1
items:
- apiVersion: infrastructure.cluster.x-k8s.io/v1beta2
  kind: IBMPowerVSCluster
  metadata:
    annotations:
      powervs.cluster.x-k8s.io/create-infra: "true"
    creationTimestamp: "2024-03-08T12:24:43Z"
    finalizers:
    - ibmpowervscluster.infrastructure.cluster.x-k8s.io
    generation: 1
    labels:
      cluster.x-k8s.io/cluster-name: rdr-hamzy-test-dal10-58hkl
    name: rdr-hamzy-test-dal10-58hkl
    namespace: openshift-cluster-api-guests
    ownerReferences:
    - apiVersion: cluster.x-k8s.io/v1beta1
      blockOwnerDeletion: true
      controller: true
      kind: Cluster
      name: rdr-hamzy-test-dal10-58hkl
      uid: fd6c490c-c444-48e9-93b9-f573c82b1fb4
    resourceVersion: "436"
    uid: d72f51e2-3b1d-4db6-b89f-17d90525c623
  spec:
    controlPlaneEndpoint:
      host: ""
      port: 0
    cosInstance:
      bucketName: rhcos-powervs-images-us-south
      bucketRegion: us-south
      name: rdr-hamzy-test-dal10-58hkl-cos
    network:
      name: rdr-hamzy-test-dal10-58hkl-network
    resourceGroup:
      name: powervs-ipi-resource-group
    serviceInstance:
      id: 701beea6-d79d-4e8a-8e8a-8d122f3754b6
    serviceInstanceID: ""
    transitGateway:
      name: rdr-hamzy-test-dal10-58hkl-tg
    vpc:
      name: rdr-hamzy-test-dal10-58hkl-vpc
      region: us-south
    zone: dal10
  status:
    conditions:
    - lastTransitionTime: "2024-03-08T12:36:39Z"
      status: "True"
      type: NetworkReady
    - lastTransitionTime: "2024-03-08T12:24:45Z"
      status: "True"
      type: ServiceInstanceReady
    - lastTransitionTime: "2024-03-08T12:25:17Z"
      message: 'error creating transit gateway: cannot add more than 5 gateways to
        the selected region'
      reason: TransitGatewayReconciliationFailed
      severity: Error
      status: "False"
      type: TransitGatewayReady
    - lastTransitionTime: "2024-03-08T12:25:07Z"
      status: "True"
      type: VPCReady
    - lastTransitionTime: "2024-03-08T12:25:12Z"
      status: "True"
      type: VPCSubnetReady
    dhcpServer:
      controllerCreated: true
      id: 48a13744-959e-4c58-b3a1-0e3f5941a475
    network:
      controllerCreated: true
      id: 44e09ab9-b84c-4d70-8ac6-da0612f7e8d0
    ready: false
    resourceGroupID:
      controllerCreated: false
      id: c1cb9b2679344ee9951ab8b4bc22eca0
    vpc:
      controllerCreated: true
      id: r006-c5c1eb58-6685-48d3-a324-1885eafbcae9
    vpcSubnet:
      rdr-hamzy-test-dal10-58hkl-vpcsubnet-us-south-1:
        controllerCreated: true
        id: 0717-f8b6ae0b-d076-44c7-aa59-c60e20a7358b
      rdr-hamzy-test-dal10-58hkl-vpcsubnet-us-south-2:
        controllerCreated: true
        id: 0727-128430a8-69a6-4032-b95d-94ebf4603630
      rdr-hamzy-test-dal10-58hkl-vpcsubnet-us-south-3:
        controllerCreated: true
        id: 0737-ed2ea4cf-0958-4c72-82ee-f4994fb7526c
kind: List
metadata:
  resourceVersion: ""
@hamzy as we can see, the TransitGatewayReady condition in the status is already set to False with severity Error, which shows that something is wrong with the infra and that the cluster never becomes active.
Given the way the controllers are designed, they always try to make the resource available on the next retry, even after a failure. It is the user's conscious decision when to terminate the cluster based on the conditions, or to go and fix the environment in the backend so the installation flow can proceed (e.g. the user talking to an admin to bump the transit gateway limit in this case).
Maybe having a timeout in the installer, with some level of error checking of these conditions, would be a better way to deal with such situations.
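As a rough illustration of that idea, the sketch below polls a set of conditions until they are all True, a condition reports status False with severity Error, or a deadline passes. It is not the installer's actual code: the `Condition` struct is a simplified stand-in for the fields shown in the dump above, and `firstError`, `allTrue`, and `waitForInfra` are hypothetical helpers. Treating an Error-severity condition as immediately fatal is an assumption; the controller keeps retrying, so such a condition could still clear later.

```go
package main

import (
	"context"
	"fmt"
	"time"
)

// Condition is a simplified stand-in for the status conditions in the dump.
type Condition struct {
	Type, Status, Severity, Reason, Message string
}

// firstError returns the first condition that is False with severity Error.
func firstError(conds []Condition) (Condition, bool) {
	for _, c := range conds {
		if c.Status == "False" && c.Severity == "Error" {
			return c, true
		}
	}
	return Condition{}, false
}

// allTrue reports whether there is at least one condition and all are True.
func allTrue(conds []Condition) bool {
	if len(conds) == 0 {
		return false
	}
	for _, c := range conds {
		if c.Status != "True" {
			return false
		}
	}
	return true
}

// waitForInfra polls fetch every interval (which must be positive) until all
// conditions are True, an Error-severity condition appears, or ctx expires.
func waitForInfra(ctx context.Context, interval time.Duration, fetch func() []Condition) error {
	ticker := time.NewTicker(interval)
	defer ticker.Stop()
	for {
		conds := fetch()
		if c, bad := firstError(conds); bad {
			return fmt.Errorf("%s: %s: %s", c.Type, c.Reason, c.Message)
		}
		if allTrue(conds) {
			return nil
		}
		select {
		case <-ctx.Done():
			return fmt.Errorf("timed out waiting for infrastructure: %w", ctx.Err())
		case <-ticker.C:
		}
	}
}

func main() {
	// Values copied from the status in this issue.
	conds := []Condition{
		{Type: "VPCReady", Status: "True"},
		{Type: "TransitGatewayReady", Status: "False", Severity: "Error",
			Reason:  "TransitGatewayReconciliationFailed",
			Message: "error creating transit gateway: cannot add more than 5 gateways to the selected region"},
	}
	ctx, cancel := context.WithTimeout(context.Background(), 30*time.Second)
	defer cancel()
	fmt.Println(waitForInfra(ctx, time.Second, func() []Condition { return conds }))
}
```

A gentler variant could require the Error condition to persist across several polls before giving up, which leaves room for the controller's own retries.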
Closing this issue as per the above comment.
/kind bug /area provider/ibmcloud
What steps did you take and what happened:
During an IPI CAPI create cluster, a transit gateway is not created. The cluster is useless without this.
What did you expect to happen: Immediate failure.
Anything else you would like to add: