Closed mehdibenfeguir closed 7 months ago
@mehdibenfeguir The error shown below

```
Error from server (Forbidden): error when creating "STDIN": admission webhook "validation.machinepool.cluster.x-k8s.io" denied the request: spec: Forbidden: can be set only if the MachinePool feature flag is enabled
```

Please enable the MachinePool feature flag before running the `clusterctl init --infrastructure oci` command.
Please see the doc https://oracle.github.io/cluster-api-provider-oci/managed/managedcluster.html#environment-variables
You will have to run `clusterctl delete --all` and then reinitialize after exporting the variable.
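For reference, a minimal sketch of the re-init sequence. `EXP_MACHINE_POOL` is the standard Cluster API feature-gate variable; the `clusterctl` steps are shown as comments since they must run against your management cluster:

```shell
# Enable the MachinePool feature gate before (re)installing the providers.
export EXP_MACHINE_POOL=true
# Then re-install (requires clusterctl on PATH and a kubeconfig for the
# management cluster):
# clusterctl delete --all
# clusterctl init --infrastructure oci
```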
OK, I did that; the MachinePool error is gone, but I'm still getting these errors:
Error from server (InternalError): error when creating "STDIN": Internal error occurred: failed calling webhook "default.ocimanagedcluster.infrastructure.cluster.x-k8s.io": failed to call webhook: Post "https://capoci-webhook-service.cluster-api-provider-oci-system.svc:443/mutate-infrastructure-cluster-x-k8s-io-v1beta2-ocimanagedcluster?timeout=10s": EOF
Error from server (InternalError): error when creating "STDIN": Internal error occurred: failed calling webhook "default.ocimanagedcontrolplane.infrastructure.cluster.x-k8s.io": failed to call webhook: Post "https://capoci-webhook-service.cluster-api-provider-oci-system.svc:443/mutate-infrastructure-cluster-x-k8s-io-v1beta2-ocimanagedcontrolplane?timeout=10s": EOF
Error from server (InternalError): error when creating "STDIN": Internal error occurred: failed calling webhook "default.ocimanagedmachinepool.infrastructure.cluster.x-k8s.io": failed to call webhook: Post "https://capoci-webhook-service.cluster-api-provider-oci-system.svc:443/mutate-infrastructure-cluster-x-k8s-io-v1beta2-ocimanagedmachinepool?timeout=10s": EOF
Are the CAPOCI pods running properly? Can you check the output of `kubectl get pods -n cluster-api-provider-oci-system`? If they are not running fine, can you check the logs using `kubectl logs` to see why?
k logs capoci-controller-manager-5648c768-mpjwb -n cluster-api-provider-oci-system
I0118 10:36:42.416442 1 main.go:240] "setup: CAPOCI Version" version="v0.14.0"
E0118 10:36:42.416470 1 main.go:249] "setup: unable to get OCI region from AuthConfigProvider" err="region can not be empty or have spaces"
How should I provide the region to the management cluster? I'm doing `export OCI_REGION=me-jeddah-1`.
Please follow the instructions here https://oracle.github.io/cluster-api-provider-oci/gs/install-cluster-api.html#install-cluster-api-provider-for-oracle-cloud-infrastructure to provide the details. If you are using OKE as the management cluster, Instance Principals are recommended for production, although a user principal may be easier to start with.
Yes, I'm using the exact same config with `export OCI_REGION=me-jeddah-1`, and it's still complaining about the region:
I0118 10:36:42.416442 1 main.go:240] "setup: CAPOCI Version" version="v0.14.0"
E0118 10:36:42.416470 1 main.go:249] "setup: unable to get OCI region from AuthConfigProvider" err="region can not be empty or have spaces"
Did you execute this step as well? `export OCI_REGION_B64="$(echo -n "$OCI_REGION" | base64 | tr -d '\n')"`
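For reference, a minimal sketch of the two region exports together; the provider reads the base64-encoded form from the credentials secret, so the second export is required even when `OCI_REGION` is set:

```shell
# Plain region name, per the CAPOCI install doc:
export OCI_REGION=me-jeddah-1
# Base64-encoded form consumed by the provider's credentials secret:
export OCI_REGION_B64="$(echo -n "$OCI_REGION" | base64 | tr -d '\n')"
```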
It's marked as conditional in the doc, so I didn't. Let me add it.
OK, now it's complaining about the private key, even though I provided it like this:
export OCI_CREDENTIALS_KEY_B64=$(base64 < ~/.ssh/id_rsa | tr -d '\n')
I also tried `echo $OCI_CREDENTIALS_KEY_B64` and it shows the encoded content.
k logs capoci-controller-manager-5648c768-kbm4m -n cluster-api-provider-oci-system
I0118 11:15:42.408639 1 main.go:240] "setup: CAPOCI Version" version="v0.14.0"
E0118 11:15:42.408775 1 clients.go:188] "msg"="unable to create OCI VCN Client" "error"="can not create client, bad configuration: failed to parse private key"
E0118 11:15:42.408798 1 main.go:261] "setup: authentication provider could not be initialised" err="can not create client, bad configuration: failed to parse private key"
The private key is not the SSH private key; it should be the OCI API private key. Please go through the doc https://docs.oracle.com/en-us/iaas/Content/API/Concepts/apisigningkey.htm, create a private key in the ~/.oci folder, and provide that path.
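For anyone hitting the same confusion, a sketch of creating the API signing key as that doc describes (file names are illustrative; the public key must then be uploaded to your user in the OCI console):

```shell
# Generate an OCI API signing key pair (illustrative paths).
mkdir -p ~/.oci
openssl genrsa -out ~/.oci/oci_api_key.pem 2048
chmod 600 ~/.oci/oci_api_key.pem
# Derive the public key and upload it under your user in the OCI console:
openssl rsa -pubout -in ~/.oci/oci_api_key.pem -out ~/.oci/oci_api_key_public.pem
# Point CAPOCI at the API key, NOT the SSH key:
export OCI_CREDENTIALS_KEY_B64="$(base64 < ~/.oci/oci_api_key.pem | tr -d '\n')"
```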
OK, thanks, all issues have been fixed and I was able to create the cluster with no errors:
sigs.k8s.io/cluster-api
cluster.cluster.x-k8s.io/capi-mbf configured
ocimanagedcluster.infrastructure.cluster.x-k8s.io/capi-mbf created
ocimanagedcontrolplane.infrastructure.cluster.x-k8s.io/capi-mbf created
machinepool.cluster.x-k8s.io/capi-mbf-mp-0 configured
ocimanagedmachinepool.infrastructure.cluster.x-k8s.io/capi-mbf-mp-0 created
but the cluster is not created
k describe cluster capi-mbf
Name:         capi-mbf
Namespace:    default
Labels:       cluster.x-k8s.io/cluster-name=capi-mbf
Annotations:  <none>
API Version:  cluster.x-k8s.io/v1beta1
Kind:         Cluster
Metadata:
  Creation Timestamp:  2024-01-18T09:12:28Z
  Finalizers:
    cluster.cluster.x-k8s.io
  Generation:        8
  Resource Version:  328221976
  UID:               b59c366f-1aa9-4aa7-8324-652880918aec
Spec:
  Control Plane Endpoint:
    Host:
    Port:  0
  Control Plane Ref:
    API Version:  infrastructure.cluster.x-k8s.io/v1beta1
    Kind:         OCIManagedControlPlane
    Name:         capi-mbf
    Namespace:    default
  Infrastructure Ref:
    API Version:  infrastructure.cluster.x-k8s.io/v1beta1
    Kind:         OCIManagedCluster
    Name:         capi-mbf
    Namespace:    default
Status:
  Conditions:
    Last Transition Time:  2024-01-18T11:36:49Z
    Reason:                WaitingForControlPlane
    Severity:              Info
    Status:                False
    Type:                  Ready
    Last Transition Time:  2024-01-18T11:36:49Z
    Message:               Waiting for control plane provider to indicate the control plane has been initialized
    Reason:                WaitingForControlPlaneProviderInitialized
    Severity:              Info
    Status:                False
    Type:                  ControlPlaneInitialized
    Last Transition Time:  2024-01-18T11:36:49Z
    Reason:                WaitingForControlPlane
    Severity:              Info
    Status:                False
    Type:                  ControlPlaneReady
    Last Transition Time:  2024-01-18T11:36:49Z
    Reason:                WaitingForInfrastructure
    Severity:              Info
    Status:                False
    Type:                  InfrastructureReady
  Observed Generation:  8
  Phase:                Provisioning
Events:                 <none>
The control plane has not been created. You can either describe the OCIManagedControlPlane object or look at the CAPOCI logs (preferable).
The logs show this, but I provided the PEM key and I'm authenticated; when I run this command I get the region list:
oci iam region list --config-file /Users/mehdibenfeguir/.oci/config --profile mehdi --auth security_token
E0118 11:44:08.554708 1 controller.go:329] "Reconciler error" err=<
Error returned by ContainerEngine Service. Http Status Code: 401. Error Code: NotAuthenticated. Opc request id: 662ee8794ff90c3fd7213fb040eb6cc7/E18DCD24600BBA005B143903C85449B4/72D3DA271E8E93B70DFE2A58AA0BD9D8. Message: Failed to verify the HTTP(S) Signature
Definitely a problem with your PEM key. Did you create a PEM key and upload it to the OCI console as explained in the doc? In the CLI command above, you are using a security token, not the private key.
ok so now it's a different error
E0118 12:07:03.792027 1 vcn_reconciler.go:101] "failed to list vcn by name" err=<
Error returned by VirtualNetwork Service. Http Status Code: 404. Error Code: NotAuthorizedOrNotFound. Opc request id: 7f6846f93cdf9778d3c1ac8bad1b7649/B7CE5A411106A9BA93DE717DB3FF8BAB/5B493160620374EB44402EE936D865DF. Message: Authorization failed or requested resource not found.
Is it supposed to create a new VCN or check for an existing one?
It will create a new VCN. Have you added the necessary policies to the user? Please add the policies mentioned here: https://oracle.github.io/cluster-api-provider-oci/gs/iam/iam-oke.html
You can also verify with the OCI CLI that you are able to list VCNs etc.
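A couple of illustrative OCI CLI checks (the OCID is a placeholder, and the exact policies needed are in the linked doc); guarded so the snippet is a no-op where the CLI is not installed:

```shell
# Sanity-check that the principal can list networking and IAM resources.
if command -v oci >/dev/null 2>&1; then
  oci network vcn list --compartment-id "<compartment-ocid>"
  oci iam policy list --compartment-id "<compartment-ocid>"
fi
```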
OK, policies added, and the cluster was created, but with 0 node pools. Checking the logs, I'm getting this:
failed to create OCIManagedMachinePool: Error returned by ContainerEngine Service. Http Status Code: 400. Error Code: InvalidParameter. Opc request id: 338c92f544911fd20e94c9a39d6c5550/91FED560B24118FF7A883374A1A55790/1D5EF9F8F8A2E126712DA1DD568A3BF8. Message: Invalid sshPublicKey: Provided key is not a valid OpenSSH public key. Operation Name: CreateNodePool Timestamp: 2024-01-18 12:55:49 +0000 GMT Client Version: Oracle-GoSDK/65.45.0 Request Endpoint: POST https://containerengine.me-jeddah-1.oci.oraclecloud.com/20180222/nodePools Troubleshooting Tips: See https://docs.oracle.com/iaas/Content/API/References/apierrors.htm#apierrors_400__400_invalidparameter for more information about resolving this error. Also see https://docs.oracle.com/iaas/api/#/en/containerengine/20180222/NodePool/CreateNodePool for details on this operation's requirements. To get more info on the failing request, you can set OCI_GO_SDK_DEBUG env var to info or higher level to log the request/response details. If you are unable to resolve this ContainerEngine issue, please contact Oracle support and provide them this full error message.
I added the public key provided by the OCI console, then passed it as an environment variable when creating the cluster. Am I doing anything wrong?
OCI_COMPARTMENT_ID={oci_compartment_id} \
OCI_IMAGE_ID={ocid_image_id} \
OCI_SSH_KEY=/Users/mehdibenfeguir/downloads/public.pem \
CONTROL_PLANE_MACHINE_COUNT=1 \
KUBERNETES_VERSION=v1.27.2 \
NAMESPACE=default \
NODE_MACHINE_COUNT=1 \
clusterctl generate cluster capi-mbf \
--from /Users/mehdibenfeguir/downloads/cluster-template-managed.yaml | kubectl apply -f -
(The public.pem file was downloaded from the OCI console.)
When you create a managed node pool, the key that has to be provided in the managed node pool params is the SSH key, not the OCI key. This SSH key will be used to SSH into the machines.
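For reference, a sketch of what the variable should contain: the contents of an OpenSSH public key, not a file path and not the OCI API signing key. The key below is a throwaway demo; in practice you would use your own `~/.ssh/*.pub`:

```shell
# Generate a throwaway demo key pair (paths illustrative).
rm -f /tmp/capi_demo_key /tmp/capi_demo_key.pub
ssh-keygen -t ed25519 -N "" -f /tmp/capi_demo_key -q
# OCI_SSH_KEY holds the public key *string*, e.g. "ssh-ed25519 AAAA...".
export OCI_SSH_KEY="$(cat /tmp/capi_demo_key.pub)"
```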
Oh, so I need to provide my personal public SSH key. Great, let me try.
It's working! Thank you very much @shyamradhakrishnan for the precious help. I suggest enhancing the docs, especially around the SSH keys; it's a little bit confusing.
Sorry, but one last question: when I want to clean up and run `kubectl delete cluster {cluster_name}`, the cluster gets deleted but not the VCN. Is there any possible solution to automate the cleanup?
The VCN should be deleted if you deleted the cluster using `kubectl delete cluster`; you can verify in the logs. Maybe it is an older VCN, or there was an error during deletion of the VCN?
these are the logs
failed to delete subnet: Error returned by VirtualNetwork Service. Http Status Code: 409. Error Code: Conflict. Opc request id: 843868a7e1f565e0b1de0d6e66339f8a/EE2B2536FECCA3EF5F12F5208F13456F/43E7DE93F48B26EFF2F5E0F91FAEECDD. Message: The Subnetxxxx references the VNIC xxx. You must remove the reference to proceed with this operation.
Did you create an LB service or anything else in the cluster? Are all the compute instances deleted?
I just did this, and yes, all the compute instances were deleted fine:
OCI_COMPARTMENT_ID=xxx \
OCI_IMAGE_ID=xxx \
OCI_SSH_KEY="$(cat /Users/mehdibenfeguir/.ssh/id_rsa.pub)" \
CONTROL_PLANE_MACHINE_COUNT=1 \
KUBERNETES_VERSION=v1.28.2 \
NAMESPACE=default \
NODE_MACHINE_COUNT=1 \
clusterctl generate cluster capi-mbf \
--from /Users/mehdibenfeguir/downloads/cluster-template-managed.yaml | kubectl apply -f -
Hmm, the error clearly shows there is a VNIC resource attached to the subnet, which is why the subnet could not be deleted. So you did not create any pod or any resource in the cluster? And you deleted the cluster using the kubectl delete cluster command? What is the name of the subnet which could not be deleted?
So you did not create any pod, or any resource in the cluster?
yes
And you deleted the cluster using kubectl delete cluster command?
yes
What is the name of the subnet which could not be deleted?
the subnet whose name includes capi-mbf
It would be great if you could provide the full name of the subnet. Can you execute the command `oci network vnic get` and see what the VNIC is attached to? Ideally, if the delete reaches the subnet, it should have deleted all resources, so unless you see any other errors in the logs, we will have to debug this more.
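A sketch of how one might debug the leftover VNIC with the OCI CLI (the OCIDs are placeholders); guarded so the snippet is a no-op where the CLI is not installed:

```shell
if command -v oci >/dev/null 2>&1; then
  # What is the leftover VNIC attached to?
  oci network vnic get --vnic-id "<vnic-ocid>"
  # Which subnets remain in the cluster's VCN?
  oci network subnet list --compartment-id "<compartment-ocid>" --vcn-id "<vcn-ocid>"
fi
```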
I already removed it manually so I can't check the exact name now
Thanks. Can we close this ticket? You can create a new one if you notice it again.
ok thanks for helping
What happened: trying to create a new managed cluster from an existing OKE cluster.
What you expected to happen: the new managed cluster to be created.
How to reproduce it (as minimally and precisely as possible): I ran clusterctl init --infrastructure oci and the CRDs were created fine. Applying the file ~/downloads/cluster-template-managed.yaml (fetched from https://github.com/oracle/cluster-api-provider-oci/releases/download/v0.14.0/cluster-template-managed.yaml) results in these errors.
Anything else we need to know?: Could anyone help me identify the exact issue?
Environment:
- clusterctl version: v1.6.0
- kubectl version: Client Version: v1.28.0; Kustomize Version: v5.0.4-0.20230601165947-6ce0bf390ce3; Server Version: v1.27.2
- docker info: Client: Docker Engine - Community, Version: 24.0.6
- /etc/os-release: (not provided)