Closed by harshjay04 4 years ago
The installer doesn't create the workers directly; the machine-api-operator cluster operator creates them using the cluster-api.
Try looking at the logs for the controllers in openshift-machine-api, and at the machine objects.
Also make sure the service principal is correctly set up: https://github.com/openshift/installer/blob/master/docs/user/azure/credentials.md#step-2-request-permissions-for-the-service-principal-from-tenant-administrator
This is what we found as part of the troubleshooting. We also have the service principal set up as per the documentation. Our understanding is that if the service principal were not set up properly, the master nodes themselves would not deploy correctly. I may be wrong; need help getting through this.
./oc logs ingress-operator-58478cc77f-ckss5 -n openshift-ingress-operator
2019-09-09T18:37:05.575Z INFO operator log/log.go:26 started zapr logger
2019-09-09T18:37:07.497Z INFO operator.entrypoint ingress-operator/main.go:62 using operator namespace {"namespace": "openshift-ingress-operator"}
2019-09-09T18:37:07.514Z ERROR operator.entrypoint ingress-operator/main.go:105 failed to create DNS manager {"error": "failed to get cloud credentials from secret /: secrets \"cloud-credentials\" not found"}
This is what we found as part of the troubleshooting. We also have the service principal set up as per the documentation. Our understanding is that if the service principal were not set up properly, the master nodes themselves would not deploy correctly...
The master nodes will be created even if you haven't done step 2 from https://github.com/openshift/installer/blob/master/docs/user/azure/credentials.md#step-2-request-permissions-for-the-service-principal-from-tenant-administrator
If you read that section, it is required so that the operators can be provided new, tightly scoped credentials to contact the Azure APIs...
These new creds are minted by the cloud-credential-operator; check out its logs: oc logs -n openshift-cloud-credential-operator deploy/cloud-credential-operator
I may be wrong, need help to get this through
./oc logs ingress-operator-58478cc77f-ckss5 -n openshift-ingress-operator
2019-09-09T18:37:05.575Z INFO operator log/log.go:26 started zapr logger
2019-09-09T18:37:07.497Z INFO operator.entrypoint ingress-operator/main.go:62 using operator namespace {"namespace": "openshift-ingress-operator"}
2019-09-09T18:37:07.514Z ERROR operator.entrypoint ingress-operator/main.go:105 failed to create DNS manager {"error": "failed to get cloud credentials from secret /: secrets \"cloud-credentials\" not found"}
I have the permission enabled through the Azure console, as per the documentation, for my service principal Osp41. When I ran the CLI command it gave this output; I have also attached the screenshot from the Azure portal.
[oseadmin@osejumpserver /]$ az ad app permission add --id 911f3d60-e8e8-4881-a433-329949270436 --api 00000002-0000-0000-c000-000000000000 --api-permissions 824c81eb-e3f8-4ee6-8f6d-de7f50d565b7=Role
Invoking "az ad app permission grant --id 911f3d60-e8e8-4881-a433-329949270436 --api 00000002-0000-0000-c000-000000000000" is needed to make the change effective
[oseadmin@osejumpserver /]$ az ad app permission grant --id 911f3d60-e8e8-4881-a433-329949270436 --api 00000002-0000-0000-c000-000000000000
{
  "clientId": "840bfed3-acb1-42f8-8ae9-5665b5640281",
  "consentType": "AllPrincipals",
  "expiryTime": "2020-09-09T19:34:41.997267",
  "objectId": "0_4LhLGs-EKK6VZltWQCgYXZsX09AdJFjcopS24DevE",
  "odata.metadata": "https://graph.windows.net/72e0e644-3484-447a-9c89-7530f692cf5f/$metadata#oauth2PermissionGrants/@Element",
  "odatatype": null,
  "principalId": null,
  "resourceId": "7db1d985-013d-45d2-8dca-294b6e037af1",
  "scope": "user_impersonation",
  "startTime": "2019-09-09T19:34:41.997267"
}
[oseadmin@osejumpserver /]$ az ad app permission add --id 911f3d60-e8e8-4881-a433-329949270436 --api 00000002-0000-0000-c000-000000000000 --api-permissions 824c81eb-e3f8-4ee6-8f6d-de7f50d565b7=Role
Invoking "az ad app permission grant --id 911f3d60-e8e8-4881-a433-329949270436 --api 00000002-0000-0000-c000-000000000000" is needed to make the change effective
[oseadmin@osejumpserver /]$ az ad app permission add --id 911f3d60-e8e8-4881-a433-329949270436 --api 00000002-0000-0000-c000-000000000000 --api-permissions 824c81eb-e3f8-4ee6-8f6d-de7f50d565b7=Role
Invoking "az ad app permission grant --id 911f3d60-e8e8-4881-a433-329949270436 --api 00000002-0000-0000-c000-000000000000" is needed to make the change effective
[oseadmin@osejumpserver /]$
The API permissions for Active Directory look sufficient.
Do you see errors in the cloud-credential-operator? oc logs -n openshift-cloud-credential-operator deploy/cloud-credential-operator
I experienced the same error for ingress operator.
$ ./oc --config ./ocp42/auth/kubeconfig get pod --all-namespaces|grep -i off
openshift-ingress-operator ingress-operator-7c7cb5dfdc-n7m26 0/1 CrashLoopBackOff 13 45m
$ ./oc --config ./ocp42/auth/kubeconfig logs -n openshift-ingress-operator ingress-operator-7c7cb5dfdc-n7m26
2019-09-13T17:06:46.824Z INFO operator log/log.go:26 started zapr logger
2019-09-13T17:06:48.756Z INFO operator.entrypoint ingress-operator/main.go:62 using operator namespace {"namespace": "openshift-ingress-operator"}
2019-09-13T17:06:48.787Z ERROR operator.entrypoint ingress-operator/main.go:105 failed to create DNS manager {"error": "failed to get cloud credentials from secret /: secrets \"cloud-credentials\" not found"}
Installation error.
INFO Waiting up to 30m0s for the cluster at https://api.ocp4.az-devops.org:6443 to initialize...
FATAL failed to initialize the cluster: Some cluster operators are still updating: authentication, console, image-registry, ingress, monitoring
Also no worker node was created.
$ ./oc --config ./ocp42/auth/kubeconfig get nodes
NAME STATUS ROLES AGE VERSION
ocp4-6mdvt-master-0 Ready master 56m v1.14.6+5a523078f
ocp4-6mdvt-master-1 Ready master 56m v1.14.6+5a523078f
ocp4-6mdvt-master-2 Ready master 56m v1.14.6+5a523078f
Pods.
openshift-kube-scheduler openshift-kube-scheduler-ocp4-6mdvt-master-1 1/1 Running 0 52m
openshift-kube-scheduler openshift-kube-scheduler-ocp4-6mdvt-master-2 1/1 Running 0 55m
openshift-kube-scheduler revision-pruner-2-ocp4-6mdvt-master-0 0/1 Completed 0 55m
openshift-kube-scheduler revision-pruner-4-ocp4-6mdvt-master-0 0/1 Completed 0 50m
openshift-kube-scheduler revision-pruner-4-ocp4-6mdvt-master-1 0/1 OOMKilled 0 51m
openshift-kube-scheduler revision-pruner-4-ocp4-6mdvt-master-2 0/1 Completed 0 52m
openshift-machine-api cluster-autoscaler-operator-85c88fcbdf-7z9jk 1/1 Running 0 50m
openshift-machine-api machine-api-controllers-6684f88794-j5kz8 3/3 Running 0 56m
openshift-machine-api machine-api-operator-598fd56f46-q5jdt 1/1 Running 0 57m
openshift-machine-config-operator etcd-quorum-guard-5f8c9b48f8-62zlw 1/1 Running 0 55m
openshift-machine-config-operator etcd-quorum-guard-5f8c9b48f8-f4vc7 1/1 Running 0 55m
openshift-machine-config-operator etcd-quorum-guard-5f8c9b48f8-n2728 1/1 Running 0 55m
openshift-machine-config-operator machine-config-controller-5ddc9cf57-sc29c 1/1 Running 0 56m
openshift-machine-config-operator machine-config-daemon-h7nxr 1/1 Running 0 56m
openshift-machine-config-operator machine-config-daemon-nqkjv 1/1 Running 0 56m
openshift-machine-config-operator machine-config-daemon-v9wnc 1/1 Running 0 56m
openshift-machine-config-operator machine-config-operator-6f9775d7c6-7t2v9 1/1 Running 0 57m
openshift-machine-config-operator machine-config-server-7jtsc 1/1 Running 0 56m
openshift-machine-config-operator machine-config-server-hnkmr 1/1 Running 0 56m
openshift-machine-config-operator machine-config-server-tgj58 1/1 Running 0 56m
openshift-marketplace certified-operators-6757fc8c95-rd2n2 0/1 Pending 0 50m
openshift-marketplace community-operators-764fddfcd7-ptsbl 0/1 Pending 0 50m
openshift-marketplace marketplace-operator-777cb7fd85-2gcp9 1/1 Running 0 51m
openshift-marketplace redhat-operators-74865497dc-mvwn4 0/1 Pending 0 51m
openshift-monitoring cluster-monitoring-operator-655f555fdc-7nt6p 1/1 Running 0 51m
openshift-monitoring kube-state-metrics-57d8c7766b-pww8g 0/3 Pending 0 50m
openshift-monitoring node-exporter-6wlz5 2/2 Running 0 51m
openshift-monitoring node-exporter-v826b 2/2 Running 0 50m
openshift-monitoring node-exporter-zc4kr 2/2 Running 0 50m
openshift-monitoring openshift-state-metrics-84c7f8c5d8-5v4p5 0/3 Pending 0 51m
openshift-monitoring prometheus-adapter-749cdcf9b5-hfq46 0/1 Pending 0 45m
openshift-monitoring prometheus-adapter-749cdcf9b5-sfh6q 0/1 Pending 0 45m
openshift-monitoring prometheus-operator-696c9ddfb4-vq6xv 1/1 Running 0 50m
openshift-monitoring telemeter-client-748475b66-wbnrh 0/3 Pending 0 45m
openshift-monitoring telemeter-client-8f8bdcd7c-f8rd6 0/3 Pending 0 50m
openshift-multus multus-2s5g9 1/1 Running 0 57m
openshift-multus multus-admission-controller-5842x 1/1 Running 0 57m
openshift-multus multus-admission-controller-5dgzh 1/1 Running 0 57m
openshift-multus multus-admission-controller-grmvr 1/1 Running 0 57m
openshift-multus multus-ll9tb 1/1 Running 0 57m
openshift-multus multus-mprf7 1/1 Running 0 57m
openshift-network-operator network-operator-8d9d7ddc5-thh24 1/1 Running 0 57m
openshift-operator-lifecycle-manager catalog-operator-5697fc6c88-tqgfc 1/1 Running 0 57m
openshift-operator-lifecycle-manager olm-operator-df6fddccd-hhdmb 1/1 Running 0 57m
openshift-operator-lifecycle-manager packageserver-8b695d794-65qpb 1/1 Running 0 55m
openshift-operator-lifecycle-manager packageserver-8b695d794-tlvsf 1/1 Running 0 55m
openshift-sdn ovs-2dd9h 1/1 Running 0 57m
openshift-sdn ovs-6v9q6 1/1 Running 0 57m
openshift-sdn ovs-hbghh 1/1 Running 0 57m
openshift-sdn sdn-5mlt6 1/1 Running 1 57m
openshift-sdn sdn-controller-887b4 1/1 Running 0 57m
openshift-sdn sdn-controller-qm2jf 1/1 Running 0 57m
openshift-sdn sdn-controller-srvc7 1/1 Running 0 57m
openshift-sdn sdn-g6f9t 1/1 Running 0 57m
openshift-sdn sdn-qdx4c 1/1 Running 1 57m
openshift-service-ca-operator service-ca-operator-7b4f5bf9f4-xfmhz 1/1 Running 0 57m
openshift-service-ca apiservice-cabundle-injector-5b848f5bc8-rg9kc 1/1 Running 0 56m
openshift-service-ca configmap-cabundle-injector-84bf66575b-kncg4 1/1 Running 0 56m
openshift-service-ca service-serving-cert-signer-5575b77cc4-89rdz 1/1 Running 0 56m
openshift-service-catalog-apiserver-operator openshift-service-catalog-apiserver-operator-675c4ccf8b-xh2cz 1/1 Running 0 52m
openshift-service-catalog-controller-manager-operator openshift-service-catalog-controller-manager-operator-59d79fdmx 1/1 Running 0 52m
@abhinavdahiya Do you have any ideas on this issue? A couple of people have been having the same issue for a while.
Can you make sure the appID for which you have requested and received the admin consent matches the one in ~/.azure/osServicePrincipal.json
and the secret in the cluster: oc get secret -n kube-system azure-credentials -o yaml
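For anyone scripting this comparison, a minimal sketch follows. It assumes the installer's osServicePrincipal.json carries a clientId key and that the secret value arrives base64-encoded (the usual Kubernetes secret encoding); the sample values are made up, not from a real cluster:

```python
import base64
import json

def appid_matches(sp_json: str, secret_client_id_b64: str) -> bool:
    """Compare the clientId in ~/.azure/osServicePrincipal.json (assumed key
    name) against the base64-encoded client-id field from the
    azure-credentials secret."""
    app_id = json.loads(sp_json)["clientId"]
    # Kubernetes secrets store their data values base64-encoded
    client_id = base64.b64decode(secret_client_id_b64).decode()
    return app_id == client_id

# Hypothetical sample data standing in for the real file and secret:
sp_file = '{"clientId": "911f3d60-e8e8-4881-a433-329949270436"}'
secret_field = base64.b64encode(b"911f3d60-e8e8-4881-a433-329949270436").decode()
print(appid_matches(sp_file, secret_field))  # True when the IDs line up
```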
We have a bug report where the user followed the docs and created a new service principal, overriding the credentials in the default location, which sadly wouldn't have the permissions from https://github.com/openshift/installer/blob/master/docs/user/azure/credentials.md#step-2-request-permissions-for-the-service-principal-from-tenant-administrator
I tried again today with the latest nightly build of the installer, got the same error again. The SP is correct for both ~/.azure/osServicePrincipal.json and azure-credentials.
$ oc --config /home/nico/ocp0921/test1/auth/kubeconfig log -n openshift-ingress-operator ingress-operator-68d49d478d-zqjfx
2019-09-21T10:44:54.036Z INFO operator log/log.go:26 started zapr logger
2019-09-21T10:44:55.956Z INFO operator.entrypoint ingress-operator/main.go:62 using operator namespace {"namespace": "openshift-ingress-operator"}
2019-09-21T10:44:55.972Z ERROR operator.entrypoint ingress-operator/main.go:105 failed to create DNS manager {"error": "failed to get cloud credentials from secret /: secrets \"cloud-credentials\" not found"}
I tried again today with the latest nightly build of the installer, got the same error again. The SP is correct for both ~/.azure/osServicePrincipal.json and azure-credentials.
Can you make sure the appID for which you have requested and received the admin consent matches the one in
~/.azure/osServicePrincipal.json
and the secret in the cluster: oc get secret -n kube-system azure-credentials -o yaml
I'm not sure this is what you meant, but just to be sure: the appID for the service principal which has the OwnedBy permission matches the clientID in the azure-credentials secret.
The ~/.azure/osServicePrincipal.json and azure-credentials will always tend to match.
@abhinavdahiya Yes, they matched and have the permission. The result was that the installation still failed.
I am experiencing a similar issue on the openshift-install 4.2 on Ubuntu 18.04.
$ openshift-install version
./openshift-install v4.2.0
built from commit f96afb99f1ce4f8976ce62f7df44acb24d2062d6
release image quay.io/openshift-release-dev/ocp-release-nightly@sha256:b3ba58c53a3f5e98f53dff425e7e4c87b60f5d49d66213853b79f00f7a8a9448
Following the documentation, an initialization of the cluster was attempted with:
./openshift-install create cluster --dir OCP4 --log-level debug
However, the installation times out when initializing the cluster, on the "Waiting up to 30m0s for the cluster" phase.
INFO Waiting up to 30m0s for the cluster at https://api.openshift4.oc-demo.ml:6443 to initialize...
DEBUG Still waiting for the cluster to initialize: Working towards 4.2.0-0.nightly-2019-09-23-154647: 99% complete
DEBUG Still waiting for the cluster to initialize: Working towards 4.2.0-0.nightly-2019-09-23-154647: 99% complete, waiting on authentication, console, image-registry, ingress, monitoring
DEBUG Still waiting for the cluster to initialize: Working towards 4.2.0-0.nightly-2019-09-23-154647: 99% complete
DEBUG Still waiting for the cluster to initialize: Some cluster operators are still updating: authentication, console, image-registry, ingress, monitoring
DEBUG Still waiting for the cluster to initialize: Working towards 4.2.0-0.nightly-2019-09-23-154647: 99% complete
DEBUG Still waiting for the cluster to initialize: Some cluster operators are still updating: authentication, console, image-registry, ingress, monitoring
DEBUG Still waiting for the cluster to initialize: Working towards 4.2.0-0.nightly-2019-09-23-154647: 99% complete
DEBUG Still waiting for the cluster to initialize: Some cluster operators are still updating: authentication, console, image-registry, ingress, monitoring
DEBUG Still waiting for the cluster to initialize: Working towards 4.2.0-0.nightly-2019-09-23-154647: 99% complete
DEBUG Still waiting for the cluster to initialize: Some cluster operators are still updating: authentication, console, image-registry, ingress, monitoring
FATAL failed to initialize the cluster: Some cluster operators are still updating: authentication, console, image-registry, ingress, monitoring
It looks like initializations/updates of some operators are stuck in endless loops, timing out the process. While the installation was stalled, the following output was produced in a parallel terminal:
$ oc --config ./OCP4/auth/kubeconfig get nodes
NAME STATUS ROLES AGE VERSION
openshift4-6rxh9-master-0 Ready master 16m v1.14.6+c4799753c
openshift4-6rxh9-master-1 Ready master 16m v1.14.6+c4799753c
openshift4-6rxh9-master-2 Ready master 16m v1.14.6+c4799753c
$ oc --config ./OCP4/auth/kubeconfig get co
NAME VERSION AVAILABLE PROGRESSING DEGRADED SINCE
authentication Unknown Unknown True 15m
cloud-credential 4.2.0-0.nightly-2019-09-23-154647 True True True 18m
cluster-autoscaler 4.2.0-0.nightly-2019-09-23-154647 True False False 12m
console 4.2.0-0.nightly-2019-09-23-154647 Unknown True False 14m
dns 4.2.0-0.nightly-2019-09-23-154647 True False False 18m
image-registry False True False 14m
insights 4.2.0-0.nightly-2019-09-23-154647 True False False 18m
kube-apiserver 4.2.0-0.nightly-2019-09-23-154647 True False False 17m
kube-controller-manager 4.2.0-0.nightly-2019-09-23-154647 True False False 15m
kube-scheduler 4.2.0-0.nightly-2019-09-23-154647 True False False 16m
machine-api 4.2.0-0.nightly-2019-09-23-154647 True False False 18m
machine-config 4.2.0-0.nightly-2019-09-23-154647 True False False 17m
marketplace 4.2.0-0.nightly-2019-09-23-154647 True False False 13m
monitoring False True True 8m55s
network 4.2.0-0.nightly-2019-09-23-154647 True False False 17m
node-tuning 4.2.0-0.nightly-2019-09-23-154647 True False False 15m
openshift-apiserver 4.2.0-0.nightly-2019-09-23-154647 True False False 14m
openshift-controller-manager 4.2.0-0.nightly-2019-09-23-154647 True False False 16m
openshift-samples 4.2.0-0.nightly-2019-09-23-154647 True False False 11m
operator-lifecycle-manager 4.2.0-0.nightly-2019-09-23-154647 True False False 17m
operator-lifecycle-manager-catalog 4.2.0-0.nightly-2019-09-23-154647 True False False 17m
operator-lifecycle-manager-packageserver 4.2.0-0.nightly-2019-09-23-154647 True False False 16m
service-ca 4.2.0-0.nightly-2019-09-23-154647 True False False 18m
service-catalog-apiserver 4.2.0-0.nightly-2019-09-23-154647 True False False 15m
service-catalog-controller-manager 4.2.0-0.nightly-2019-09-23-154647 True False False 15m
storage 4.2.0-0.nightly-2019-09-23-154647 True False False 13m
This installation was attempted in several Azure regions, with every one timing out on the "Waiting up to 30m0s for the cluster" phase.
During the installation, the master nodes were successfully created and the bootstrap node was destroyed, but the worker nodes never appeared in the resource group (checking on the Azure Portal).
@abhinavdahiya Please kindly suggest how we should move forward on this, or whether we need to escalate it to someone in Red Hat engineering. This has been pending for more than 2 weeks. 4.2 is going to be GA soon; if this is a real issue, it's going to affect many OpenShift 4 users on Azure. If it is not, please kindly suggest how we should work around it.
The installer is critical for the OpenShift 4 on Azure experience. Escalating for more visibility @smarterclayton. We are also escalating this via an internal Red Hat contact point.
@joaotomazio:
NAME VERSION AVAILABLE PROGRESSING DEGRADED SINCE
...
cloud-credential 4.2.0-0.nightly-2019-09-23-154647 True True True 18m
To understand why an operator like this is degraded, fetch its ClusterOperator (as suggested in our troubleshooting docs):
$ oc --config=${INSTALL_DIR}/auth/kubeconfig get -o yaml clusteroperator cloud-credential
which will give you the cred operator's description of why it is degraded. You can also gather the cred-operator logs (as Abhinav suggested above).
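If you want to automate this check across operators, the shape of it is roughly the following. This is a hypothetical sketch over a hand-built dict in the shape of a ClusterOperator's status.conditions list, not output from a real cluster:

```python
def degraded_reason(cluster_operator: dict):
    """Return (reason, message) for a Degraded=True condition, else None.
    Follows the shape of a ClusterOperator's status.conditions list."""
    for cond in cluster_operator.get("status", {}).get("conditions", []):
        if cond.get("type") == "Degraded" and cond.get("status") == "True":
            return cond.get("reason"), cond.get("message")
    return None

# Trimmed, made-up example in the shape `oc get ... -o yaml` returns:
co = {"status": {"conditions": [
    {"type": "Available", "status": "True"},
    {"type": "Degraded", "status": "True",
     "reason": "CredentialsFailing",
     "message": "credentials request is failing to sync"},
]}}
print(degraded_reason(co))  # ('CredentialsFailing', 'credentials request is failing to sync')
```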
Thanks @wking . @joaotomazio can you get us those logs?
We have a bug report...
Linking https://bugzilla.redhat.com/show_bug.cgi?id=1753419#c5 , since I think that's what @abhinavdahiya was referencing.
@nichochen @joaotomazio we believe this is a credentials issue due to inaccurate/unhelpful docs.
Here is the Github PR where the docs were updated two days ago:
https://github.com/openshift/installer/pull/2388
Can you run through that new permissions flow and see if you can get the cluster up?
Thanks
I've managed to finish the installation successfully by granting a certain permission to the Azure app.
On the portal: Azure AD -> App Registrations -> App -> API Permissions -> Delegated Permission on user_impersonation.
I don't know if this is among the best practices, but for demo purposes it now works just fine! Thank you everyone.
@nstielau Thanks for the information. @abhinavdahiya mentioned this bug last Saturday, and I have verified that the sp I used in my installation was the one with the admin consent, last Saturday, see here.
@joaotomazio Awesome!
@abhinavdahiya Could you kindly confirm that the permission user_impersonation is required? I appreciate your clarification and help.
@abhinavdahiya Could you kindly confirm that the permission user_impersonation is required? I appreciate your clarification and help.
cc @dgoodwin @joelddiaz @ingvagabund, who are owners of the credential-operator for Azure... hopefully they can shed more light.
FWIW, here are the permissions I've had for the Azure clusters I've installed in the past:
@joelddiaz has provided the permissions we were successful with. I notice that Application.ReadWrite.All is not present under Azure Active Directory Graph in the screenshot in https://github.com/openshift/installer/issues/2334#issuecomment-529634802, but it does appear under Microsoft Graph. I do not know how to interpret this, but it doesn't look correct, and the Azure UI is still showing me the version Joel sees, with Azure Active Directory Graph -> Application.ReadWrite.All.
We dealt with permissions in the UI; it looks like someone boiled these down to az commands, but I'm wondering if a mistake was made and the command is somehow granting the wrong Application.ReadWrite.All permission (Microsoft Graph instead of Azure Active Directory Graph)?
According to https://blogs.msdn.microsoft.com/aaddevsup/2018/06/06/guid-table-for-windows-azure-active-directory-permissions/
it looks like the API permission ID that was missing is 1cda74f2-2616-4834-b122-5cb1b07f8a59 (Read and write all applications).
This does not appear in our docs.
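To make the GUID mix-up concrete, here is a small sketch comparing what the thread's documented az commands requested against the legacy Azure AD Graph permissions that ultimately proved necessary. The GUID-to-name mapping is pieced together from the linked table and the commands earlier in this thread, so treat it as illustrative rather than authoritative:

```python
# Legacy Azure AD Graph permission GUIDs, as best I can tell from this thread:
AAD_GRAPH_PERMISSIONS = {
    "824c81eb-e3f8-4ee6-8f6d-de7f50d565b7": "Application.ReadWrite.OwnedBy",
    "1cda74f2-2616-4834-b122-5cb1b07f8a59": "Application.ReadWrite.All",
}

def missing(requested: set, required: set) -> list:
    """Names of required legacy AAD Graph permissions not yet requested."""
    return sorted(AAD_GRAPH_PERMISSIONS[g] for g in required - requested)

requested = {"824c81eb-e3f8-4ee6-8f6d-de7f50d565b7"}  # what the az command granted
required = set(AAD_GRAPH_PERMISSIONS)                 # what made clusters work
print(missing(requested, required))  # ['Application.ReadWrite.All']
```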
@joelddiaz clarified on scrum call this morning.
Microsoft Graph -> Application.ReadWrite.All is the new-style permission; Azure Active Directory Graph -> Application.ReadWrite.All is the legacy one. However, the Go SDK used in the cred operator requires the legacy permission.
It would appear that the translation of the permission requests into CLI commands got the wrong UUID for the API. This also explains why adding the impersonate permission got things past this point, as it is likely part of the legacy ReadWrite.All.
@dgoodwin I'm not clear on the next step. Do we need to update our docs again?
I went through the instructions at https://github.com/openshift/installer/blob/master/docs/user/azure/credentials.md , and I was able to deploy a cluster out to Azure.
The extra creds that @dgoodwin mentions above would bring the list of permissions up to the level of what he and I have both been using previously, but those extra permissions appear to be unnecessary.
These perms:
plus adding the App Registration as a Contributor and User Access Administrator on the Subscription being installed into, are enough permissions to get a cluster up and running.
TL;DR: the docs appear to be okay.
The installer doesn't create the workers directly; the machine-api-operator cluster operator creates them using the cluster-api.
Try looking at the logs for the controllers in openshift-machine-api, and at the machine objects.
Also make sure the service principal is correctly set up: https://github.com/openshift/installer/blob/master/docs/user/azure/credentials.md#step-2-request-permissions-for-the-service-principal-from-tenant-administrator
Hi Abhinav,
I need one more piece of information related to OpenShift 4.1: do you have binaries I can use to deploy the UPI version? Since OCP 4.2 is not GA yet, we have a requirement from one of our clients to do a POC in the coming weeks, and I'm having trouble finding the 4.1 binaries for Azure.
Thanks Jay
And to come full circle, I also followed the instructions at https://github.com/openshift/openshift-docs/blob/master/modules/installation-azure-service-principal.adoc .
This also worked (without the delegated Microsoft Graph User.Read permissions).
I tried installing 4.2 on Azure today with the GA release; the installation finished without error. The cluster deployed successfully onto Azure. I appreciate the great work!
/close
I tried installing 4.2 on Azure today with the GA release; the installation finished without error. The cluster deployed successfully onto Azure. I appreciate the great work!
@abhinavdahiya: Closing this issue.
Hi Guys,
I will try to explain step by step what I did so far, and hope that will clarify my issue.
So, I followed these pre-steps from the OpenShift documentation.
I created and configured the service principal successfully, and even checked that it has the right permissions.
I checked that everything in Azure Active Directory looks good... I even granted more permissions, as per this GitHub issue: #2334
I also gave my service principal full access at the subscription level (Owner, Contributor, Administrator). I then followed the next stage for 4.2: Installing a cluster quickly on Azure.
I managed to deploy the "default OpenShift cluster" successfully, which proves that my service principal is configured and working as expected.
DEBUG OpenShift console route is created
INFO Install complete!
INFO Access the OpenShift web-console here: https:......
Unfortunately, my task is to create a customised OpenShift cluster, because I have to deploy Cloud Pak for Integration, which requires a very large configuration. So I followed these steps: Installing a cluster on Azure with customizations.
I used the same service principal and increased the CPU quotas for the Dsv3-series, but I am getting a time-out when deploying OpenShift (stuck at 99%), and when I check the resource group in the Azure UI there are only 3 master nodes and no worker nodes at all!
here are my install-config.yaml values:
apiVersion: v1
baseDomain: poc-*****
compute:
- architecture: amd64
  hyperthreading: Enabled
  name: worker
  platform:
    azure:
      type: Standard_D16s_v3
      osDisk:
        diskSizeGB: 512
      zones:
      - "1"
      - "2"
      - "3"
  replicas: 3
controlPlane:
  architecture: amd64
  hyperthreading: Enabled
  name: master
  platform:
    azure:
      osDisk:
        diskSizeGB: 512
      type: Standard_D18s_v3
  replicas: 3
metadata:
  creationTimestamp: null
  name: sec-****-**
networking:
  clusterNetwork:
  - cidr: 10.128.0.0/14
    hostPrefix: 23
  machineCIDR: 10.0.0.0/16
  networkType: OpenShiftSDN
  serviceNetwork:
  - 172.30.0.0/16
platform:
  azure:
    baseDomainResourceGroupName: dns
    region: ukwest
pullSecret: '{"auths":{"******'
sshKey: |
  ssh-rsa *******
Maybe I did something wrong?
This is the same issue as described on this page: the OpenShift installer is installing only the master nodes, and no worker nodes are getting deployed. I tried everything but still can't manage to deploy a customised cluster.
I hope you can help me understand my issue, guys!
It looks like Azure doesn't have Availability Zones for region: ukwest
https://azure.microsoft.com/en-us/global-infrastructure/regions/
If you nonetheless specify zones in your install-config.yaml as
zones:
- "1"
- "2"
- "3"
the install will fail with a time-out at 99%.
I tried with region: uksouth and it worked!
It would be great to add a little warning to the docs!
I don't really get that. Does the installer only support Azure regions with AZs, or not? From my tests it looks like I can only deploy to regions with AZs; others fail.
However, the docs say the installer is tested in several regions that do not have Availability Zones: https://docs.openshift.com/container-platform/4.3/installing/installing_azure/installing-azure-account.html#installation-azure-regions_installing-azure-account
So can somebody let me know if, and how, we can deploy to an Azure region that has no Availability Zones?
If you don't set any zones, the installer will pick the zones for a region when they are available, and skip using zones when they are not.
If a user explicitly sets zones for a region that has none, that's when workers fail to come up, because of the user error.
So this is not about whether we support that region; it's more to do with user misconfiguration.
Maybe we could warn or fail when a user sets such an invalid configuration... not sure how important or useful that's going to be...
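The warn-or-fail validation suggested above could look roughly like this. The region set here is an illustrative subset I made up for the sketch, not Azure's authoritative list:

```python
# Illustrative subset; the real list lives in Azure's region documentation.
REGIONS_WITH_ZONES = {"uksouth", "westeurope", "eastus", "eastus2"}

def validate_zones(region: str, zones) -> None:
    """Raise if the install-config pins zones in a region that has none.
    Leaving zones unset is always fine: the installer picks them itself."""
    if zones and region not in REGIONS_WITH_ZONES:
        raise ValueError(
            f"region {region!r} has no Availability Zones; drop the "
            f"explicit zones {zones} from install-config.yaml")

validate_zones("uksouth", ["1", "2", "3"])  # ok: region has AZs
try:
    validate_zones("ukwest", ["1", "2", "3"])  # the failure mode in this thread
except ValueError as e:
    print(e)
```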
Weird.
I tried that the other day with OCP 4.3.5, deploying a private cluster to an existing VNet in Germany West Central. My install-config.yaml definitely had no zone specifications in it. It still failed while trying to deploy the VMs, with a message that the region does not support zones!
Are there some defaults for the masters behind the scenes that we somehow need to override, or did I miss anything else?
I don't think you missed anything; there could be some magic behind the scenes that defaults to always using zones when you deploy a cluster with install-config.yaml,
because I spent 2 days trying to deploy to region: ukwest,
with zones and without them, and still had no clue what was going on.
Version
4.1
Platform:
azure
What happened?
See the troubleshooting documentation for ideas about what information to collect. For example, if the installer fails to create resources, attach the relevant portions of your .openshift_install.log.
What did you expect to happen?
Expected to have a Dev Preview OpenShift 4.1 cluster installed and running in Azure, as per the documented steps.
How to reproduce it (as minimally and precisely as possible)?
Just follow the steps in https://cloud.redhat.com/openshift/install/azure/installer-provisioned to reproduce openshift_install.log
Anything else we need to know?