microsoft / azure_arc

Automated Azure Arc, Edge, and Platform environments
https://aka.ms/ArcJumpstart
Creative Commons Attribution 4.0 International

ArcBox-CAPI-MGMT Script extension times out #1605

Closed colinrippeyfinarne closed 1 year ago

colinrippeyfinarne commented 1 year ago

I'm trying to deploy the DevOps scenario using the simple Deploy to Azure button. I thought I had got past my issue from #1604, but I'm now hitting another issue further down the line.

The ubuntuCAPIDeployment fails on the ArcBox-CAPI-MGMT/installscript_CAPI resource:

{
  "status": "Failed",
  "error": {
    "code": "VMExtensionProvisioningTimeout",
    "message": "Provisioning of VM extension installscript_CAPI has timed out. Extension provisioning has taken too long to complete. The extension last reported \"Plugin enabled\".\r\n\r\nMore information on troubleshooting is available at https://aka.ms/VMExtensionCSELinuxTroubleshoot"
  }
}

When I SSH onto the box and run cat jumpstart_logs/installCAPI.log, I see the following (relevant snippets from the bottom of the log):

Creating Microsoft Defender for Cloud audit secret

  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  5342  100  5342    0     0  20867      0 --:--:-- --:--:-- --:--:-- 20785
secret/audit created

kubeadmconfigtemplate.bootstrap.cluster.x-k8s.io/arcbox-capi-data-2023-md-0 created
cluster.cluster.x-k8s.io/arcbox-capi-data-2023 created
machinedeployment.cluster.x-k8s.io/arcbox-capi-data-2023-md-0 created
kubeadmcontrolplane.controlplane.cluster.x-k8s.io/arcbox-capi-data-2023-control-plane created
azurecluster.infrastructure.cluster.x-k8s.io/arcbox-capi-data-2023 created
azureclusteridentity.infrastructure.cluster.x-k8s.io/cluster-identity created
azuremachinetemplate.infrastructure.cluster.x-k8s.io/arcbox-capi-data-2023-control-plane created
azuremachinetemplate.infrastructure.cluster.x-k8s.io/arcbox-capi-data-2023-md-0 created

Waiting for Kubernetes control plane to be in Provisioned phase...
Waiting for Kubernetes control plane to be in Provisioned phase...
Waiting for Kubernetes control plane to be in Provisioned phase...
Waiting for Kubernetes control plane to be in Provisioned phase...
Waiting for Kubernetes control plane to be in Provisioned phase...

NAMESPACE   NAME                    PHASE         AGE    VERSION
default     arcbox-capi-data-2023   Provisioned   102s

Waiting for control plane to initialize. This may take a few minutes...
Waiting for control plane to initialize. This may take a few minutes...
Waiting for control plane to initialize. This may take a few minutes...
[the same line keeps repeating]

The script extension looks like it times out just after 90 minutes. Trying to grok the installCAPI.sh script, the line it is stuck on is, I "think", line 202:

until sudo kubectl get kubeadmcontrolplane --all-namespaces | grep -q "true"; do echo "Waiting for control plane to initialize. This may take a few minutes..." && sleep 20 ; done

If I manually run this command:

sudo kubectl get kubeadmcontrolplane --all-namespaces

I see:

NAMESPACE NAME CLUSTER INITIALIZED API SERVER AVAILABLE REPLICAS READY UPDATED UNAVAILABLE AGE VERSION
default arcbox-capi-data-2023-control-plane arcbox-capi-data-2023 1 1 1 115m v1.25.4

(apologies for the formatting)

Any thoughts on whether I can rescue this deployment?
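
The until loop quoted above has no upper bound, so it keeps polling until the VM extension itself times out. As a rough sketch only (this is not the Jumpstart script's actual code, and the names wait_until and control_plane_initialized are hypothetical), the same check could be wrapped in a loop that gives up after a deadline and returns a non-zero status:

```shell
# Hypothetical helper, not part of installCAPI.sh.
# Usage: wait_until <timeout_seconds> <interval_seconds> <command> [args...]
# Retries the command until it succeeds or the timeout has elapsed.
wait_until() {
  wu_timeout=$1
  wu_interval=$2
  shift 2
  wu_deadline=$(( $(date +%s) + wu_timeout ))
  until "$@"; do
    if [ "$(date +%s)" -ge "$wu_deadline" ]; then
      echo "Timed out after ${wu_timeout}s waiting for: $*" >&2
      return 1
    fi
    echo "Waiting for control plane to initialize. This may take a few minutes..."
    sleep "$wu_interval"
  done
}

# The same check the quoted loop performs, wrapped as a function
# so it can be passed to wait_until as a command.
control_plane_initialized() {
  sudo kubectl get kubeadmcontrolplane --all-namespaces | grep -q "true"
}

# Example call (30-minute limit is an arbitrary assumption):
#   wait_until 1800 20 control_plane_initialized || exit 1
```

With a bounded wait like this, the log would end with an explicit timeout message instead of an endless run of "Waiting..." lines, which makes it easier to see where the deployment stalled.
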
github-actions[bot] commented 1 year ago

Hey friend! Thanks for opening this issue. We appreciate your contribution and welcome you to our community! We are glad to have you here and to have your input on the Azure Arc Jumpstart.

colinrippeyfinarne commented 1 year ago

Redeployed and working this time.

Previous failed attempt was to North Europe, successful attempt was to West Europe.

Looking back at the prerequisites, I realized I had not checked the vCPU requirements.

For North Europe I see:

az vm list-skus --location northeurope --size Standard_D2s --all --output table

ResourceType     Locations    Name             Zones  Restrictions
---------------  -----------  ---------------  -----  ------------
virtualMachines  northeurope  Standard_D2s_v3  1,2,3  NotAvailableForSubscription, type: Zone, locations: northeurope, zones: 1
virtualMachines  northeurope  Standard_D2s_v4  1,2,3  NotAvailableForSubscription, type: Zone, locations: northeurope, zones: 1
virtualMachines  northeurope  Standard_D2s_v5  1,2,3  NotAvailableForSubscription, type: Zone, locations: northeurope, zones: 1

For West Europe I see:

az vm list-skus --location westeurope --size Standard_D2s --all --output table

ResourceType     Locations   Name             Zones  Restrictions
---------------  ----------  ---------------  -----  ------------
virtualMachines  westeurope  Standard_D2s_v3  1,2,3  None
virtualMachines  westeurope  Standard_D2s_v4  1,2,3  None
virtualMachines  westeurope  Standard_D2s_v5  1,2,3  None

I am "guessing" that this could be the issue?
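
For anyone wanting to run the same comparison before deploying, here is a minimal sketch that filters the table output of the az vm list-skus command shown above and prints only the SKUs carrying a restriction. The function name restricted_skus is hypothetical, and the field positions assume the five-column layout shown above with a non-empty Zones column; a region without availability zones would shift the columns and need a different filter.

```shell
# Hypothetical helper: reads `az vm list-skus ... --output table` text on stdin
# and prints the Name of every row whose Restrictions column is not "None".
# Assumes the layout: ResourceType Locations Name Zones Restrictions
restricted_skus() {
  # Header and separator lines are skipped because their first field
  # is not "virtualMachines".
  awk '$1 == "virtualMachines" && $5 != "None" { print $3 }'
}

# Example call, using the command from this thread:
#   az vm list-skus --location northeurope --size Standard_D2s --all --output table | restricted_skus
```

An empty result would suggest none of the listed sizes are restricted for the subscription in that region. It says nothing about remaining vCPU quota, which can be checked separately with az vm list-usage --location <region>.
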

I am now waiting for the ArcBox-Client post-logon scripts to finish and will then spend time completing the rest of the steps. Once I'm done, I will try redeploying to North Europe to see if I run into the same issue again.

colinrippeyfinarne commented 1 year ago

Closing this issue as I was able to deploy the DevOps scenario into a region that had the vCPU capacity required.