microsoft / azure_arc

Automated Azure Arc, Edge, and Platform environments
https://aka.ms/ArcJumpstart
Creative Commons Attribution 4.0 International

PostgreSQL Hyperscale ARM Template #765

Closed Permander closed 3 years ago

Permander commented 3 years ago

Scenario which you are working on https://azurearcjumpstart.io/azure_arc_jumpstart/azure_arc_data/aks/aks_postgresql_hyperscale_arm_template/

Describe the bug

The first phase of the automation completed without errors. After I connected to the client VM over RDP, the DataServicesLogonScript PowerShell logon script started and then failed while creating the custom location. The script runs: az customlocation create --name 'jumpstart-cl' --resource-group $env:resourceGroup --namespace arc --host-resource-id $connectedClusterId --cluster-extension-ids $extensionId

The error I received was: "Deployment failed. Correlation ID: b96f3532-2453-46be-a6f8-3c8de785cc0e. "Microsoft.ExtendedLocation" resource provider does not have the required permissions to create a namespace on the cluster. Refer to https://aka.ms/ArcK8sCustomLocationsDocsEnableFeature to provide the required permissions to the resource provider." After investigating further, I found that the --kubeconfig parameter must be passed for a non-AAD-enabled cluster. I checked the AKS cluster created in the first phase of the automation, and it was a non-AAD-enabled cluster. Passing the cluster's admin kubeconfig resolved the issue: az customlocation create --name 'jumpstart-cl' --resource-group $env:resourceGroup --namespace arc --host-resource-id $connectedClusterId --cluster-extension-ids $extensionId --kubeconfig
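The workaround described above can be sketched as a small shell helper that appends --kubeconfig only when a kubeconfig path is supplied. This is an illustration, not the Jumpstart logon script: the function name `build_customlocation_cmd` and all argument values are hypothetical, and the helper only prints the command so it can be reviewed before being run.

```shell
# Sketch: assemble the `az customlocation create` call from the report.
# --kubeconfig is added only when a fourth argument (kubeconfig path) is given,
# which the report found necessary for a non-AAD-enabled cluster.
# Values containing spaces are not handled; this is for illustration only.
build_customlocation_cmd() {
  # $1: resource group, $2: connected cluster ID, $3: extension ID, $4: kubeconfig path (optional)
  cmd="az customlocation create --name jumpstart-cl --resource-group $1 --namespace arc --host-resource-id $2 --cluster-extension-ids $3"
  if [ -n "${4:-}" ]; then
    cmd="$cmd --kubeconfig $4"
  fi
  printf '%s\n' "$cmd"
}

# Placeholder values; substitute the IDs from your own deployment.
build_customlocation_cmd "my-rg" "<connectedClusterId>" "<extensionId>" "$HOME/.kube/config"
```

Printing the command first keeps the sketch safe to run anywhere; once the values look right, the printed line can be executed in a session where the az CLI and the customlocation extension are installed.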

To Reproduce

Steps to reproduce the behavior:

  1. Go to https://azurearcjumpstart.io/azure_arc_jumpstart/azure_arc_data/aks/aks_postgresql_hyperscale_arm_template/
  2. Execute: az group create --name --location, then az deployment group create --resource-group --name --template-uri https://raw.githubusercontent.com/microsoft/azure_arc/main/azure_arc_data_jumpstart/aks/arm_template/azuredeploy.json --parameters <The azuredeploy.parameters.json parameters file location>
  3. Log in to the client VM
  4. See the error during the custom location creation step

mdrakiburrahman commented 3 years ago

@polichtm thank you for raising this, and for recommending the fix in your commit.

At first, I was not able to reproduce this on multiple Azure subscriptions in our MSFT tenant, but when I tested on an Azure subscription in a personal tenant, it was reproducible.

Root Cause

This interesting behavior led me to dig further into the root cause.

This is not reproducible in the MSFT tenant but is reproducible outside it because this line of code passes in a --custom-locations-oid that is specific to the MSFT tenant.

My understanding is that this ObjectID was hardcoded to allow the Automation SP to onboard the Arc cluster end-to-end in Direct Connected mode without requiring human intervention.

In practice, customers can use the steps here to grab the unique ObjectID for the Custom Locations RP in their AAD tenant. In Jumpstart, however, we can't query the AAD tenant with the Client SP because it doesn't have AAD-level permissions (only subscription-level Contributor).
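For readers who want to look up that ObjectID themselves, a minimal sketch follows. It assumes a signed-in az CLI session with permission to read the directory (which, as noted above, the Jumpstart Client SP does not have). The application ID below is the one I recall from Microsoft's custom locations documentation and should be verified there. The name of the field holding the object ID differs between CLI versions (`id` on newer, Graph-based versions, `objectId` on older ones), so it is exposed as an argument. The function name is hypothetical.

```shell
# Assumption: application ID of the Custom Locations RP as published in
# Microsoft's custom locations docs; verify before relying on it.
CUSTOM_LOCATIONS_APP_ID="bc313c14-388c-4e7d-a58e-70017303ee3b"

# Sketch: print the tenant-specific ObjectID of the Custom Locations RP.
# $1: JMESPath field to query; defaults to `id`, pass `objectId` on older CLI versions.
get_custom_locations_oid() {
  field="${1:-id}"
  oid=$(az ad sp show --id "$CUSTOM_LOCATIONS_APP_ID" --query "$field" -o tsv) || return 1
  if [ -z "$oid" ]; then
    echo "no ObjectID returned for field '$field'" >&2
    return 1
  fi
  printf '%s\n' "$oid"
}

# Usage (requires `az login` with directory read access):
#   oid=$(get_custom_locations_oid)
```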

Possible solutions

Given this, my understanding is that there are three routes to solve it:

  1. Shortcut route: As in your commit here and our docs here, pass in the local --kubeconfig path, which allows the Custom Location to be created correctly by bypassing the incorrect OID. I think this route should work for all Azure K8s scenarios since the kubeconfig is available in the Client VM, but I'm not sure whether it will work on EKS and GKE.

  2. "Proper" route: As a prerequisite to the Jumpstart ARM templates, have users run the ObjectID lookup commands in their own AAD tenant and pass the correct ObjectID in as part of the ARM template. This lets us pass the correct ObjectID in this line and onboard the K8s cluster via the Automation SP, without requiring manual intervention from the user post-deployment.

  3. "Proper" route 2: We could ask users to elevate the Automation SP to have AAD querying permissions as well (so it can grab the ObjectID by itself inside the Client VM). This is more invasive than 2 since it grants the SP unnecessary permissions.
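As a rough illustration of the "proper" route, where the user supplies their tenant's ObjectID up front, the onboarding step could check the value and pass it through to the cluster onboarding command. This is a sketch under assumptions: the function name `build_connect_cmd` and the argument values are hypothetical, and it presumes the onboarding uses az connectedk8s connect with --custom-locations-oid, as the thread implies. It prints the command rather than running it.

```shell
# Sketch: validate a user-supplied Custom Locations RP ObjectID and assemble
# the onboarding command that carries it. Nothing is executed against Azure.
build_connect_cmd() {
  # $1: cluster name, $2: resource group, $3: Custom Locations RP ObjectID
  if ! printf '%s\n' "$3" | grep -Eq '^[0-9a-fA-F]{8}(-[0-9a-fA-F]{4}){3}-[0-9a-fA-F]{12}$'; then
    echo "custom locations ObjectID is not a GUID: '$3'" >&2
    return 1
  fi
  printf '%s\n' "az connectedk8s connect --name $1 --resource-group $2 --custom-locations-oid $3"
}

# Placeholder values; the all-zero GUID stands in for the tenant-specific ObjectID.
build_connect_cmd "my-aks" "my-rg" "00000000-0000-0000-0000-000000000000"
```

Rejecting a malformed value early does not prove the ObjectID belongs to the right tenant, but it catches an empty or mistyped template parameter before the deployment reaches the custom location step.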

Next Steps

@polichtm - since you've already figured out the shortcut route (1), please continue using it to unblock your exploration. @likamrat @dkirby-ms - let's discuss 2 and 3, since it's a fairly major change across all of our scenarios (including ArcBox etc.).

dkirby-ms commented 3 years ago

@mdrakiburrahman I have just tested the "shortcut" method on ArcBox with success. The specific changes are in the arcbox_customlocationfix branch and follow your pattern: drop the --custom-locations-oid parameter from the k8s cluster onboarding, and pass the local kubeconfig file to custom location create with the --kubeconfig parameter.

I am going to merge the ArcBox fix. I think the same fix will work in the other data services scenarios, but I have not tested them.