weaveworks / vscode-gitops-tools

GitOps Visual Studio Code Extension
Mozilla Public License 2.0

Fails to enable GitOps on AKS cluster #234

Open ComeChao opened 2 years ago

ComeChao commented 2 years ago

Expected behaviour

Expected GitOps to be enabled in the cluster. Expected workflow similar to the one presented in Simplify GitOps with Flux and Visual Studio Code (https://www.youtube.com/watch?v=-07emkW8eiM) by Geert Baeke.

Actual behaviour

After installing the Weaveworks GitOps extension for VS Code, I selected a cluster, clicked Enable GitOps, and clicked the Enable button on the "Do you want to enable GitOps on the <..> cluster?" prompt. I then get one error and one input box:
a. an error (image)
b. a text box requesting the cluster resource group (image)

Whether or not I enter the resource group, the only activity I see in the Output (GitOps) panel is: (image)

Steps to reproduce

See above.

Versions

kubectl client version: 1.22.5
kubectl server version: 1.20.9
Flux version: 0.27.4
Git version: 2.35.1.windows.2
Azure version: 2.34.1
Extension version: 0.19.0
VSCode version: 1.65.2
Operating System (OS) and its version: Windows_NT x64 10.0.019042

image

a1tan commented 2 years ago

I have the very same problem; it ends with the error below. I have also tried with WSL and with Kubernetes versions 1.21.9 and 1.22.6, and the result is the same. image

a1tan commented 2 years ago

I upgraded the extension to v0.19.1 and tried again. This time it showed the error below, which now contains the details of the problem. image

After that I realized that the AKS cluster has to be created with MSI (managed identity) to run the microsoft.flux extension, which I guess is not the case for clusters created through the Azure Portal. The URL below says to enable AKS-ExtensionManager, and I did that. https://docs.microsoft.com/en-us/azure/azure-arc/kubernetes/tutorial-use-gitops-flux2#for-azure-kubernetes-service-clusters

After some research I came across the URL below. https://docs.microsoft.com/en-us/azure/aks/use-managed-identity

First, I tried to update my existing cluster with this command. The command succeeded, but the extension still didn't work.

az aks update -g <RGName> -n <AKSName> --enable-managed-identity

Then I created a brand new AKS cluster with the command below, as described at the link above.

az aks create -g myResourceGroup -n myManagedCluster --enable-managed-identity

It finally worked with the newly created MSI cluster. In summary, it seems that AKS-ExtensionManager has to be enabled and an MSI AKS cluster has to be created before using the VS Code GitOps Tools.
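A quick way to tell which kind of cluster you have before trying Enable GitOps might look like the sketch below. It assumes the `identity.type` field returned by `az aks show` reads `SystemAssigned` or `UserAssigned` on managed-identity clusters and is empty otherwise; the helper names `cluster_identity_type` and `uses_managed_identity` are hypothetical, not part of the extension or of the az CLI.

```shell
#!/usr/bin/env bash
# Sketch only: helper names are hypothetical; resource group and cluster name are placeholders.

# Print the identity type reported for the cluster (assumed empty for service-principal clusters).
cluster_identity_type() {
  az aks show --resource-group "$1" --name "$2" --query "identity.type" --output tsv
}

# Succeed only when the cluster reports a managed identity.
uses_managed_identity() {
  kind="$(cluster_identity_type "$1" "$2")"
  case "$kind" in
    SystemAssigned|UserAssigned) return 0 ;;
    *) return 1 ;;
  esac
}

# Usage (placeholders):
#   uses_managed_identity <RGName> <AKSName> && echo "managed identity enabled"
```

If the check fails, the experience reported above suggests that updating the cluster in place may not be enough and that a newly created managed-identity cluster was what worked.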

juozasg commented 2 years ago

So a fix for this would be to show a warning that, on AKS, MSI must be enabled before Enable GitOps (which installs Flux into the cluster)?

kingdonb commented 2 years ago

I'm looking for the place in VS Code where that output comes from. (I've been able to reproduce the issue on my own AKS cluster, which was provisioned through the portal; I'm not sure whether it was with or without managed identity enabled.)

It looks like I found the messages you are talking about after several successive attempts to enable Flux on my AKS cluster.

The az feature register --namespace Microsoft.ContainerService --name AKS-ExtensionManager command must succeed before later steps can pass. This command itself has several prerequisites. From what I can tell, the UI might already be providing the right hints for getting over these hurdles, if you know where to find those messages. But each step takes more than a few seconds to process on the Azure side, and the interfaces for waiting on those events are less than straightforward from a UX perspective: they emit warnings in the terminal with links to docs that are all helpful, but the only methods they offer for checking progress return a huge blob of JSON.
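One way to avoid reading that JSON by hand is to query only the registration state and poll until it settles. This is a minimal sketch, assuming the feature reports `Registered` in `properties.state` once it is ready; `feature_state` and `wait_for_feature` are hypothetical helper names, and the attempt count and delay are arbitrary defaults.

```shell
#!/usr/bin/env bash
# Sketch only: helper names, attempt count, and delay are illustrative assumptions.

# Print just the registration state instead of the full JSON document.
feature_state() {
  az feature show --namespace Microsoft.ContainerService --name AKS-ExtensionManager \
    --query properties.state --output tsv
}

# Poll until the feature reports "Registered"; fail after the given number of attempts.
wait_for_feature() {
  attempts="${1:-30}"
  delay="${2:-20}"
  i=0
  while [ "$i" -lt "$attempts" ]; do
    if [ "$(feature_state)" = "Registered" ]; then
      return 0
    fi
    sleep "$delay"
    i=$((i + 1))
  done
  return 1
}

# Usage:
#   az feature register --namespace Microsoft.ContainerService --name AKS-ExtensionManager
#   wait_for_feature && az provider register --namespace Microsoft.ContainerService
```

Re-registering the resource provider after the feature is registered follows the linked Microsoft docs; whether further providers are needed depends on your subscription.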

The issue:

is related, because you should be able to install plain Flux without any extension on AKS clusters.

But assuming that you really did want the AKS Flux module, I think we can do better at hand-holding through these errors: letting you know whether you have found the right documentation, which issue is yours, how much further you have to go, or whether the cluster was created with the wrong mode and needs a different one. As for reporting progress along the way, I'm not sure where this error comes from, but I think we can always try to do better than this for users:

Screen Shot 2022-04-14 at 10 33 50 AM

This happens at the start of the input collection ("which cluster, subscription, and resource group is it"). It might explain #218 if I could see the source of this error, but so far I haven't figured it out.

I did eventually get the AKS extension enabled by following the docs links and following the prompts.

These were all relevant docs links:

https://docs.microsoft.com/en-us/azure/azure-arc/kubernetes/tutorial-use-gitops-flux2#for-azure-kubernetes-service-clusters

https://docs.microsoft.com/en-us/azure/azure-arc/kubernetes/quickstart-connect-cluster?tabs=azure-cli#register-providers-for-azure-arc-enabled-kubernetes

After following these steps, which were all from links I found in the Output error report after things had failed, I still had:

Message:  Request failed to https://management.azure.com/subscriptions/ZZZZ/resourceGroups/aks-kingdon/providers/Microsoft.ContainerService/managedclusters/aks-kingdon-az1/extensionaddons/flux?api-version=2021-03-01. Error code: Forbidden. Reason: Forbidden.{"error":{"code":"AuthorizationFailed","message":"The client 'XXXX' with object id 'YYYY' does not have authorization to perform action 'Microsoft.ContainerService/managedclusters/extensionaddons/read' over scope '/subscriptions/ZZZZ/resourceGroups/aks-kingdon/providers/Microsoft.ContainerService/managedclusters/aks-kingdon-az1/extensionaddons/flux' or the scope is invalid. If access was recently granted, please refresh your credentials."}}

I am working with my corp-it to get that resolved 😅 then I should be able to better reproduce and diagnose issues like this. Meanwhile, hopefully these docs will help someone else who runs into this first issue.

kingdonb commented 2 years ago

I think this is blocked for us.

We are blocking at least #232 and #234 – we need to have an AKS account with a path for both: functioning Azure Arc clusters and AKS clusters with the microsoft.flux extension. We currently have access to neither. I can't believe this circumstance is unique to our two unrelated Azure accounts; there must be some important account onboarding documentation or delegated permission structure that we're still missing.

I will try this again myself, on a new Azure Trial account that I'll create for myself before 0.19.3.

kingdonb commented 2 years ago

We did figure out what was missing in common from both accounts: there is an instruction in the section "Azure Specific Recommendations" which states:

In order to enable GitOps in a cluster you will likely need the --admin credentials.

The default kubeconfig emitted by the az CLI is not an admin kubeconfig unless you pass the --admin flag. It works OK if you remember that instruction. We will have to make sure this gets a nice call-out in the docs (bigger than h1 I guess, because both of us missed it).
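For anyone landing here with the same authorization failure, fetching the admin kubeconfig might look like the sketch below. The `--admin` flag of `az aks get-credentials` is real; the wrapper name `fetch_admin_kubeconfig` is hypothetical, and the resource group and cluster name are placeholders.

```shell
#!/usr/bin/env bash
# Sketch only: the wrapper name is hypothetical; arguments are placeholders.

# Merge the cluster-admin credentials into the local kubeconfig.
fetch_admin_kubeconfig() {
  az aks get-credentials --resource-group "$1" --name "$2" --admin
}

# Usage (placeholders):
#   fetch_admin_kubeconfig <RGName> <AKSName>
#   kubectl config current-context   # confirm which context is now active
```

Your account still needs permission to list the cluster-admin credential, so this may fail on accounts where that role has not been granted.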

kingdonb commented 2 years ago

Delaying this for 0.20.0 – I'll have to take a look at Azure issues again once the extension is published into the marketplace.

kingdonb commented 2 years ago

We plan to address Azure bugs as soon as possible in the 0.21.x series.

kingdonb commented 1 year ago

Hello, we have gone a while without checking in. v0.25.1 delivers significant UX improvements, but we haven't re-tested on Azure AKS. Can you say whether it's working well for you, or if you are still interested in using the VSCode GitOps Tools extension for Flux?