weaveworks / vscode-gitops-tools

GitOps Visual Studio Code Extension
Mozilla Public License 2.0

Unable to delete HelmRelease #369

Open CapKenR opened 2 years ago

CapKenR commented 2 years ago

Expected behaviour

I right-click on a HelmRelease under Workloads in the extension. I expect the selected HelmRelease to be deleted from my Kubernetes cluster.

Actual behaviour

Instead, I get a message that says "Delete HelmRelease not supported on Azure cluster."

Steps to reproduce

I used Tanzu Kubernetes Grid to install a Kubernetes (v1.22.9+vmware.1) cluster in Azure. I then bootstrapped Flux in the cluster and added a couple of simple HelmRepository and HelmRelease resources (Traefik and Sealed-Secrets in this case) via kubectl. Finally, I tried to delete the HelmRelease resources via the extension.

Versions

kubectl client: v1.22.9+vmware.1
kubectl server: v1.22.9+vmware.1
Flux: v0.31.1
Git: 2.25.1
Azure: 2.37.0
Azure extension "k8s-configuration": not installed
Azure extension "k8s-extension": not installed
VSCode: 1.71.0-insider
Extension: 0.20.1659723293
OS: Linux x64 5.15.57.1-microsoft-standard-WSL2

Note: I'm not sure what the Azure extensions are or why they'd be needed. I don't need them from a command line.

kingdonb commented 2 years ago

Thanks for the report. We plan to address several outstanding Azure issues in our next sprint! This one is on our radar now.

kingdonb commented 2 years ago

If you are not using the Azure Microsoft.Flux extension, you should be able to set the cluster provider to Generic from the right-click menu for the cluster. The provider should be auto-detected, but unfortunately there is a bug, or rather we haven't considered the case where you are on Azure without the Azure extensions: Azure compute nodes running some Kubernetes distribution other than AKS, with a generic Flux.

@juozasg

CapKenR commented 2 years ago

When I changed the cluster provider to generic I was able to delete the HelmRelease I wasn't able to delete before.

CapKenR commented 2 years ago

I'm guessing that it assumed it was an AKS cluster because the API endpoint was ...westus2.cloudapp.azure.com:6443.

kingdonb commented 2 years ago

You're not using Azure Arc or anything – this is a generic cluster with no Microsoft extensions at all – got it 👍

There should be three choices, this one Generic and also:

We are aware of some autodetection logic issues and will take this case into account when we do a release to cover this issue. When you set the cluster to Generic, does it retain this setting? We'll check on this as well...

Here is the logic that selects the AKS cluster provider. Does it look like it makes sense? https://github.com/weaveworks/vscode-gitops-tools/blob/6c1d07c6ae398ed6fc35e689b4e727a9dd5e9f5c/src/kubernetes/kubernetesTools.ts#L491-L492

I think you can check the node details in the output of kubectl get nodes -oyaml and see whether azure:// is the prefix of your nodes' providerID, which is what this check looks for?

If so, then we need to find a better way to detect managed AKS clusters. Edit: I guess this makes sense. Even if you are running generic Kubernetes, you'd still be using cloud-provider-azure because the platform provides PV support and other features that are integrated with the Kubernetes API like Load Balancers, and Tanzu surely integrates with them. (But it's still not actual AKS distro so no AKS extensions then, I guess?)

CapKenR commented 2 years ago

The logic in the code isn't sufficient to determine if it's AKS or not. The provider ID for my nodes matches.

  providerID: azure:///subscriptions/a0b0c0d0-e1f1-a2b2-c3d3-abcdef123456/resourceGroups/kdr-tkg/providers/Microsoft.Compute/virtualMachines/tkg-workload-azure-md-0-2qm74
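To illustrate why a prefix test on its own misclassifies this cluster, here is a minimal TypeScript sketch. The helper name `hasAzureProviderId` is hypothetical and only mimics the kind of check described above; it is not the extension's actual code.

```typescript
// Hypothetical helper, NOT the extension's code: mimics a providerID prefix test.
function hasAzureProviderId(providerID: string | undefined): boolean {
  return providerID !== undefined && providerID.startsWith("azure://");
}

// providerID copied from the TKG node above.
const tkgNodeProviderId =
  "azure:///subscriptions/a0b0c0d0-e1f1-a2b2-c3d3-abcdef123456/resourceGroups/kdr-tkg/providers/Microsoft.Compute/virtualMachines/tkg-workload-azure-md-0-2qm74";

// Prints true even though the node belongs to a TKG cluster, not AKS:
// the prefix only says the VM runs on Azure.
console.log(hasAzureProviderId(tkgNodeProviderId));
```

The prefix identifies the cloud the node runs on, not the Kubernetes distribution, so it cannot separate AKS from other distributions hosted on Azure.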

The only thing I can see in the node manifest that might tell you it's a TKG cluster and not an AKS cluster is kubeProxyVersion and kubeletVersion. But I don't know if that's sufficient and I don't (right now) have an AKS cluster to compare it with.

    kubeProxyVersion: v1.22.9+vmware.1
    kubeletVersion: v1.22.9+vmware.1
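A rough sketch of what a check on those version fields could look like, in TypeScript. The helper names are hypothetical, and the assumption that a `+vmware` build suffix reliably marks a non-AKS cluster is unverified (as noted above, there was no AKS cluster to compare against).

```typescript
// Hypothetical helper: returns the build metadata after "+" in a version string,
// or undefined when the version has no "+" part.
function versionBuildSuffix(version: string): string | undefined {
  const plus = version.indexOf("+");
  return plus === -1 ? undefined : version.slice(plus + 1);
}

// Hypothetical heuristic: treats a "vmware" build suffix as a hint that the
// node was provisioned by a VMware distribution such as TKG. Unverified.
function looksLikeVmwareBuild(kubeletVersion: string): boolean {
  const suffix = versionBuildSuffix(kubeletVersion);
  return suffix !== undefined && suffix.startsWith("vmware");
}

console.log(looksLikeVmwareBuild("v1.22.9+vmware.1")); // true
console.log(looksLikeVmwareBuild("v1.22.9")); // false
```

This only rules out one distribution at a time, so it would at best supplement a positive AKS check rather than replace it.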

CapKenR commented 2 years ago

On your other question, at least so far it has kept its Generic designation.

CapKenR commented 2 years ago

@kingdonb, in looking at the nodes in an AKS cluster, you might be better off checking the series of kubernetes.azure.com/* labels that are applied to the AKS nodes. I checked 3 AKS clusters and they all had them. Here's an example.

metadata:
  labels:
    kubernetes.azure.com/agentpool: agentpool
    kubernetes.azure.com/cluster: MC_rg-test-eastus-dev_aks-test-eastus-dev_eastus
    kubernetes.azure.com/kubelet-identity-client-id: 1a2b3c4d-a1b2-c3d4-5e6f-123abc456def
    kubernetes.azure.com/mode: system
    kubernetes.azure.com/node-image-version: AKSUbuntu-1804gen2containerd-2022.06.13
    kubernetes.azure.com/os-sku: Ubuntu
    kubernetes.azure.com/role: agent
    kubernetes.azure.com/storageprofile: managed
    kubernetes.azure.com/storagetier: Premium_LRS
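A minimal TypeScript sketch of the label-based check suggested here. The function and type names are hypothetical, and the heuristic rests only on the three AKS clusters mentioned above; whether non-AKS clusters can also carry `kubernetes.azure.com/*` labels is not verified.

```typescript
type NodeLabels = Record<string, string>;

const AKS_LABEL_PREFIX = "kubernetes.azure.com/";

// Hypothetical heuristic: a node counts as AKS-managed when at least one of
// its labels starts with the kubernetes.azure.com/ prefix.
function hasAksLabels(labels: NodeLabels | undefined): boolean {
  if (labels === undefined) {
    return false;
  }
  return Object.keys(labels).some((key) => key.startsWith(AKS_LABEL_PREFIX));
}

// Subset of the labels from the AKS node above.
const aksNodeLabels: NodeLabels = {
  "kubernetes.azure.com/agentpool": "agentpool",
  "kubernetes.azure.com/mode": "system",
  "kubernetes.azure.com/role": "agent",
};

// Illustrative labels for a non-AKS node (assumed, not taken from the TKG cluster).
const otherNodeLabels: NodeLabels = {
  "kubernetes.io/os": "linux",
};

console.log(hasAksLabels(aksNodeLabels)); // true
console.log(hasAksLabels(otherNodeLabels)); // false
```

Unlike the providerID prefix, these labels describe how the node is managed rather than where it runs, which is why they look like a better signal for AKS.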

kingdonb commented 1 year ago

We haven't re-tested deleting HelmReleases on Azure AKS. I don't think this has been addressed. Can you say whether it's still an issue for you?

kingdonb commented 1 year ago

I'm not sure we really want to support creating and deleting resources from the context menu as a pattern. The extension should drive people to use GitOps, which means you'll be creating and deleting the resources in the Git repository.

But if you're on AKS/Arc, even that isn't for sure, because Microsoft.Flux uses FluxConfig, which has its source of truth in the Azure API. I'm not really sure how we can promote a consistent message across all platforms in light of this. Flux sees Git as the single source of truth, and Flux Bootstrap works completely differently from the Azure API in this case... sort of tangentially related to your issue, but also sorta not really.