vmware-tanzu / community-edition

VMware Tanzu Community Edition is no longer an actively maintained project. Code is available for historical purposes only.
https://tanzucommunityedition.io/
Apache License 2.0

Automatically find the Azure VM image billing plan SKU in E2E test script #2586

Closed karuppiah7890 closed 2 years ago

karuppiah7890 commented 2 years ago

This is based on issue #2574 ( https://github.com/vmware-tanzu/community-edition/issues/2574#issuecomment-974030726 ). Automating this will help ensure that issues like #2574 don't happen again.

Check that comment and the rest of the issue for more details.

Goal: avoid hard-coding the VM_IMAGE_BILLING_PLAN_SKU value here - https://github.com/vmware-tanzu/community-edition/blob/438ba11825108fc5281d10d2c198c9085c6e641b/test/azure/deploy-management-and-workload-cluster.sh#L50-L54 and here - https://github.com/vmware-tanzu/community-edition/blob/438ba11825108fc5281d10d2c198c9085c6e641b/test/azure/deploy-standalone-cluster.sh#L52

We could also avoid hard-coding VM_IMAGE_PUBLISHER and VM_IMAGE_OFFER, but those are less likely to change unless the user overrides the Azure image through Tanzu Framework config parameters:

https://github.com/vmware-tanzu/tanzu-framework/blob/a7a7a15b203cdcd313a64f91cccde85f9662a09b/pkg/v1/tkg/constants/config_variables.go#L71-L80

karuppiah7890 commented 2 years ago

We could probably use the dry run feature of tanzu management-cluster create to find the Azure VM image that is going to be used.

Related: #1291 which talks about dry run feature in general and shows output for Docker management cluster, where you can see DockerMachineTemplate which has customImage: projects-stg.registry.vmware.com/tkg/kind/node:v1.21.2_vmware.1 for both control plane and worker nodes

We could use either of two things from the dry run output:

- The Secret that holds the Tanzu Framework config parameters, such as AZURE_IMAGE_SKU and AZURE_IMAGE_PUBLISHER.
- The AzureMachineTemplate.

Both are internal details: the first is tied to Tanzu, the second to CAPI and CAPZ. Either can change how it exposes its data. Tanzu Framework can rename a config parameter, and CAPI and CAPZ can restructure the YAML in newer API versions or introduce different kinds. I think we can choose AzureMachineTemplate for now, assuming it is less likely to change than the Tanzu Framework Secret variable names, which could change often, though renaming those would be a breaking change too.

Examples of AzureMachineTemplate from the output of tanzu management-cluster create --dry-run test-mc-22521 --file /Users/karuppiahn/projects/github.com/vmware-tanzu/community-edition/test/azure/cluster-config.yaml -v 10 on my local machine:

apiVersion: infrastructure.cluster.x-k8s.io/v1alpha3
kind: AzureMachineTemplate
metadata:
  name: test-mc-22521-control-plane
  namespace: tkg-system
spec:
  template:
    spec:
      acceleratedNetworking: false
      dataDisks:
      - diskSizeGB: 256
        lun: 0
        nameSuffix: etcddisk
      image:
        marketplace:
          offer: tkg-capi
          publisher: vmware-inc
          sku: k8s-1dot21dot5-ubuntu-2004
          thirdPartyImage: true
          version: 2021.09.21
      location: australiacentral
      osDisk:
        diskSizeGB: 128
        managedDisk:
          storageAccountType: Premium_LRS
        osType: Linux
      sshPublicKey: dummyPublicKey
      vmSize: Standard_D4s_v3
---
apiVersion: infrastructure.cluster.x-k8s.io/v1alpha3
kind: AzureMachineTemplate
metadata:
  name: test-mc-22521-md-0
  namespace: tkg-system
spec:
  template:
    spec:
      acceleratedNetworking: false
      image:
        marketplace:
          offer: tkg-capi
          publisher: vmware-inc
          sku: k8s-1dot21dot5-ubuntu-2004
          thirdPartyImage: true
          version: 2021.09.21
      location: australiacentral
      osDisk:
        diskSizeGB: 128
        managedDisk:
          storageAccountType: Premium_LRS
        osType: Linux
      sshPublicKey: dummyPublicKey
      vmSize: Standard_D4s_v3

From these two YAML documents, one for the control plane nodes and one for the worker nodes, we can get the SKU k8s-1dot21dot5-ubuntu-2004.
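A minimal sketch of how the E2E script could pull that SKU out of the dry run output, assuming only awk is available. The function name find_azure_image_sku is hypothetical and not part of the repo. The sketch also assumes that kind: appears before spec: in each document and that the sku value is unquoted, as in the output above. It fails if it does not find exactly one distinct SKU, so a mismatch between control plane and worker images is surfaced instead of silently picking one.

```shell
# Hypothetical helper: read a dry-run manifest on stdin and print the single
# marketplace image SKU used by its AzureMachineTemplate documents.
# Assumes "kind:" precedes "spec:" in every document and values are unquoted.
find_azure_image_sku() {
  awk '
    # A document separator ends the current document.
    /^---[[:space:]]*$/ { in_template = 0; next }
    # A top-level kind line tells us whether this document is relevant.
    /^kind:/ { in_template = ($2 == "AzureMachineTemplate"); next }
    # Collect each distinct sku value seen inside a relevant document.
    in_template && $1 == "sku:" && !($2 in seen) { seen[$2] = 1; skus[++n] = $2 }
    END {
      if (n != 1) {
        printf "expected exactly one Azure image SKU, found %d\n", n > "/dev/stderr"
        exit 1
      }
      print skus[1]
    }
  '
}

# Possible usage in the test script (cluster name and config path are placeholders):
#   VM_IMAGE_BILLING_PLAN_SKU="$(tanzu management-cluster create --dry-run "${CLUSTER_NAME}" \
#     --file cluster-config.yaml | find_azure_image_sku)"
```

Parsing with awk avoids adding a YAML tool as a dependency of the test script, at the cost of relying on the layout of the generated manifest.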

joshrosso commented 2 years ago

Putting this in icebox only because we don't know what release this is targeted for or if it'll be out of band of an "official" release. Feel free to update the milestone if you want to bind yourself to a specific TCE release.

Thanks!

karuppiah7890 commented 2 years ago

There were more E2E test failures due to this automation not being present

https://github.com/vmware-tanzu/community-edition/actions/workflows/e2e-azure-management-and-workload-cluster.yaml

https://github.com/vmware-tanzu/community-edition/runs/5461184983?check_suite_focus=true#step:5:2821 - diagnostics-data-5461184983.zip

Error: unable to set up management cluster: unable to wait for cluster and get the cluster kubeconfig: error waiting for cluster to be provisioned (this may take a few minutes): cluster creation failed, reason:'VMProvisionFailed @ Machine/test-mc-11879-control-plane-fg28n', message:'1 of 2 completed'

https://github.com/vmware-tanzu/community-edition/runs/5462422147?check_suite_focus=true#step:5:2812 - diagnostics-data-5462422147.zip

Error: unable to set up management cluster: unable to wait for cluster and get the cluster kubeconfig: error waiting for cluster to be provisioned (this may take a few minutes): cluster creation failed, reason:'VMProvisionFailed @ Machine/test-mc-13929-control-plane-tztbx', message:'1 of 2 completed'

https://github.com/vmware-tanzu/community-edition/runs/5467911214?check_suite_focus=true#step:5:2813 - diagnostics-data-5467911214.zip

Error: unable to set up management cluster: unable to wait for cluster and get the cluster kubeconfig: error waiting for cluster to be provisioned (this may take a few minutes): cluster creation failed, reason:'VMProvisionFailed @ Machine/test-mc-16995-control-plane-44nh9', message:'1 of 2 completed'

https://github.com/vmware-tanzu/community-edition/runs/5474542905?check_suite_focus=true#step:5:2813

Error: unable to set up management cluster: unable to wait for cluster and get the cluster kubeconfig: error waiting for cluster to be provisioned (this may take a few minutes): cluster creation failed, reason:'VMProvisionFailed @ Machine/test-mc-31328-control-plane-zdqpt', message:'1 of 2 completed'

For now, I had to manually accept the license for the k8s-1dot22dot5-ubuntu-2004 Azure VM image billing plan SKU.
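Once the SKU is discovered automatically, the manual acceptance step could be scripted along these lines. This is a sketch that assumes the Azure CLI (az) is installed and logged in to the right subscription. The function name accept_azure_image_terms is hypothetical, and the publisher and offer values in the usage comment are taken from the AzureMachineTemplate output earlier in this issue.

```shell
# Hypothetical helper: accept the Azure Marketplace terms for one image plan.
# Assumes the Azure CLI (az) is installed and already authenticated.
accept_azure_image_terms() {
  local publisher="${1:-}" offer="${2:-}" sku="${3:-}"
  # Refuse to call az with a missing value, for example an empty discovered SKU.
  if [ -z "${publisher}" ] || [ -z "${offer}" ] || [ -z "${sku}" ]; then
    echo "usage: accept_azure_image_terms PUBLISHER OFFER SKU" >&2
    return 2
  fi
  az vm image terms accept \
    --publisher "${publisher}" \
    --offer "${offer}" \
    --plan "${sku}"
}

# Possible usage:
#   accept_azure_image_terms vmware-inc tkg-capi "${VM_IMAGE_BILLING_PLAN_SKU}"
```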