nutanix-cloud-native / cluster-api-provider-nutanix

Kubernetes-native declarative infrastructure provider for Nutanix AHV
https://opendocs.nutanix.com/capx/latest/getting_started/
Apache License 2.0

Make `providerID` in NutanixMachine spec optional #447

Closed. thunderboltsid closed this 5 months ago

thunderboltsid commented 5 months ago

This PR also removes the filler `providerID` from NutanixMachineTemplate. The CAPI reconciler uses the `providerID` value for events, so a filler value set by default could interfere with CAPI processes.
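For readers unfamiliar with how a field becomes optional in a CRD generated from Go types, here is a minimal sketch. It assumes the usual convention of an `omitempty` JSON tag plus a `+optional` marker. `MachineSpecSketch`, `render`, and the example `providerID` value are hypothetical and are not the provider's actual types or code.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// MachineSpecSketch is a hypothetical, trimmed-down stand-in for a machine
// spec. It is NOT the provider's real type.
type MachineSpecSketch struct {
	// ProviderID is optional in this sketch: the "+optional" marker and the
	// "omitempty" JSON tag are the common way to keep a field out of the
	// generated schema's required list.
	// +optional
	ProviderID string `json:"providerID,omitempty"`

	// VCPUSockets stays required in this sketch (no marker, no omitempty).
	VCPUSockets int32 `json:"vcpuSockets"`
}

// render returns the JSON form of a spec, or an error string.
func render(s MachineSpecSketch) string {
	b, err := json.Marshal(s)
	if err != nil {
		return "error: " + err.Error()
	}
	return string(b)
}

func main() {
	// Without a providerID the key is omitted entirely, so no filler value
	// is needed in a template.
	fmt.Println(render(MachineSpecSketch{VCPUSockets: 2}))

	// With a providerID (made-up placeholder value) the key is serialized.
	fmt.Println(render(MachineSpecSketch{ProviderID: "nutanix://example-uuid", VCPUSockets: 2}))
}
```

The point of the sketch is only that an unset optional field disappears from the serialized object instead of requiring a placeholder.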

thunderboltsid commented 5 months ago

Need to understand why this fails

$ make template-test
kustomize build templates/base > templates/cluster-template.yaml
kustomize build templates/csi > templates/cluster-template-csi.yaml
kustomize build templates/clusterclass > templates/cluster-template-clusterclass.yaml
kustomize build templates/topology > templates/cluster-template-topology.yaml
GOPROXY=off ginkgo --trace --v run templates
Running Suite: Template Tests Suite - /Users/sid.shukla/go/src/github.com/nutanix-cloud-native/cluster-api-provider-nutanix/templates
=====================================================================================================================================
Random Seed: 1717599164

Will run 10 of 10 specs
------------------------------
[BeforeSuite]
/Users/sid.shukla/go/src/github.com/nutanix-cloud-native/cluster-api-provider-nutanix/templates/template_test.go:268
  INFO: clusterctl init --config testdata/clusterctl-init.yaml --kubeconfig /var/folders/g3/lb827bt96z10xz_m_c2xn93w0000gp/T/kubeconfig621058353 --wait-providers --infrastructure nutanix:v1.4.0-alpha.2
failed to create object &{map[apiVersion:infrastructure.cluster.x-k8s.io/v1beta1 kind:NutanixMachineTemplate metadata:map[name:nutanix-quick-start-cp-nmt namespace:default] spec:map[template:map[spec:map[bootType:legacy cluster:map[name: type:name] image:map[name: type:name] memorySize:4Gi subnet:[map[name: type:name]] systemDiskSize:40Gi vcpuSockets:%!s(int64=2) vcpusPerSocket:%!s(int64=1)]]]]}: NutanixMachineTemplate.infrastructure.cluster.x-k8s.io "nutanix-quick-start-cp-nmt" is invalid: spec.template.spec.providerID: Required value

failed to create object &{map[apiVersion:infrastructure.cluster.x-k8s.io/v1beta1 kind:NutanixMachineTemplate metadata:map[name:nutanix-quick-start-md-nmt namespace:default] spec:map[template:map[spec:map[bootType:legacy cluster:map[name: type:name] image:map[name: type:name] memorySize:4Gi subnet:[map[name: type:name]] systemDiskSize:40Gi vcpuSockets:%!s(int64=2) vcpusPerSocket:%!s(int64=1)]]]]}: NutanixMachineTemplate.infrastructure.cluster.x-k8s.io "nutanix-quick-start-md-nmt" is invalid: spec.template.spec.providerID: Required value

[BeforeSuite] PASSED [69.894 seconds]
------------------------------
Cluster Class Template Patches Test Suite patches for failure domains should create the cluster with failure domains
/Users/sid.shukla/go/src/github.com/nutanix-cloud-native/cluster-api-provider-nutanix/templates/template_test.go:283
2024-06-05T16:54:07+02:00   INFO    KubeAPIWarningLogger    Cluster refers to ClusterClass nutanix-quick-start but this object which hasn't yet been reconciled. Cluster topology has not been fully validated.
• [4.028 seconds]
------------------------------
Cluster Class Template Patches Test Suite patches for failure domains NutanixCluster should have correct failure domains
/Users/sid.shukla/go/src/github.com/nutanix-cloud-native/cluster-api-provider-nutanix/templates/template_test.go:292
  [FAILED] in [It] - /Users/sid.shukla/go/src/github.com/nutanix-cloud-native/cluster-api-provider-nutanix/templates/template_test.go:295 @ 06/05/24 16:55:07.674
• [FAILED] [60.008 seconds]
Cluster Class Template Patches Test Suite patches for failure domains [It] NutanixCluster should have correct failure domains
/Users/sid.shukla/go/src/github.com/nutanix-cloud-native/cluster-api-provider-nutanix/templates/template_test.go:292

  [FAILED] Timed out after 60.007s.
  The function passed to Eventually returned the following error:
      <*errors.errorString | 0x140013afce0>:
      no NutanixCluster found
      {
          s: "no NutanixCluster found",
      }
  In [It] at: /Users/sid.shukla/go/src/github.com/nutanix-cloud-native/cluster-api-provider-nutanix/templates/template_test.go:295 @ 06/05/24 16:55:07.674

  Full Stack Trace
    github.com/nutanix-cloud-native/cluster-api-provider-nutanix/templates.init.func1.1.2()
        /Users/sid.shukla/go/src/github.com/nutanix-cloud-native/cluster-api-provider-nutanix/templates/template_test.go:295 +0x1800
------------------------------
S [SKIPPED] [0.000 seconds]
Cluster Class Template Patches Test Suite patches for failure domains [It] Control Plane NutanixMachineTemplate should not have the cluster and subnets set
/Users/sid.shukla/go/src/github.com/nutanix-cloud-native/cluster-api-provider-nutanix/templates/template_test.go:312

  [SKIPPED] Spec skipped because an earlier spec in an ordered container failed
  In [It] at: /Users/sid.shukla/go/src/github.com/nutanix-cloud-native/cluster-api-provider-nutanix/templates/template_test.go:312 @ 06/05/24 16:55:07.674

  Full Stack Trace
------------------------------
S [SKIPPED] [0.000 seconds]
Cluster Class Template Patches Test Suite patches for failure domains [It] should delete the cluster
/Users/sid.shukla/go/src/github.com/nutanix-cloud-native/cluster-api-provider-nutanix/templates/template_test.go:320

  [SKIPPED] Spec skipped because an earlier spec in an ordered container failed
  In [It] at: /Users/sid.shukla/go/src/github.com/nutanix-cloud-native/cluster-api-provider-nutanix/templates/template_test.go:320 @ 06/05/24 16:55:07.674

  Full Stack Trace
------------------------------
S [SKIPPED] [0.000 seconds]
Cluster Class Template Patches Test Suite patches for control plane endpoint kubevip [It] kubevip should have correct control plane endpoint
/Users/sid.shukla/go/src/github.com/nutanix-cloud-native/cluster-api-provider-nutanix/templates/template_test.go:326

  [SKIPPED] Spec skipped because an earlier spec in an ordered container failed
  In [It] at: /Users/sid.shukla/go/src/github.com/nutanix-cloud-native/cluster-api-provider-nutanix/templates/template_test.go:326 @ 06/05/24 16:55:07.674

  Full Stack Trace
------------------------------
S [SKIPPED] [0.000 seconds]
Cluster Class Template Patches Test Suite patches for project (with name) [It] should have correct project
/Users/sid.shukla/go/src/github.com/nutanix-cloud-native/cluster-api-provider-nutanix/templates/template_test.go:358

  [SKIPPED] Spec skipped because an earlier spec in an ordered container failed
  In [It] at: /Users/sid.shukla/go/src/github.com/nutanix-cloud-native/cluster-api-provider-nutanix/templates/template_test.go:358 @ 06/05/24 16:55:07.674

  Full Stack Trace
------------------------------
S [SKIPPED] [0.000 seconds]
Cluster Class Template Patches Test Suite patches for project (with uuid) [It] should have correct project
/Users/sid.shukla/go/src/github.com/nutanix-cloud-native/cluster-api-provider-nutanix/templates/template_test.go:376

  [SKIPPED] Spec skipped because an earlier spec in an ordered container failed
  In [It] at: /Users/sid.shukla/go/src/github.com/nutanix-cloud-native/cluster-api-provider-nutanix/templates/template_test.go:376 @ 06/05/24 16:55:07.674

  Full Stack Trace
------------------------------
S [SKIPPED] [0.000 seconds]
Cluster Class Template Patches Test Suite patches for additional categories [It] should have correct categories
/Users/sid.shukla/go/src/github.com/nutanix-cloud-native/cluster-api-provider-nutanix/templates/template_test.go:394

  [SKIPPED] Spec skipped because an earlier spec in an ordered container failed
  In [It] at: /Users/sid.shukla/go/src/github.com/nutanix-cloud-native/cluster-api-provider-nutanix/templates/template_test.go:394 @ 06/05/24 16:55:07.675

  Full Stack Trace
------------------------------
S [SKIPPED] [0.000 seconds]
Cluster Class Template Patches Test Suite patches for GPUs [It] should have correct GPUs
/Users/sid.shukla/go/src/github.com/nutanix-cloud-native/cluster-api-provider-nutanix/templates/template_test.go:415

  [SKIPPED] Spec skipped because an earlier spec in an ordered container failed
  In [It] at: /Users/sid.shukla/go/src/github.com/nutanix-cloud-native/cluster-api-provider-nutanix/templates/template_test.go:415 @ 06/05/24 16:55:07.676

  Full Stack Trace
------------------------------
S [SKIPPED] [0.000 seconds]
Cluster Class Template Patches Test Suite patches for subnets [It] should have correct subnets
/Users/sid.shukla/go/src/github.com/nutanix-cloud-native/cluster-api-provider-nutanix/templates/template_test.go:446

  [SKIPPED] Spec skipped because an earlier spec in an ordered container failed
  In [It] at: /Users/sid.shukla/go/src/github.com/nutanix-cloud-native/cluster-api-provider-nutanix/templates/template_test.go:446 @ 06/05/24 16:55:07.678

  Full Stack Trace
------------------------------
[AfterSuite]
/Users/sid.shukla/go/src/github.com/nutanix-cloud-native/cluster-api-provider-nutanix/templates/template_test.go:273
[AfterSuite] PASSED [0.493 seconds]
------------------------------

Summarizing 1 Failure:
  [FAIL] Cluster Class Template Patches Test Suite patches for failure domains [It] NutanixCluster should have correct failure domains
  /Users/sid.shukla/go/src/github.com/nutanix-cloud-native/cluster-api-provider-nutanix/templates/template_test.go:295

Ran 2 of 10 Specs in 134.428 seconds
FAIL! -- 1 Passed | 1 Failed | 0 Pending | 8 Skipped
--- FAIL: TestClusterClassTemplateSuite (134.43s)
FAIL

Ginkgo ran 1 suite in 2m23.4292385s

Test Suite Failed
make: *** [Makefile:313: template-test] Error 1

codecov[bot] commented 5 months ago

Codecov Report

All modified and coverable lines are covered by tests :white_check_mark:

Project coverage is 31.74%. Comparing base (d5f36c9) to head (90d24d5).

Additional details and impacted files

```diff
@@           Coverage Diff           @@
##             main     #447   +/-   ##
=======================================
  Coverage   31.74%   31.74%
=======================================
  Files          14       14
  Lines        1367     1367
=======================================
  Hits          434      434
  Misses        933      933
```


thunderboltsid commented 5 months ago

Need to understand why this fails

The template-test needed to be modified to work with the latest controller templates and manifests instead of those from the last release.

thunderboltsid commented 5 months ago

/retest

thunderboltsid commented 5 months ago

/retest

dlipovetsky commented 5 months ago

Thanks for the providerID fix.

Would you mind moving everything not related to it to a separate PR, or commit? It would help us understand the changes better in the future.

thunderboltsid commented 5 months ago

/hold

Putting this PR on hold to block the merge until https://github.com/nutanix-cloud-native/cluster-api-provider-nutanix/pull/448 merges.

thunderboltsid commented 5 months ago

> Thanks for the providerID fix.
>
> Would you mind moving everything not related to it to a separate PR, or commit? It would help us understand the changes better in the future.

Our template tests were wired to use an older version of the CRDs (from v1.4.0-alpha.2), so CI fails with this PR, which includes a CRD change. The template-test changes rewired the tests to always use the latest local CRDs and controller instead. Those changes have been moved to a separate PR, and this PR is now blocked until that PR merges.

nutanix-cn-prow-bot[bot] commented 5 months ago

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: adiantum, dlipovetsky, thunderboltsid

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:
- ~~[OWNERS](https://github.com/nutanix-cloud-native/cluster-api-provider-nutanix/blob/main/OWNERS)~~ [adiantum,thunderboltsid]

Approvers can indicate their approval by writing `/approve` in a comment
Approvers can cancel approval by writing `/approve cancel` in a comment

thunderboltsid commented 5 months ago

/retest

thunderboltsid commented 5 months ago

/retest