vmware-tanzu / tanzu-framework

Tanzu Framework provides a set of building blocks to build atop the Tanzu platform and leverages Carvel packaging and plugins to provide users with a much stronger, more integrated experience than the loose coupling and stand-alone commands of the previous generation of tools.
Apache License 2.0

Investigate clustergen error message #1130

Closed (jayunit100 closed this issue 2 years ago)

jayunit100 commented 2 years ago

Bug description

At the end of https://storage.googleapis.com/tkg-clustergen/1127/20211110155525/clustergen.diff.txt, we're seeing an interesting error message:

diff -r -U15 old/96370463.case.output new/96370463.case.output
--- old/96370463.case.output    2021-11-10 15:54:49.047820271 +0000
+++ new/96370463.case.output    2021-11-10 15:49:35.143389357 +0000
@@ -1,10 +1,10 @@

  Creating a windows workload cluster really-long-cluster-name-with-hyphen
  Creating a windows workload cluster really-long-cluster-name-with-hyphen

The error looks like this:

-Error: : unable to get template: Overlaying (in following order: overlay-windows.yaml, vsphere-overlay.yaml, 01_plans/prod.yaml, 02_addons/cni/add_cni.yaml, 03_customizations/03_windows/prevent_windows_updates.yaml, 03_customizations/annotate_os_info.yaml, 03_customizations/registry_ca_cert.yaml): Document on line 03_customizations/03_windows/prevent_windows_updates.yaml:5: Expected number of matched nodes to be 1, but was 3 (lines: base-template.yaml:201, vsphere-overlay.yaml:170, vsphere-overlay.yaml:199)

and the new output shows a similar error:

+Error: : unable to get template: Overlaying (in following order: overlay-windows.yaml, vsphere-overlay.yaml, 01_plans/prod.yaml, 02_addons/cni/add_cni.yaml, 03_customizations/03_windows/deleteme_2.yaml, 03_customizations/03_windows/prevent_windows_updates.yaml, 03_customizations/annotate_os_info.yaml, 03_customizations/registry_ca_cert.yaml): Document on line 03_customizations/03_windows/deleteme_2.yaml:5: Expected number of matched nodes to be 1, but was 3 (lines: base-template.yaml:201, vsphere-overlay.yaml:170, vsphere-overlay.yaml:199)

I'm curious whether this "error" reflects an actual problem in how clustergen runs; I'm not really sure.

Affected product area (please put an X in all that apply)

Expected behavior

Clustergen tests probably shouldn't produce "unable to get template: Overlaying (in following order: ...)" errors?

Steps to reproduce the bug

#1127 is a simple PR that reproduces this.

Version (include the SHA if the version is not obvious)

Environment where the bug was observed (cloud, OS, etc)

Relevant Debug Output (Logs, manifests, etc)

jayunit100 commented 2 years ago

@stuartpreston, is your hypothesis here that something might be using the wrong ytt version?

vuil commented 2 years ago

The ytt error reports that some overlay/match directive expects to match and modify exactly one YAML node but found more than one. If this is a negative test case (i.e. the provided cluster configuration inputs are not supposed to produce a valid cluster manifest anyway), then it may be fine. Otherwise, there is either an issue in the overlay, or the expectation of which manifest the overlay operates on has changed. The way to look into this is to run "make clustergen" locally and examine the test case in pkg/v1/providers/tests/clustergen/testdata/ from which this error originates.

jayunit100 commented 2 years ago

We also need to update the Windows tests. I left a comment in #990; we can probably close this issue after that PR merges, since we're solving it there.

	filepath.Join(yamlRoot, "config_default.yaml"),
	filepath.Join("./fixtures/tkr-bom-v1.21.1.yaml"),
	filepath.Join("./fixtures/tkg-bom-v1.4.0.yaml"),
	filepath.Join(yamlRoot, "infrastructure-vsphere", "v1.0.1", "ytt", "base-template.yaml"),
	filepath.Join(yamlRoot, "infrastructure-vsphere", "v1.0.1", "ytt", "overlay-windows.yaml"),
	filepath.Join(yamlRoot, "ytt", "02_addons", "cni", "antrea", "antrea_addon_data.lib.yaml"),
	filepath.Join(yamlRoot, "ytt", "02_addons", "cpi", "cpi_addon_data.lib.yaml"),
	filepath.Join(yamlRoot, "ytt", "03_customizations", "02_avi", "ako-deployment.lib.yaml"),
	filepath.Join(yamlRoot, "ytt", "03_customizations", "03_windows", "prevent_windows_updates.yaml"),
	filepath.Join(yamlRoot, "ytt", "03_customizations", "annotate_os_info.yaml"),
	filepath.Join(yamlRoot, "ytt", "03_customizations", "registry_skip_tls_verify.yaml"),
	filepath.Join(yamlRoot, "ytt", "03_customizations", "registry_ca_cert.yaml"),
	filepath.Join(yamlRoot, "ytt"), // lib/helpers.star, lib/config_variable_association.star, lib/validate.star

jayunit100 commented 2 years ago

Experiment 1: running make clustergen locally on my branch from the attached PR #1127, I found no diff.

So I'm not sure yet what's going on.

jayunit100 commented 2 years ago

Experiment 2: Stuey had an interesting hypothesis that different ytt versions might not behave the same, so I'm going to try ytt 0.31 for make clustergen.

jayunit100 commented 2 years ago

OK, still not seeing a diff. Maybe I'm not running make clustergen properly:


99907723.case (POS) : testcluster --cni calico --controlplane-machine-count 3 --controlplane-size i3.xlarge --namespace test --size t3.medium --tkr v0.0.0+marketplace-image -i azure:v1.0.0 --plan prod
make[2]: Leaving directory '/home/ubuntu/SOURCE/tanzu-framework/pkg/v1/providers'
Base branch commit for  unchanged, skipping generation of base set....
~/SOURCE/tanzu-framework/pkg/v1/providers/tests/clustergen/testdata ~/SOURCE/tanzu-framework/pkg/v1/providers
diff: old: No such file or directory
(node:1) ExperimentalWarning: Conditional exports is an experimental feature. This feature could change at any time
The input is empty. Try again. []
Usage: diff2html [options] -- [diff args]

jayunit100 commented 2 years ago

Well, I realized why my diffs were empty locally: I wasn't exporting CLUSTERGEN_BASE :).

-> % export CLUSTERGEN_BASE=504334e2cc1fdd9674036eb4a983034a3f62ac3a
ubuntu@jay-buildbox-6 [01:09:55] [~/SOURCE/tanzu-framework] [eb9b85f5 *]
-> % make clustergen

Now I can reproduce this.

@hxietkg is finishing the investigation.
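Because the empty local diffs came from not exporting CLUSTERGEN_BASE (the earlier log line "Base branch commit for  unchanged" with a blank where the commit should be, followed by "diff: old: No such file or directory", is most likely the same symptom), a pre-flight check could fail fast instead. The Go sketch below is hypothetical: checkClustergenBase and the SHA pattern are illustrative assumptions and are not part of tanzu-framework's Makefile or clustergen tooling.

```go
package main

import (
	"errors"
	"fmt"
	"os"
	"regexp"
)

// shaPattern accepts abbreviated (7+) or full (40) lowercase hex commit SHAs.
var shaPattern = regexp.MustCompile(`^[0-9a-f]{7,40}$`)

// checkClustergenBase is a hypothetical pre-flight check. It rejects a
// missing or malformed base commit so the run cannot silently skip
// generating the base set and then report an empty diff.
func checkClustergenBase(value string, ok bool) error {
	if !ok || value == "" {
		return errors.New("CLUSTERGEN_BASE is not set; export it to the base commit SHA before running make clustergen")
	}
	if !shaPattern.MatchString(value) {
		return fmt.Errorf("CLUSTERGEN_BASE=%q does not look like a commit SHA", value)
	}
	return nil
}

func main() {
	value, ok := os.LookupEnv("CLUSTERGEN_BASE")
	if err := checkClustergenBase(value, ok); err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Println("using base commit", value)
}
```

Taking the value and the "is set" flag as parameters, rather than reading the environment inside the check, keeps the validation testable without touching the process environment.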

jayunit100 commented 2 years ago

Investigation done; fixed in #990.

On Windows, we have multiple KubeadmConfigTemplates!
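The thread does not show how #990 changed the overlays, so the following is only a sketch of the general idea, not that fix. A matcher keyed on kind alone selects every KubeadmConfigTemplate once a Windows manifest contains several; adding a second criterion, such as the name, brings the count back to one. (At the ytt level, the other option is to state the expectation explicitly, e.g. overlay/match with expects="1+", when the overlay really should apply to all of them.) The Doc type, countMatches function, and document names below are hypothetical.

```go
package main

import "fmt"

// Doc is a hypothetical, simplified stand-in for one YAML document.
type Doc struct {
	Kind string
	Name string
}

// countMatches reports how many documents satisfy pred; ytt's default
// overlay/match requires this count to be exactly 1.
func countMatches(docs []Doc, pred func(Doc) bool) int {
	n := 0
	for _, d := range docs {
		if pred(d) {
			n++
		}
	}
	return n
}

func main() {
	// Hypothetical Windows manifest: several KubeadmConfigTemplates plus
	// an unrelated document of another kind.
	docs := []Doc{
		{Kind: "KubeadmConfigTemplate", Name: "md-0"},
		{Kind: "KubeadmConfigTemplate", Name: "md-windows-a"},
		{Kind: "KubeadmConfigTemplate", Name: "md-windows-b"},
		{Kind: "MachineDeployment", Name: "md-0"},
	}

	// Matching on kind alone selects every KubeadmConfigTemplate.
	byKind := func(d Doc) bool { return d.Kind == "KubeadmConfigTemplate" }
	// Adding the name narrows the selection to a single document.
	byKindAndName := func(d Doc) bool { return byKind(d) && d.Name == "md-windows-a" }

	fmt.Println(countMatches(docs, byKind))        // 3
	fmt.Println(countMatches(docs, byKindAndName)) // 1
}
```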