kubernetes-sigs / cluster-api-provider-azure

Cluster API implementation for Microsoft Azure
https://capz.sigs.k8s.io/
Apache License 2.0

API version upgrade test failing with 'Provided Kubernetes version v1.22.9 does not have a corresponding VM image in the "capi offer"' #4422

nojnhuh closed this issue 8 months ago

nojnhuh commented 9 months ago

Which jobs are failing:

https://storage.googleapis.com/k8s-triage/index.html?text=Provided%20Kubernetes%20version%20v1.22.9%20does%20not%20have%20a%20corresponding%20VM%20image%20in%20the%20%22capi%20offer%22

e.g. https://prow.k8s.io/view/gs/kubernetes-jenkins/logs/periodic-cluster-api-provider-azure-apiversion-upgrade-main/1739704293901996032

The v1.22.9 version comes from the job config here: https://github.com/kubernetes/test-infra/blob/90ea8fcf126f432ab2b8fcf806befbe1f1d78fb3/config/jobs/kubernetes-sigs/cluster-api-provider-azure/cluster-api-provider-azure-periodics-main.yaml#L142

Which tests are failing:

Since when has it been failing:

Testgrid link:

Reason for failure (if possible):

Anything else we need to know:

/kind failing-test

mboersma commented 9 months ago

v1.22.9, along with many other unsupported images, was deprecated in the Marketplace offer on Dec. 26, 2023. I'll update the test-infra job.

mboersma commented 9 months ago

Unfortunately, this didn't fix things. The test gets farther but eventually fails with:

  STEP: Redacting sensitive information from logs @ 12/28/23 18:54:18.789
  INFO: "Should create a management cluster and then upgrade all the providers" ran for 52m37s on Ginkgo node 1 of 10 and reported junit test to file /logs/artifacts/test_e2e_junit.e2e_suite.1.xml
  << Timeline
  [FAILED] Timed out after 1500.000s.
  Timed out waiting for all machines to be exist
  Expected
      <int64>: 0
  to equal
      <int64>: 2
  In [It] at: /home/prow/go/pkg/mod/sigs.k8s.io/cluster-api/test@v1.6.0/e2e/clusterctl_upgrade.go:429 @ 12/28/23 18:38:37.184
  Full Stack Trace
    sigs.k8s.io/cluster-api/test/e2e.ClusterctlUpgradeSpec.func2()
        /home/prow/go/pkg/mod/sigs.k8s.io/cluster-api/test@v1.6.0/e2e/clusterctl_upgrade.go:429 +0x2856
------------------------------

See here for an example in testgrid: https://prow.k8s.io/view/gs/kubernetes-jenkins/logs/periodic-cluster-api-provider-azure-apiversion-upgrade-main/1740429077291995136

nojnhuh commented 8 months ago

It looks like this is the blocking issue now:

The platform image 'cncf-upstream:capi:k8s-1dot23dot17-ubuntu-2004:latest' is not available.

from here: https://storage.googleapis.com/kubernetes-jenkins/logs/periodic-cluster-api-provider-azure-apiversion-upgrade-main/1741153860249980928/artifacts/clusters/clusterctl-upgrade-55v00b/logs/capz-system/capz-controller-manager/capz-controller-manager-64c46db7bb-7lr9n/manager.log

for this test run: https://prow.k8s.io/view/gs/kubernetes-jenkins/logs/periodic-cluster-api-provider-azure-apiversion-upgrade-main/1741153860249980928
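
For anyone reading along: the image reference in that error has four colon-separated fields, which matches the Azure Marketplace `publisher:offer:sku:version` URN form. Below is a minimal Go sketch that splits such a reference and rebuilds the SKU from a Kubernetes version. The `k8s-<version with "dot">-<os>` pattern is inferred only from the error string above, and `parseImageURN` and `skuForVersion` are hypothetical helper names for illustration, not CAPZ's actual lookup code.

```go
package main

import (
	"fmt"
	"strings"
)

// imageURN holds the four colon-separated fields of a Marketplace image reference.
type imageURN struct {
	Publisher string
	Offer     string
	SKU       string
	Version   string
}

// parseImageURN (hypothetical helper) splits "publisher:offer:sku:version".
func parseImageURN(s string) (imageURN, error) {
	parts := strings.Split(s, ":")
	if len(parts) != 4 {
		return imageURN{}, fmt.Errorf("expected 4 colon-separated fields, got %d in %q", len(parts), s)
	}
	for _, p := range parts {
		if p == "" {
			return imageURN{}, fmt.Errorf("empty field in %q", s)
		}
	}
	return imageURN{Publisher: parts[0], Offer: parts[1], SKU: parts[2], Version: parts[3]}, nil
}

// skuForVersion (hypothetical helper) builds a SKU name from a Kubernetes
// version and an OS suffix. The naming pattern is an assumption based solely
// on the SKU seen in the error message.
func skuForVersion(k8sVersion, osSuffix string) string {
	v := strings.TrimPrefix(k8sVersion, "v")
	return "k8s-" + strings.ReplaceAll(v, ".", "dot") + "-" + osSuffix
}

func main() {
	urn, err := parseImageURN("cncf-upstream:capi:k8s-1dot23dot17-ubuntu-2004:latest")
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	fmt.Println("publisher:", urn.Publisher)
	fmt.Println("offer:    ", urn.Offer)
	fmt.Println("sku:      ", urn.SKU)
	fmt.Println("version:  ", urn.Version)

	// Check that the SKU in the error corresponds to Kubernetes v1.23.17.
	fmt.Println("matches v1.23.17:", urn.SKU == skuForVersion("v1.23.17", "ubuntu-2004"))
}
```

Splitting the reference this way makes it easier to see which field (publisher, offer, or SKU) an old controller is requesting when the platform reports an image as not available.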

CecileRobertMichon commented 8 months ago

FWIW, the upgrade test uses an old version of k8s because CAPI 1.0 doesn't support newer versions.

nojnhuh commented 8 months ago

@CecileRobertMichon Does CAPI 1.0 support Kubernetes 1.23?

CecileRobertMichon commented 8 months ago

It should, according to https://release-1-1.cluster-api.sigs.k8s.io/reference/versions, but I vaguely remember there was an issue with using 1.23 (something about a feature flag that changed between 1.22 and 1.23).

If it's getting too difficult to maintain the upgrade test from 1.0, we might want to change the test to upgrade from the oldest supported release instead.

mboersma commented 8 months ago

The error message that @nojnhuh posted seems to indicate that the CAPZ code was given an incorrect Marketplace URL, or that the code is so old it's unaware of the newer way to look up disk images. Kubernetes v1.23.17 is indeed published, but not at that location.

mboersma commented 8 months ago

> If it's getting too difficult to maintain the upgrade test from 1.0

Aha! Since we've deprecated all the older k8s versions using the old publishing scheme, I don't think CAPZ 1.0 code can easily be tested any longer...

CecileRobertMichon commented 8 months ago

@mboersma @nojnhuh I can open a PR to change the upgrade test if neither of you is already on it.

nojnhuh commented 8 months ago

Please do!