rancher / qa-tasks

List of QA Backlog

Kubernetes Upgrade on RKE2 Fails with 'Invalid Semantic Version' Error #1122

Closed Priyashetty17 closed 6 months ago

Priyashetty17 commented 8 months ago

Yesterday, I ran TestUpgradeKubernetes on RKE1 and RKE2 clusters as part of the post-upgrade checks for the 2.6 security release. The k8s upgrade on the RKE1 cluster went through successfully, but the k8s upgrade on the RKE2 cluster failed with the "Invalid Semantic Version" error:

16:39:33.191 === RUN  TestKubernetesUpgradeTestSuite
 16:39:33.191 === RUN  TestKubernetesUpgradeTestSuite/TestUpgradeKubernetes
 16:39:33.191 === RUN  TestKubernetesUpgradeTestSuite/TestUpgradeKubernetes/auto-aws-mqriy
 16:39:33.191 kubernetes_test.go:93: [auto-aws-mqriy]: Provider is: rke2, Hosted: false, Imported: true , Local: false
 16:39:33.191 kubernetes_test.go:102: 
 16:39:33.191 Error Trace:        /root/go/src/github.com/rancher/rancher/tests/v2/validation/upgrade/kubernetes_test.go:102
 16:39:33.191 /root/go/src/github.com/rancher/rancher/tests/v2/validation/upgrade/kubernetes_test.go:73
 16:39:33.191 /root/go/pkg/mod/github.com/stretchr/testify@v1.8.4/suite/suite.go:112
 16:39:33.191 Error:             Received unexpected error:
 16:39:33.191 Invalid Semantic Version
 16:39:33.191 Test:              TestKubernetesUpgradeTestSuite/TestUpgradeKubernetes/auto-aws-mqriy
 16:39:33.191 --- FAIL: TestKubernetesUpgradeTestSuite (2.72s)
 16:39:33.191 --- FAIL: TestKubernetesUpgradeTestSuite/TestUpgradeKubernetes (1.48s)
 16:39:33.191 --- FAIL: TestKubernetesUpgradeTestSuite/TestUpgradeKubernetes/auto-aws-mqriy (1.48s)
 16:39:33.191 FAIL

Upon triaging, I see that the automation code (for RKE2) checks whether the Rancher version is a valid semantic version here: https://github.com/rancher/rancher/blob/b57600cdc9b427f30f6c964ad46c535cce8f5c86/tests/framework/extensions/clusters/kubernetesversions/all.go#L49 Since the Rancher version under test is "v2.6.s3-c53e12ac72fa2cf01422770638f3600dc99ae22a-head", which is not a valid semantic version, the check fails with an "Invalid Semantic Version" error.
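
To illustrate why that string is rejected, here is a minimal stdlib-only sketch. The regular expression is a simplified assumption made for this example, not the `semver` package's actual parser, and `looksLikeSemver` and the `v2.6.14` value are hypothetical. The point is only that the third component, `s3`, is not numeric, so the string cannot parse as major.minor.patch.

```go
package main

import (
	"fmt"
	"regexp"
)

// semverLike is a simplified pattern for illustration only (an assumption,
// not the real library's grammar): optional "v", one to three numeric
// components, then optional pre-release and build metadata.
var semverLike = regexp.MustCompile(`^v?[0-9]+(\.[0-9]+){0,2}(-[0-9A-Za-z.-]+)?(\+[0-9A-Za-z.-]+)?$`)

// looksLikeSemver reports whether s matches the simplified pattern above.
func looksLikeSemver(s string) bool {
	return semverLike.MatchString(s)
}

func main() {
	versions := []string{
		"v2.6.s3-c53e12ac72fa2cf01422770638f3600dc99ae22a-head", // value from this issue
		"v2.6.14", // hypothetical release-style value
	}
	for _, v := range versions {
		fmt.Printf("%s valid=%t\n", v, looksLikeSemver(v))
	}
}
```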

cc @caliskanugur @sowmyav27

Priyashetty17 commented 8 months ago

@caliskanugur To address this issue, is it a good idea to replace the following code in ListRKE2AllVersions:

    setting, err := client.Management.Setting.ByID(rancherVersionSetting)
    if err != nil {
        return
    }
    rancherVersion, err := semver.NewVersion(setting.Value)
    if err != nil {
        return
    }

with:

    endpointList, err := client.Steve.SteveType("endpoints").ByID("cattle-system/rancher")
    if err != nil {
        return
    }

    rancherChartVersion := strings.TrimPrefix(endpointList.ObjectMeta.Labels["chart"], "rancher-")

    rancherVersion, err := semver.NewVersion(rancherChartVersion)
    if err != nil {
        return
    }

sowmyav27 commented 8 months ago

Why are we checking whether the Rancher version is a semantic version for this test?

caliskanugur commented 8 months ago

The RKE2 and K3s versions are populated from these endpoints, and their responses return a pool of RKE2/K3s versions.

We use the Rancher semver to mimic the UI version pools while filtering the versions (the underlying logic is here), so we need the Rancher semantic version in this sort function.

But for head builds, the Rancher version setting is not a valid semver. We can support head versions by changing where we get the Rancher version, as @Priyashetty17 suggests.

We need to do a few sanity checks on this endpoint's response and see whether RCs and Rancher server versions are shown and included as we need. We will need to trim that value and make a semver out of it for semver comparisons while filtering these pools.
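
The trimming step mentioned above could look roughly like the sketch below. It is stdlib-only and assumes the chart label has the form `rancher-<major>.<minor>.<patch>` with an optional pre-release suffix; that format, the `parseChartVersion` helper, and the sample label are assumptions for illustration and have not been checked against the real endpoint response.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseChartVersion strips the "rancher-" prefix from a chart label and
// parses the "major.minor.patch" core into integers. Any pre-release
// suffix after the first "-" is ignored in this sketch.
func parseChartVersion(label string) ([3]int, error) {
	var out [3]int
	raw := strings.TrimPrefix(label, "rancher-")
	core, _, _ := strings.Cut(raw, "-")
	parts := strings.Split(strings.TrimPrefix(core, "v"), ".")
	if len(parts) != 3 {
		return out, fmt.Errorf("expected major.minor.patch, got %q", raw)
	}
	for i, p := range parts {
		n, err := strconv.Atoi(p)
		if err != nil {
			return out, fmt.Errorf("invalid component %q in %q: %w", p, raw, err)
		}
		out[i] = n
	}
	return out, nil
}

func main() {
	// Hypothetical label value; the real one comes from the endpoints object.
	v, err := parseChartVersion("rancher-2.6.14-rc1")
	fmt.Println(v, err)
}
```

Whether RC suffixes should be dropped or kept for comparison is exactly the kind of sanity check described above; this sketch simply drops them.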

Priyashetty17 commented 8 months ago

Update: The current implementation assumes that the endpoints v1-rke2-release/releases and v1-k3s-release/releases return all available versions of RKE2/K3s (both compatible and incompatible). To keep only the compatible versions, we have code that checks whether the Rancher version falls within the specified range of minChannelVersion to maxChannelVersion. Max Ross confirmed that the endpoints return only the compatible versions, so we can now remove this check. The 'kubernetesversions' extension will be updated to reflect this change.
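
For reference, the range check being removed amounts to something like the following sketch. The `release` struct, its field names, and the sample data are hypothetical stand-ins, not the actual extension code or the real endpoint schema, and pre-release tags are left out for brevity.

```go
package main

import "fmt"

// version is a minimal major.minor.patch triple (pre-release tags omitted).
type version [3]int

// compare returns -1, 0, or 1 when a is lower than, equal to, or higher than b.
func compare(a, b version) int {
	for i := range a {
		if a[i] < b[i] {
			return -1
		}
		if a[i] > b[i] {
			return 1
		}
	}
	return 0
}

// release is a hypothetical shape for one entry in the releases response.
type release struct {
	Name              string
	MinChannelVersion version
	MaxChannelVersion version
}

// compatible keeps the releases whose [min, max] range contains rancher.
func compatible(rancher version, releases []release) []string {
	var names []string
	for _, r := range releases {
		if compare(rancher, r.MinChannelVersion) >= 0 && compare(rancher, r.MaxChannelVersion) <= 0 {
			names = append(names, r.Name)
		}
	}
	return names
}

func main() {
	// Hypothetical data, for illustration only.
	releases := []release{
		{Name: "release-a", MinChannelVersion: version{2, 6, 0}, MaxChannelVersion: version{2, 6, 99}},
		{Name: "release-b", MinChannelVersion: version{2, 7, 0}, MaxChannelVersion: version{2, 7, 99}},
	}
	fmt.Println(compatible(version{2, 6, 14}, releases))
}
```

If the endpoints already return only compatible versions, this filter is redundant, and dropping it also removes the need to parse the Rancher version at all.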