scylladb / local-csi-driver

ScyllaDB local volume provisioner for Kubernetes based on CSI
Apache License 2.0
8 stars 6 forks source link

Bump go version to 1.22.3 and update dependencies #48

Closed rzetelskik closed 6 months ago

rzetelskik commented 6 months ago

Description of your changes: This PR bumps go version to 1.22.3 and updates dependencies to latest patch releases to fix GO-2024-2824: A malformed DNS message in response to a query can cause the Lookup functions to get stuck in an infinite loop.

~It also updates the minor version of github.com/kubernetes-csi/csi-test/v5 to 5.2.0 - sanity tests were failing otherwise.~ Edit: reverted, they are flaky anyway.

Which issue is resolved by this Pull Request: Resolves https://github.com/scylladb/k8s-local-volume-provisioner/issues/47

/kind feature /priority critical-urgent /cc zimnx

rzetelskik commented 6 months ago

/retest

rzetelskik commented 6 months ago

@zimnx any idea why test-sanity could become flaky from just the dependency bump?

rzetelskik commented 6 months ago

@zimnx any idea why test-sanity could become flaky from just the dependency bump?

I ran it a 100 times with github.com/kubernetes-csi/csi-test/v5 v5.2.0 and it passed every single time, but it failed on a 6th attempt with v5.1.0.

All tests passed...
This was attempt 5 of 101.
Running Suite: Sanity Suite - /home/rzetelskik/github.com/scylladb/k8s-local-volume-provisioner/test/sanity/set/localdriver
===========================================================================================================================
Random Seed: 1715185370

Will run 83 of 84 specs
••SSS
------------------------------
P [PENDING]
Local CSI Driver CSI sanity Controller Service [Controller Server] ListVolumes pagination should detect volumes added between pages and accept tokens when the last volume from a page is deleted
/home/rzetelskik/github.com/scylladb/k8s-local-volume-provisioner/vendor/github.com/kubernetes-csi/csi-test/v5/pkg/sanity/controller.go:268
------------------------------
•••••••SSSS•••••••SSSSSSSSSSSSSSSSSSSSSSSSSSSSS
------------------------------
• [FAILED] [60.053 seconds]
Local CSI Driver CSI sanity GroupController Service [GroupController VolumeGroupSnapshots] [BeforeEach] CreateVolumeGroupSnapshot should fail when no name is provided
  [BeforeEach] /home/rzetelskik/github.com/scylladb/k8s-local-volume-provisioner/vendor/github.com/kubernetes-csi/csi-test/v5/pkg/sanity/tests.go:46
  [It] /home/rzetelskik/github.com/scylladb/k8s-local-volume-provisioner/vendor/github.com/kubernetes-csi/csi-test/v5/pkg/sanity/groupcontroller.go:158

  Timeline >>
  STEP: connecting to CSI driver @ 05/08/24 18:22:51.088
  [FAILED] in [BeforeEach] - /home/rzetelskik/github.com/scylladb/k8s-local-volume-provisioner/vendor/github.com/kubernetes-csi/csi-test/v5/pkg/sanity/sanity.go:265 @ 05/08/24 18:23:51.131
  [PANICKED] in [AfterEach] - /usr/local/go/src/runtime/panic.go:261 @ 05/08/24 18:23:51.132
  << Timeline

  [FAILED] Unexpected error:
      <*errors.errorString | 0xc000689620>:
      Connection timed out
      {
          s: "Connection timed out",
      }
  occurred
  In [BeforeEach] at: /home/rzetelskik/github.com/scylladb/k8s-local-volume-provisioner/vendor/github.com/kubernetes-csi/csi-test/v5/pkg/sanity/sanity.go:265 @ 05/08/24 18:23:51.131

  There were additional failures detected.  To view them in detail run ginkgo -vv
------------------------------
SSSS•••••••••••SSSSSSSSSSSSS••

Summarizing 1 Failure:
  [FAIL] Local CSI Driver CSI sanity GroupController Service [GroupController VolumeGroupSnapshots] [BeforeEach] CreateVolumeGroupSnapshot should fail when no name is provided
  /home/rzetelskik/github.com/scylladb/k8s-local-volume-provisioner/vendor/github.com/kubernetes-csi/csi-test/v5/pkg/sanity/sanity.go:265

Ran 30 of 84 Specs in 60.860 seconds
FAIL! -- 29 Passed | 1 Failed | 1 Pending | 53 Skipped
--- FAIL: TestSanity (60.87s)
FAIL

Tests failed on attempt #6

Does this have a history of being flaky? Even with the version bump, I'm sceptical of merging this anyway now.

Should we maybe add repetitions in CI to observe this?

rzetelskik commented 6 months ago

@rzetelskik: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command ci/prow/e2e-gke-parallel 8df7c37 link true /test e2e-gke-parallel Full PR test history. Your PR dashboard.

https://prow.scylla-operator.scylladb.com/view/gs/scylla-operator-prow/pr-logs/pull/scylladb_k8s-local-volume-provisioner/48/pull-k8s-local-volume-provisioner-master-e2e-gke-parallel/1788239914580578304#1:test-build-log.txt%3A2805

• [FAILED] [60.367 seconds]
CSI upstream local.csi.scylladb.com [Testpattern: Dynamic PV (default fs)] capacity [It] provides storage capacity information
k8s.io/kubernetes@v1.29.4/test/e2e/storage/testsuites/capacity.go:103
  Captured StdOut/StdErr Output >>
  May  8 16:23:20.078: INFO: >>> kubeConfig: /tmp/kubeconfig-343746237
5 skipped lines...
  May  8 16:23:20.078: INFO: >>> kubeConfig: /tmp/kubeconfig-343746237
  STEP: Building a namespace api object, basename capacity @ 05/08/24 16:23:20.078
  STEP: Waiting for a default service account to be provisioned in namespace @ 05/08/24 16:23:20.29
  STEP: Waiting for kube-root-ca.crt to be provisioned in namespace @ 05/08/24 16:23:20.307
  STEP: Creating a StorageClass @ 05/08/24 16:23:20.338
  [FAILED] in [It] - k8s.io/kubernetes@v1.29.4/test/e2e/storage/testsuites/capacity.go:135 @ 05/08/24 16:24:20.424
  May  8 16:24:20.424: INFO: deleting storage class local-csi-sc-p44bh
  STEP: Destroying namespace "capacity-9036" for this suite. @ 05/08/24 16:24:20.434
  << Timeline
  [FAILED] Timed out after 60.000s.
  after creating storage class
  no CSIStorageCapacity objects for storage class "local-csi-sc-p44bh"
  all CSIStorageCapacity objects:
  In [It] at: k8s.io/kubernetes@v1.29.4/test/e2e/storage/testsuites/capacity.go:135 @ 05/08/24 16:24:20.424

E2E failed as well. Same question - does this have a history of being flaky?

rzetelskik commented 6 months ago

I ran it a 100 times with github.com/kubernetes-csi/csi-test/v5 v5.2.0 and it passed every single time, but it failed on a 6th attempt with v5.1.0.

Sanity failed locally with the version bump as well.

------------------------------
• [FAILED] [60.046 seconds]
Local CSI Driver CSI sanity ExpandVolume [Controller Server] [BeforeEach] should work
  [BeforeEach] /home/rzetelskik/github.com/scylladb/k8s-local-volume-provisioner/vendor/github.com/kubernetes-csi/csi-test/v5/pkg/sanity/tests.go:46
  [It] /home/rzetelskik/github.com/scylladb/k8s-local-volume-provisioner/vendor/github.com/kubernetes-csi/csi-test/v5/pkg/sanity/controller.go:1559

  Timeline >>
  STEP: connecting to CSI driver @ 05/08/24 18:36:31.076
  [FAILED] in [BeforeEach] - /home/rzetelskik/github.com/scylladb/k8s-local-volume-provisioner/vendor/github.com/kubernetes-csi/csi-test/v5/pkg/sanity/sanity.go:265 @ 05/08/24 18:37:31.103
  << Timeline

  [FAILED] Unexpected error:
      <*errors.errorString | 0xc000589860>:
      Connection timed out
      {
          s: "Connection timed out",
      }
  occurred
  In [BeforeEach] at: /home/rzetelskik/github.com/scylladb/k8s-local-volume-provisioner/vendor/github.com/kubernetes-csi/csi-test/v5/pkg/sanity/sanity.go:265 @ 05/08/24 18:37:31.103
------------------------------
SSSSSS•••••••••••SSSSSSSSSSSSS••

Summarizing 1 Failure:
  [FAIL] Local CSI Driver CSI sanity ExpandVolume [Controller Server] [BeforeEach] should work
  /home/rzetelskik/github.com/scylladb/k8s-local-volume-provisioner/vendor/github.com/kubernetes-csi/csi-test/v5/pkg/sanity/sanity.go:265

Ran 30 of 84 Specs in 60.958 seconds
FAIL! -- 29 Passed | 1 Failed | 1 Pending | 53 Skipped
--- FAIL: TestSanity (60.98s)
FAIL

Tests failed on attempt #290
rzetelskik commented 6 months ago

@tnozicka should this bump dependencies to newest ones as well (including k8s v1.30.0) or just bump patch releases?

rzetelskik commented 6 months ago

Sorry for accidental closing, I messed up rebasing in the process.

Sanity tests flake issue xref: https://github.com/scylladb/k8s-local-volume-provisioner/issues/49

tnozicka commented 6 months ago

patch release for kube is fine (minor is more pain to bump)

scylla-operator-bot[bot] commented 6 months ago

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: rzetelskik, tnozicka

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files: - ~~[OWNERS](https://github.com/scylladb/k8s-local-volume-provisioner/blob/master/OWNERS)~~ [tnozicka] Approvers can indicate their approval by writing `/approve` in a comment Approvers can cancel approval by writing `/approve cancel` in a comment
rzetelskik commented 6 months ago

patch release for kube is fine (minor is more pain to bump)

I actually tried it and didn't run into any difficult issues other than the usual replacement bumps, but I'm happy to keep it as is to have it aligned with other repositories.