rzetelskik closed this 6 months ago
/retest
@zimnx any idea why test-sanity could become flaky from just the dependency bump?
I ran it 100 times with github.com/kubernetes-csi/csi-test/v5 v5.2.0 and it passed every single time, but it failed on the 6th attempt with v5.1.0.
All tests passed...
This was attempt 5 of 101.
Running Suite: Sanity Suite - /home/rzetelskik/github.com/scylladb/k8s-local-volume-provisioner/test/sanity/set/localdriver
===========================================================================================================================
Random Seed: 1715185370
Will run 83 of 84 specs
••SSS
------------------------------
P [PENDING]
Local CSI Driver CSI sanity Controller Service [Controller Server] ListVolumes pagination should detect volumes added between pages and accept tokens when the last volume from a page is deleted
/home/rzetelskik/github.com/scylladb/k8s-local-volume-provisioner/vendor/github.com/kubernetes-csi/csi-test/v5/pkg/sanity/controller.go:268
------------------------------
•••••••SSSS•••••••SSSSSSSSSSSSSSSSSSSSSSSSSSSSS
------------------------------
• [FAILED] [60.053 seconds]
Local CSI Driver CSI sanity GroupController Service [GroupController VolumeGroupSnapshots] [BeforeEach] CreateVolumeGroupSnapshot should fail when no name is provided
[BeforeEach] /home/rzetelskik/github.com/scylladb/k8s-local-volume-provisioner/vendor/github.com/kubernetes-csi/csi-test/v5/pkg/sanity/tests.go:46
[It] /home/rzetelskik/github.com/scylladb/k8s-local-volume-provisioner/vendor/github.com/kubernetes-csi/csi-test/v5/pkg/sanity/groupcontroller.go:158
Timeline >>
STEP: connecting to CSI driver @ 05/08/24 18:22:51.088
[FAILED] in [BeforeEach] - /home/rzetelskik/github.com/scylladb/k8s-local-volume-provisioner/vendor/github.com/kubernetes-csi/csi-test/v5/pkg/sanity/sanity.go:265 @ 05/08/24 18:23:51.131
[PANICKED] in [AfterEach] - /usr/local/go/src/runtime/panic.go:261 @ 05/08/24 18:23:51.132
<< Timeline
[FAILED] Unexpected error:
<*errors.errorString | 0xc000689620>:
Connection timed out
{
s: "Connection timed out",
}
occurred
In [BeforeEach] at: /home/rzetelskik/github.com/scylladb/k8s-local-volume-provisioner/vendor/github.com/kubernetes-csi/csi-test/v5/pkg/sanity/sanity.go:265 @ 05/08/24 18:23:51.131
There were additional failures detected. To view them in detail run ginkgo -vv
------------------------------
SSSS•••••••••••SSSSSSSSSSSSS••
Summarizing 1 Failure:
[FAIL] Local CSI Driver CSI sanity GroupController Service [GroupController VolumeGroupSnapshots] [BeforeEach] CreateVolumeGroupSnapshot should fail when no name is provided
/home/rzetelskik/github.com/scylladb/k8s-local-volume-provisioner/vendor/github.com/kubernetes-csi/csi-test/v5/pkg/sanity/sanity.go:265
Ran 30 of 84 Specs in 60.860 seconds
FAIL! -- 29 Passed | 1 Failed | 1 Pending | 53 Skipped
--- FAIL: TestSanity (60.87s)
FAIL
Tests failed on attempt #6
Does this have a history of being flaky? Even with the version bump, I'm now sceptical about merging this.
Should we maybe add repetitions in CI to observe this?
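For reference, a minimal sketch of the kind of repetition loop used above ("This was attempt 5 of 101"): rerun the suite and stop at the first failure. `run_suite` here is a hypothetical stand-in, simulated to fail on attempt 6 as in the log; locally it would be something like `go test ./test/sanity/...`.

```shell
# Hypothetical sketch: repeat a test suite and stop on the first failure.
# run_suite is a stand-in, simulated to fail on attempt 6 (as in the log);
# in practice it would invoke the real sanity suite.
run_suite() { [ "$1" -lt 6 ]; }   # simulated: passes on attempts 1-5, fails on 6

for i in $(seq 1 101); do
  if ! run_suite "$i"; then
    echo "Tests failed on attempt #$i"
    break
  fi
done
```

For CI, Ginkgo itself ships `--repeat=N` and `--until-it-fails` flags that could serve the same purpose without a wrapper script.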
@rzetelskik: The following test failed, say /retest to rerun all failed tests, or /retest-required to rerun all mandatory failed tests:

Test name: ci/prow/e2e-gke-parallel
Commit: 8df7c37
Details: link
Required: true
Rerun command: /test e2e-gke-parallel

Full PR test history. Your PR dashboard.
• [FAILED] [60.367 seconds]
CSI upstream local.csi.scylladb.com [Testpattern: Dynamic PV (default fs)] capacity [It] provides storage capacity information
k8s.io/kubernetes@v1.29.4/test/e2e/storage/testsuites/capacity.go:103
Captured StdOut/StdErr Output >>
May 8 16:23:20.078: INFO: >>> kubeConfig: /tmp/kubeconfig-343746237
5 skipped lines...
May 8 16:23:20.078: INFO: >>> kubeConfig: /tmp/kubeconfig-343746237
STEP: Building a namespace api object, basename capacity @ 05/08/24 16:23:20.078
STEP: Waiting for a default service account to be provisioned in namespace @ 05/08/24 16:23:20.29
STEP: Waiting for kube-root-ca.crt to be provisioned in namespace @ 05/08/24 16:23:20.307
STEP: Creating a StorageClass @ 05/08/24 16:23:20.338
[FAILED] in [It] - k8s.io/kubernetes@v1.29.4/test/e2e/storage/testsuites/capacity.go:135 @ 05/08/24 16:24:20.424
May 8 16:24:20.424: INFO: deleting storage class local-csi-sc-p44bh
STEP: Destroying namespace "capacity-9036" for this suite. @ 05/08/24 16:24:20.434
<< Timeline
[FAILED] Timed out after 60.000s.
after creating storage class
no CSIStorageCapacity objects for storage class "local-csi-sc-p44bh"
all CSIStorageCapacity objects:
In [It] at: k8s.io/kubernetes@v1.29.4/test/e2e/storage/testsuites/capacity.go:135 @ 05/08/24 16:24:20.424
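The failure above is the e2e capacity test timing out while waiting for a CSIStorageCapacity object to be published for the freshly created storage class. The wait amounts to a poll-with-deadline loop, which can be sketched as below; `list_capacities` is a hypothetical stub standing in for something like `kubectl get csistoragecapacities --all-namespaces`, and here it never returns a match, so the loop times out just like the log.

```shell
# Hypothetical sketch of the check the capacity test performs: poll for a
# CSIStorageCapacity object referencing the new storage class until a
# deadline. list_capacities is a stub (it never publishes an object),
# so the loop times out, mirroring "Timed out after 60.000s" above.
list_capacities() { echo ""; }   # stub: no CSIStorageCapacity objects appear

attempts=0
max_attempts=2                   # the real test polls for ~60 s
found=no
while [ "$attempts" -lt "$max_attempts" ]; do
  if list_capacities | grep -q "local-csi-sc"; then
    found=yes
    break
  fi
  attempts=$((attempts + 1))
  sleep 1
done
echo "capacity object found: $found"
```

If the driver's external-provisioner never publishes the object (or publishes it late), this style of check fails regardless of the csi-test version, which would point at the driver/provisioner side rather than the test harness.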
E2E failed as well. Same question - does this have a history of being flaky?
I ran it 100 times with github.com/kubernetes-csi/csi-test/v5 v5.2.0 and it passed every single time, but it failed on the 6th attempt with v5.1.0.
Sanity failed locally with the version bump as well.
------------------------------
• [FAILED] [60.046 seconds]
Local CSI Driver CSI sanity ExpandVolume [Controller Server] [BeforeEach] should work
[BeforeEach] /home/rzetelskik/github.com/scylladb/k8s-local-volume-provisioner/vendor/github.com/kubernetes-csi/csi-test/v5/pkg/sanity/tests.go:46
[It] /home/rzetelskik/github.com/scylladb/k8s-local-volume-provisioner/vendor/github.com/kubernetes-csi/csi-test/v5/pkg/sanity/controller.go:1559
Timeline >>
STEP: connecting to CSI driver @ 05/08/24 18:36:31.076
[FAILED] in [BeforeEach] - /home/rzetelskik/github.com/scylladb/k8s-local-volume-provisioner/vendor/github.com/kubernetes-csi/csi-test/v5/pkg/sanity/sanity.go:265 @ 05/08/24 18:37:31.103
<< Timeline
[FAILED] Unexpected error:
<*errors.errorString | 0xc000589860>:
Connection timed out
{
s: "Connection timed out",
}
occurred
In [BeforeEach] at: /home/rzetelskik/github.com/scylladb/k8s-local-volume-provisioner/vendor/github.com/kubernetes-csi/csi-test/v5/pkg/sanity/sanity.go:265 @ 05/08/24 18:37:31.103
------------------------------
SSSSSS•••••••••••SSSSSSSSSSSSS••
Summarizing 1 Failure:
[FAIL] Local CSI Driver CSI sanity ExpandVolume [Controller Server] [BeforeEach] should work
/home/rzetelskik/github.com/scylladb/k8s-local-volume-provisioner/vendor/github.com/kubernetes-csi/csi-test/v5/pkg/sanity/sanity.go:265
Ran 30 of 84 Specs in 60.958 seconds
FAIL! -- 29 Passed | 1 Failed | 1 Pending | 53 Skipped
--- FAIL: TestSanity (60.98s)
FAIL
Tests failed on attempt #290
@tnozicka should this bump dependencies to newest ones as well (including k8s v1.30.0) or just bump patch releases?
Sorry for the accidental close; I messed up a rebase in the process.
Sanity tests flake issue xref: https://github.com/scylladb/k8s-local-volume-provisioner/issues/49
patch release for kube is fine (minor is more pain to bump)
[APPROVALNOTIFIER] This PR is APPROVED
This pull-request has been approved by: rzetelskik, tnozicka
The full list of commands accepted by this bot can be found here.
The pull request process is described here.
patch release for kube is fine (minor is more pain to bump)
I actually tried it and didn't run into any difficult issues other than the usual replacement bumps, but I'm happy to keep it as is to have it aligned with other repositories.
Description of your changes: This PR bumps the Go version to 1.22.3 and updates dependencies to the latest patch releases to fix GO-2024-2824: a malformed DNS message in response to a query can cause the Lookup functions to get stuck in an infinite loop.
~It also updates the minor version of github.com/kubernetes-csi/csi-test/v5 to 5.2.0 - sanity tests were failing otherwise.~ Edit: reverted, they are flaky anyway.
Which issue is resolved by this Pull Request: Resolves https://github.com/scylladb/k8s-local-volume-provisioner/issues/47
/kind feature /priority critical-urgent /cc zimnx