kubernetes / test-infra

Test infrastructure for the Kubernetes project.
Apache License 2.0
3.83k stars 2.65k forks source link

[SIG Storage] Convert all jobs to Ginkgo --label-filter #33152

Open carlory opened 3 months ago

carlory commented 3 months ago

What would you like to be added:

Why is this needed:

Part of https://github.com/kubernetes/test-infra/issues/32911

/sig storage

carlory commented 3 months ago

job: pull-kubernetes-e2e-storage-kind-alpha-beta-features

- name: FEATURE_GATES
  value: '{"VolumeAttributesClass":true, "HonorPVReclaimPolicy": true, "CSIVolumeHealth": true}'
- name: RUNTIME_CONFIG
  value: '{"api/ga":"true", "storage.k8s.io/v1alpha1":"true", "storage.k8s.io/v1beta1":"true"}'
- name: FOCUS
  value: \[Feature:VolumeAttributesClass\]|\[Feature:HonorPVReclaimPolicy\]|\[Feature:CSIVolumeHealth\]
- name: PARALLEL
  value: "true"

expected (if focus on some features)

- name: FEATURE_GATES
  value: '{"VolumeAttributesClass":true, "HonorPVReclaimPolicy": true, "CSIVolumeHealth": true}'
- name: RUNTIME_CONFIG
  value: '{"api/ga":"true", "storage.k8s.io/v1alpha1":"true", "storage.k8s.io/v1beta1":"true"}'
- name: LABEL_FILTER
  value: '!Slow && !Disruptive && !Flaky && FeatureGate: containsAny {VolumeAttributesClass, HonorPVReclaimPolicy, CSIVolumeHealth}'
- name: PARALLEL
  value: "true"

Test on macos

(base) ➜  kubernetes git:(master) _output/bin/ginkgo --dry-run --label-filter='!Slow && !Disruptive && !Flaky && FeatureGate: containsAny {VolumeAttributesClass, HonorPVReclaimPolicy, CSIVolumeHealth}' -v ./test/e2e | grep SUCCESS
SUCCESS! -- 25 Passed | 0 Failed | 0 Pending | 6578 Skipped
carlory commented 3 months ago

job: pull-kubernetes-e2e-storage-kind-alpha-beta-features

- name: FEATURE_GATES
  value: '{"VolumeAttributesClass":true, "HonorPVReclaimPolicy": true, "CSIVolumeHealth": true}'
- name: RUNTIME_CONFIG
  value: '{"api/ga":"true", "storage.k8s.io/v1alpha1":"true", "storage.k8s.io/v1beta1":"true"}'
- name: FOCUS
  value: \[Feature:VolumeAttributesClass\]|\[Feature:HonorPVReclaimPolicy\]|\[Feature:CSIVolumeHealth\]
- name: PARALLEL
  value: "true"

expected (if focus on sig-storage without the feature label)

- name: FEATURE_GATES
  value: '{"VolumeAttributesClass":true, "HonorPVReclaimPolicy": true, "CSIVolumeHealth": true}'
- name: RUNTIME_CONFIG
  value: '{"api/ga":"true", "storage.k8s.io/v1alpha1":"true", "storage.k8s.io/v1beta1":"true"}'
- name: LABEL_FILTER
  value:  '!Slow && !Disruptive && !Flaky && (Feature: consistsOf Alpha || Feature: consistsOf Beta) && !(FeatureGate: isEmpty) && sig-storage'
- name: PARALLEL
  value: "true"

Test on macos with the master branch of k/k

(base) ➜  kubernetes git:(master) _output/bin/ginkgo --dry-run --label-filter='!Slow && !Disruptive && !Flaky && (Feature: consistsOf Alpha || Feature: consistsOf Beta) && !(FeatureGate: isEmpty) && sig-storage' -v ./test/e2e | grep SUCCESS
SUCCESS! -- 0 Passed | 0 Failed | 0 Pending | 6603 Skipped

0 Passed ? related-to https://github.com/kubernetes/kubernetes/pull/125452.

If we want to apply this change, the kind of sig-testing jobs must be updated first. cc @pohly @aojea

pohly commented 3 months ago

do we need to add a canary job before change it?

I don't think so. If there is one, use it, otherwise let's just modify the job directly.

Test on macos

A before/after comparison would be more useful than the absolute number. The goal is to run exactly the same tests as before.

If we want to apply this change, the kind of sig-testing jobs must be updated first.

Feature: consistsOf Alpha || Feature: consistsOf Beta doesn't make sense to me. It means "if the Feature set is either exactly Alpha or Beta", which is too restrictive.

carlory commented 3 months ago
_output/bin/ginkgo --dry-run --label-filter='!Slow && !Disruptive && !Flaky && !(FeatureGate: isEmpty) && sig-storage' -v ./test/e2e

------------------------------
[sig-storage] CSI Volumes [Driver: csi-hostpath] [Testpattern: Dynamic PV (ntfs)] [Feature:Windows] volume-modify [Feature:VolumeAttributesClass] [FeatureGate:VolumeAttributesClass] [Beta] should create a volume with VAC [sig-storage, Feature:Windows, Feature:VolumeAttributesClass, FeatureGate:VolumeAttributesClass, Feature:Beta]
/Users/kiki/workspace/golang/src/k8s.io/kubernetes/test/e2e/storage/testsuites/volume_modify.go:151
• [0.000 seconds]

------------------------------
[sig-storage] CSI Mock selinux on mount SELinuxMount [LinuxOnly] [Feature:SELinux] should pass SELinux mount option for RWOP volume and Pod with SELinux context set [FeatureGate:SELinuxMountReadWriteOncePod] [Beta] [sig-storage, Feature:SELinux, FeatureGate:SELinuxMountReadWriteOncePod, Feature:Beta]

Without Feature: consistsOf Alpha || Feature: consistsOf Beta, the above test cases will be selected. i.e. [Feature:Windows], [Feature:SELinux], and etc.

According to the strict definition of a feature label, it should depend on other extra requirements. If it's not skipped in their own test cases, the job may fail in kind cluster.

So if we can not use Feature: consistsOf Alpha || Feature: consistsOf Beta to filter out the test cases, I have to explicitly specify the feature-gate in the label-filter. i.e. https://github.com/kubernetes/test-infra/issues/33152#issuecomment-2258086770 @pohly

pohly commented 3 months ago

Let's first spell out what pull-kubernetes-e2e-storage-kind-alpha-beta-features is supposed to run: "all SIG Storage tests which are alpha or beta, are not flaky or slow, and which can run in a kind cluster where VolumeAttributesClass/HonorPVReclaimPolicy/CSIVolumeHealth are supported".

This leads me to:

sig-storage && Feature: containsAny {Alpha, Beta} && !Slow && !Flaky && Feature: isSubsetOf {VolumeAttributesClass, HonorPVReclaimPolicy, CSIVolumeHealth, Alpha, Beta}

One problem is that this includes variants of the tests for drivers other than [Driver: csi-hostpath], like [Driver: pd.csi.storage.gke.io]. It would be good to turn those inline tags into labels and exclude non-csi-hostpath drivers, then Ginkgo doesn't need to start tests that then skip themselves because the driver cannot be installed.

One thing that I noticed while looking at --dry-run output: why is this test marked as [Serial]?

[sig-storage] CSI Volumes [Driver: pd.csi.storage.gke.io] [Serial] [Testpattern: Dynamic PV (block volmode)] volume-modify [Feature:VolumeAttributesClass] [FeatureGate:VolumeAttributesClass] [Beta] should create a volume with VAC [sig-storage, Serial, Feature:VolumeAttributesClass, FeatureGate:VolumeAttributesClass, Feature:Beta]

If you have many of those, it can considerably slow down the job.

carlory commented 3 months ago

Feature: isSubsetOf {VolumeAttributesClass, HonorPVReclaimPolicy, CSIVolumeHealth, Alpha, Beta}

If so, we have to introduce a new feature label even if the feature only depends feature-gate and alpha/beta API group.

Do you mean it is sig-storage && Feature: containsAny {Alpha, Beta} && !Slow && !Flaky && FeatureGate: isSubsetOf {VolumeAttributesClass, HonorPVReclaimPolicy, CSIVolumeHealth}'?

If yes, it can be written as 'sig-storage && !Slow && !Flaky && FeatureGate: isSubsetOf {VolumeAttributesClass, HonorPVReclaimPolicy, CSIVolumeHealth}'. When we use framework.WithFeatureGate, it always adds a Feature label, alpha or beta, to the test. So, we can use FeatureGate to filter the test.

carlory commented 3 months ago

then Ginkgo doesn't need to start tests that then skip themselves because the driver cannot be installed.

Those tests will skip when it's running unless tests run on specific environment.

I'm not sure whether I should filter out those tests or not in job configuration.

carlory commented 3 months ago

why is this test marked as [Serial]? [sig-storage] CSI Volumes [Driver: pd.csi.storage.gke.io] [Serial] [Testpattern: Dynamic PV (block volmode)] volume-modify [Feature:VolumeAttributesClass] [FeatureGate:VolumeAttributesClass] [Beta] should create a volume with VAC [sig-storage, Serial, Feature:VolumeAttributesClass, FeatureGate:VolumeAttributesClass, Feature:Beta]

The gce-pd-csi-driver does not support custom provisioner name, it always uses pd.csi.storage.gke.io as the provisioner name. This is not configurable.

All in-tree integrations with cloud providers are removed. Should we remove related e2e tests from k/k. i.e.

https://github.com/kubernetes/kubernetes/blob/aab56e9b70b7d80f2a5a5f2907e172de257662b5/test/e2e/storage/drivers/csi.go#L830

cc @dims @jsafrane @msau42 @pohly

Should all jobs in sig-storage-gce-config.yaml be moved into sig-cloudprovider?

pohly commented 3 months ago

If so, we have to introduce a new feature label even if the feature only depends feature-gate and alpha/beta API group.

Yes. That is the current status.

Do you mean it is sig-storage && Feature: containsAny {Alpha, Beta} && !Slow && !Flaky && FeatureGate: isSubsetOf {VolumeAttributesClass, HonorPVReclaimPolicy, CSIVolumeHealth}?

That will work eventually, but right now a Feature: VolumeAttributesClass is required because some other job might not skip this test properly without such a Feature.

I'm not sure whether I should filter out those tests or not in job configuration.

You don't have to. Those e2eskipper.SkipUnlessProviderIs calls are intentionally executed before any costly per-test setup runs (like creating the per-test namespace). But I find skipping them in the job a bit nicer.

Should all jobs in sig-storage-gce-config.yaml be moved into sig-cloudprovider?

Probably? But I have to defer to the expert here.

k8s-triage-robot commented 4 days ago

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

You can:

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

carlory commented 3 days ago

/lifecycle frozen