kubernetes / kubernetes

Production-Grade Container Scheduling and Management
https://kubernetes.io
Apache License 2.0

Test coverage of volume relabeling is lacking #86080

Open liggitt opened 4 years ago

liggitt commented 4 years ago

What happened: SELinux volume relabeling regressed in 1.16/1.17 with no test failures. See https://github.com/kubernetes/kubernetes/issues/83679

What you expected to happen: Test failures would have prevented the regression. Currently, we apparently only have manual test guarantees that this functions correctly.

/sig node
/sig storage
/priority critical-urgent

liggitt commented 4 years ago

it appears we have a test for this, but it is labeled [Flaky] and so is not release-blocking:

    ginkgo.It("should support volume SELinux relabeling [Flaky] [LinuxOnly]", func() {
        testPodSELinuxLabeling(f, false, false)
    })
liggitt commented 4 years ago

in the place where we are running this test, it appears to be failing 100% of the time:

https://testgrid.k8s.io/google-gce#gci-gce-flaky&sort-by-flakiness=&width=20&include-filter-by-regex=SELinux

liggitt commented 4 years ago

/assign @gnufied

liggitt commented 4 years ago

@gnufied https://testgrid.k8s.io/google-gce#gci-gce-flaky&sort-by-flakiness=&width=20&include-filter-by-regex=SELinux looks solidly green now, can the [Flaky] tag be removed to ensure test coverage now?

gnufied commented 4 years ago

@liggitt flaky can definitely be removed from these tests, but looking at the logs:

    I0312 00:28:29.269] Mar 12 00:28:29.269: INFO: Running '/workspace/kubernetes/platforms/linux/amd64/kubectl exec --namespace=security-context-9753 security-context-21e88fa6-71c5-4175-8aa4-96bfe0289fce -c=test-container -- cat /sys/fs/selinux/enforce'
    I0312 00:28:29.918] Mar 12 00:28:29.917: INFO: error running kubectl exec to read file: exit status 1
    I0312 00:28:29.918] stdout=
    I0312 00:28:29.918] stderr=cat: can't open '/sys/fs/selinux/enforce': No such file or directory
    I0312 00:28:29.918] command terminated with exit code 1
    I0312 00:28:29.918] )

The tests are just skipping the bits that depend on SELinux, and hence show green. Do we have a different configuration where SELinux is enabled?

liggitt commented 4 years ago

… so we still don't actually have coverage of the feature anywhere that would be visible to the release team?

fejta-bot commented 4 years ago

Issues go stale after 90d of inactivity. Mark the issue as fresh with /remove-lifecycle stale. Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.

/lifecycle stale

fejta-bot commented 4 years ago

Stale issues rot after 30d of inactivity. Mark the issue as fresh with /remove-lifecycle rotten. Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.

/lifecycle rotten

matthyx commented 3 years ago

/area test
/triage accepted
/help

k8s-ci-robot commented 3 years ago

@matthyx: This request has been marked as needing help from a contributor.

Please ensure the request meets the requirements listed here.

If this request no longer meets these requirements, the label can be removed by commenting with the /remove-help command.

In response to [this](https://github.com/kubernetes/kubernetes/issues/86080):

> /area test
> /triage accepted
> /help

Instructions for interacting with me using PR comments are available [here](https://git.k8s.io/community/contributors/guide/pull-requests.md). If you have questions or suggestions related to my behavior, please file an issue against the [kubernetes/test-infra](https://github.com/kubernetes/test-infra/issues/new?title=Prow%20issue:) repository.

matthyx commented 3 years ago

@gnufied friendly reminder, are you still planning to work on this? Otherwise please unassign yourself.

SergeyKanzhelev commented 3 years ago

/unassign @gnufied

SergeyKanzhelev commented 3 years ago

/remove-priority critical-urgent
/priority backlog

SergeyKanzhelev commented 3 years ago

/assign @haircommander

haircommander commented 3 years ago

looking at the test case, it still seems to be flaky, though it's not clear why: https://testgrid.k8s.io/google-gce#gci-gce-flaky&sort-by-flakiness=&width=20&include-filter-by-regex=SELinux

A container that is supposed to sleep 6000 seems to run to completion:

    <*errors.errorString | 0xc000400b00>: {
        s: "pod ran to completion",
    }

which is.. odd.

I would otherwise propose we drop the [Flaky] label; then these tests would run in the CRI-O e2e jobs, which actually do run with SELinux and would be able to catch regressions.
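
For reference, "pod ran to completion" is the error the e2e wait loop reports when a pod it expects to stay Running reaches a terminal phase. A hedged sketch of that predicate (assumed names and simplified phases, not the actual framework code):

```go
package main

import (
	"errors"
	"fmt"
)

// PodPhase mirrors the phase strings reported by the API server.
type PodPhase string

const (
	PodRunning   PodPhase = "Running"
	PodSucceeded PodPhase = "Succeeded"
	PodFailed    PodPhase = "Failed"
)

// errPodCompleted is the condition behind the "pod ran to completion"
// failure quoted above: the wait loop is polling for Running, so a
// terminal phase is treated as a hard failure rather than a retry.
var errPodCompleted = errors.New("pod ran to completion")

// checkRunning is a sketch of the wait-loop predicate: a container that
// is supposed to `sleep 6000` must still be Running; a terminal phase
// fails the poll immediately, and any other phase keeps polling.
func checkRunning(phase PodPhase) (bool, error) {
	switch phase {
	case PodRunning:
		return true, nil
	case PodSucceeded, PodFailed:
		return false, errPodCompleted
	default:
		return false, nil // e.g. Pending: not done, keep polling
	}
}

func main() {
	_, err := checkRunning(PodSucceeded)
	fmt.Println(err)
}
```

Under this shape, the oddity is that a pod with a 6000-second sleep reached Succeeded at all, not that the wait loop flagged it.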

SergeyKanzhelev commented 1 year ago

@haircommander are you still working on this or needs to be un-assigned?

haircommander commented 1 year ago

I've been wanting to work on it, but honestly I don't think I have the capacity right now. I will unassign and maybe come back to it (or someone else can finish it).

gnufied commented 1 year ago

I think this is being fixed via - https://github.com/kubernetes/kubernetes/pull/113789 and corresponding PRs.

jsafrane commented 1 year ago

While https://github.com/kubernetes/kubernetes/pull/113789 improves the situation, it does not test any relabeling done by the container runtime.

On the bright side, there is a CI job that has SELinux enforcing + containerd: https://testgrid.k8s.io/google-aws#kops-aws-selinux. It should be relatively easy to add tests that run various pods and check both how the pods end up running and how their volumes are relabeled.
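
One shape such a check could take, as a hedged sketch (the helpers are hypothetical, not existing e2e code): run a pod with explicit `seLinuxOptions`, read the volume's context (e.g. via `ls -dZ` on the mount path), and compare the observed type and level against what the runtime should have applied:

```go
package main

import (
	"fmt"
	"strings"
)

// parseLabel splits a full SELinux context such as
// "system_u:object_r:container_file_t:s0:c123,c456" into its four
// parts. SplitN is used because the level itself may contain colons.
func parseLabel(s string) (user, role, typ, level string, err error) {
	parts := strings.SplitN(strings.TrimSpace(s), ":", 4)
	if len(parts) != 4 {
		return "", "", "", "", fmt.Errorf("malformed SELinux label %q", s)
	}
	return parts[0], parts[1], parts[2], parts[3], nil
}

// relabeledFor reports whether a volume's observed context carries the
// type and level expected for the pod's seLinuxOptions.
func relabeledFor(volumeLabel, wantType, wantLevel string) (bool, error) {
	_, _, typ, level, err := parseLabel(volumeLabel)
	if err != nil {
		return false, err
	}
	return typ == wantType && level == wantLevel, nil
}

func main() {
	ok, _ := relabeledFor(
		"system_u:object_r:container_file_t:s0:c123,c456",
		"container_file_t", "s0:c123,c456")
	fmt.Println(ok) // prints true: volume was relabeled as expected
}
```

An e2e test built around this would fail loudly on a mismatched or missing label instead of silently skipping, which is the gap this issue is about.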

ggold7046 commented 1 year ago

Hi, I'm looking for some testing issues to work on. Could anyone guide me on how to find them?

matthyx commented 1 year ago

> Hi, I'm looking for some testing issues to work on. Could anyone guide me on how to find them?

https://testgrid.k8s.io/ lists all tests and their results over time

ggold7046 commented 1 year ago

Thanks for the reply @matthyx. Actually, I want to get some software-testing experience, and I'm looking for issues that will teach me software testing. Could you please guide me?

matthyx commented 1 year ago

I suggest you join our weekly meeting dedicated to Node CI: https://docs.google.com/document/d/1fb-ugvgdSVIkkuJ388_nhp2pBTy_4HEVg5848Xy7n5U/edit?usp=sharing

There you will find a small group of very friendly people taking care of the node tests as well as bug triage.

k8s-triage-robot commented 6 months ago

This issue has not been updated in over 1 year, and should be re-triaged.

You can:

- Confirm that this issue is still relevant with /triage accepted (org members only)
- Close this issue with /close

For more details on the triage process, see https://www.kubernetes.dev/docs/guide/issue-triage/

/remove-triage accepted

AnishShah commented 3 months ago

/cc @AnishShah