liggitt opened 4 years ago
It appears we have a test for this, but it is labeled [Flaky] and is not release-blocking:
ginkgo.It("should support volume SELinux relabeling [Flaky] [LinuxOnly]", func() {
testPodSELinuxLabeling(f, false, false)
})
In the place where we are running this test, it appears to be failing 100% of the time:
/assign @gnufied
@gnufied https://testgrid.k8s.io/google-gce#gci-gce-flaky&sort-by-flakiness=&width=20&include-filter-by-regex=SELinux looks solidly green now; can the [Flaky] tag be removed to ensure test coverage now?
@liggitt flaky can definitely be removed from these tests, but looking at the logs:
I0312 00:28:29.269] Mar 12 00:28:29.269: INFO: Running '/workspace/kubernetes/platforms/linux/amd64/kubectl exec --namespace=security-context-9753 security-context-21e88fa6-71c5-4175-8aa4-96bfe0289fce -c=test-container -- cat /sys/fs/selinux/enforce'
I0312 00:28:29.918] Mar 12 00:28:29.917: INFO: error running kubectl exec to read file: exit status 1
I0312 00:28:29.918] stdout=
I0312 00:28:29.918] stderr=cat: can't open '/sys/fs/selinux/enforce': No such file or directory
I0312 00:28:29.918] command terminated with exit code 1
I0312 00:28:29.918] )
The tests are just skipping the bits that depend on SELinux, and hence they are green. Do we have a different configuration where SELinux is enabled?
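For reference, a minimal sketch of the check the log excerpt shows (hypothetical helper name, shelling out to kubectl rather than using the e2e framework): the test reads /sys/fs/selinux/enforce inside the container and, when the file is missing, skips the relabeling assertions, which is why the run stays green on nodes without SELinux.

package main

import (
	"fmt"
	"os/exec"
	"strings"
)

// selinuxEnforcedInPod mirrors the check visible in the log above: exec into
// the test container and read /sys/fs/selinux/enforce. If the file is missing
// (SELinux disabled on the node), the caller is expected to skip the
// relabeling assertions instead of failing.
func selinuxEnforcedInPod(namespace, pod, container string) (bool, error) {
	out, err := exec.Command("kubectl", "exec",
		"--namespace="+namespace, pod, "-c="+container,
		"--", "cat", "/sys/fs/selinux/enforce").CombinedOutput()
	if err != nil {
		// "No such file or directory" means SELinux is not enabled at all.
		return false, fmt.Errorf("reading /sys/fs/selinux/enforce: %v: %s", err, out)
	}
	return strings.TrimSpace(string(out)) == "1", nil
}

func main() {
	enforced, err := selinuxEnforcedInPod("security-context-9753",
		"security-context-21e88fa6-71c5-4175-8aa4-96bfe0289fce", "test-container")
	if err != nil {
		fmt.Println("skipping SELinux assertions:", err)
		return
	}
	fmt.Println("SELinux enforcing:", enforced)
}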
… so we still don't actually have coverage of the feature anywhere that would be visible to the release team?
Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
If this issue is safe to close now please do so with /close.
Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale
Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.
If this issue is safe to close now please do so with /close.
Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle rotten
/area test /triage accepted /help
@matthyx: This request has been marked as needing help from a contributor.
Please ensure the request meets the requirements listed here.
If this request no longer meets these requirements, the label can be removed by commenting with the /remove-help command.
@gnufied friendly reminder, are you still planning to work on this? Otherwise please unassign yourself.
/unassign @gnufied
/remove-priority critical-urgent /priority backlog
/assign @haircommander
Looking at the test case, it still seems to be flaky, though it's not clear why: https://testgrid.k8s.io/google-gce#gci-gce-flaky&sort-by-flakiness=&width=20&include-filter-by-regex=SELinux
A container that is supposed to sleep 6000 seems to run to completion:
<*errors.errorString | 0xc000400b00>: {
s: "pod ran to completion",
}
which is.. odd.
I would otherwise propose we drop the [Flaky] label, and then they'd be running in the CRI-O e2e tests, which actually do run with SELinux, and will be able to catch regressions.
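For context, this is roughly the shape of the pod under discussion (a minimal sketch with hypothetical names, not the actual e2e fixture): a single container that should sleep for 6000 seconds and therefore stay Running, so "pod ran to completion" means the container exited before the test could exec into it.

package main

import (
	"fmt"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// sleeperPod builds a pod whose only container sleeps for 6000 seconds.
// While the sleep runs, the pod phase stays Running; if the container
// exits early, the pod "runs to completion" and exec-based checks fail.
func sleeperPod(name, image string) *corev1.Pod {
	return &corev1.Pod{
		ObjectMeta: metav1.ObjectMeta{Name: name},
		Spec: corev1.PodSpec{
			RestartPolicy: corev1.RestartPolicyNever,
			Containers: []corev1.Container{{
				Name:    "test-container",
				Image:   image,
				Command: []string{"sleep", "6000"},
			}},
		},
	}
}

func main() {
	fmt.Printf("%+v\n", sleeperPod("selinux-sleeper", "busybox"))
}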
@haircommander are you still working on this, or should it be unassigned?
I've been wanting to work on it, but honestly I don't think I have the capacity right now. I will unassign and maybe come back to it (or someone else can finish)
I think this is being fixed via https://github.com/kubernetes/kubernetes/pull/113789 and corresponding PRs.
While https://github.com/kubernetes/kubernetes/pull/113789 improves the situation, it does not test any relabeling done by the container runtime.
On the bright side, there is a CI job that has SELinux enforcing + containerd: https://testgrid.k8s.io/google-aws#kops-aws-selinux. It should be relatively easy to add tests that run various pods and check how the pods end up running and how their volumes are relabeled.
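As a sketch of what such a relabeling check could look like (hypothetical pod name, namespace, and mount path; plain kubectl instead of the e2e framework): run a pod that declares spec.securityContext.seLinuxOptions.level, then read the label on the mounted volume and compare it with the requested level.

package main

import (
	"fmt"
	"os/exec"
	"strings"
)

// volumeHasSELinuxLevel is a hypothetical check: exec into a running pod and
// inspect the SELinux label of its volume mount. The container runtime (or
// the mount-option based relabeling) is expected to have relabeled the volume
// to the level requested in the pod's securityContext.
func volumeHasSELinuxLevel(namespace, pod, mountPath, wantLevel string) (bool, error) {
	// -d shows the directory itself, -Z prints its SELinux context.
	out, err := exec.Command("kubectl", "exec", "--namespace="+namespace, pod,
		"--", "ls", "-dZ", mountPath).CombinedOutput()
	if err != nil {
		return false, fmt.Errorf("ls -dZ %s: %v: %s", mountPath, err, out)
	}
	return strings.Contains(string(out), wantLevel), nil
}

func main() {
	// Example values only; a real test would create the pod first and wait
	// for it to be Running before checking the label.
	ok, err := volumeHasSELinuxLevel("selinux-test", "selinux-relabel-pod", "/mnt/volume", "s0:c0,c1")
	fmt.Println(ok, err)
}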
Hi, I'm looking for some testing issues. Could anyone guide me on how to find them?
https://testgrid.k8s.io/ lists all tests and their results over time
Thanks for the reply @matthyx. Actually, I want to get some software-testing experience, and I'm looking for issues that will teach me software testing. Could you please guide me?
I suggest you join our weekly meeting dedicated to Node CI: https://docs.google.com/document/d/1fb-ugvgdSVIkkuJ388_nhp2pBTy_4HEVg5848Xy7n5U/edit?usp=sharing
There you will find a small group of very friendly people taking care of the node tests as well as bug triage.
This issue has not been updated in over 1 year, and should be re-triaged.
You can:
/triage accepted (org members only)
/close
For more details on the triage process, see https://www.kubernetes.dev/docs/guide/issue-triage/
/remove-triage accepted
/cc @AnishShah
What happened: SELinux volume relabeling regressed in 1.16/1.17 with no test failures. See https://github.com/kubernetes/kubernetes/issues/83679
What you expected to happen: Test failures would have prevented the regression. Currently, we apparently only have manual test guarantees that this functions correctly.
/sig node storage /priority critical-urgent