kubernetes / node-problem-detector

This is a place for various problem detectors running on the Kubernetes nodes.
Apache License 2.0

Missing test coverage for standalone mode configuration #878

Closed: wangzhen127 closed this 4 months ago

wangzhen127 commented 4 months ago

NPD can be configured to run in either standalone or daemonset mode. In k/k, standalone mode can be configured and tested via configure.sh#688. However, as part of the kops support, the default mode was switched from standalone to daemonset in PR https://github.com/kubernetes/kubernetes/pull/121007. The same PR also bumped the NPD version from v0.8.9 to v0.8.13.

On the other hand, the NPD standalone mode test has always relied on tar files in gs://kubernetes-release/node-problem-detector/ (see configure.sh#L29). Due to historical release problems, that GCS bucket only has NPD versions up to v0.8.10, so the v0.8.13 tar files are not there at all. Since none of k/k's release-blocking tests are failing despite that, we have already lost test coverage for standalone mode configuration.

The release problem is being tracked in https://github.com/kubernetes/node-problem-detector/issues/874. We noticed the missing test coverage while trying to switch from gs://kubernetes-release/ to GitHub's own file hosting (PR https://github.com/kubernetes/kubernetes/pull/123741).

CC @upodroid @SergeyKanzhelev @Random-Liu @BenTheElder

wangzhen127 commented 4 months ago

The missing test coverage affects k/k 1.29 and 1.30, which use NPD v0.8.13. k/k 1.28 still uses NPD v0.8.9.
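To make the gap concrete, here is a minimal sketch that works out which k/k releases pin an NPD version newer than the newest tarball in the bucket. The version numbers are the ones quoted in this thread; the function and variable names are hypothetical, and it assumes every version up to v0.8.10 is actually present in the bucket.

```python
from typing import Dict, List, Tuple


def parse_version(tag: str) -> Tuple[int, ...]:
    """Turn a tag such as 'v0.8.13' into a comparable tuple (0, 8, 13)."""
    return tuple(int(part) for part in tag.lstrip("v").split("."))


# NPD version used by each k/k release, as stated in this thread.
NPD_VERSION_BY_KK_RELEASE: Dict[str, str] = {
    "1.28": "v0.8.9",
    "1.29": "v0.8.13",
    "1.30": "v0.8.13",
}

# Newest NPD tarball reported in gs://kubernetes-release/node-problem-detector/.
NEWEST_TARBALL_IN_BUCKET = "v0.8.10"


def releases_without_tarball(pinned: Dict[str, str], newest_available: str) -> List[str]:
    """Return the k/k releases whose NPD version is newer than the newest tarball."""
    newest = parse_version(newest_available)
    return sorted(
        release
        for release, npd_version in pinned.items()
        if parse_version(npd_version) > newest
    )


if __name__ == "__main__":
    print(releases_without_tarball(NPD_VERSION_BY_KK_RELEASE, NEWEST_TARBALL_IN_BUCKET))
```

Versions are compared as integer tuples rather than strings because a plain string comparison would rank v0.8.9 above v0.8.13.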

wangzhen127 commented 4 months ago

CC @hakman @vteratipally

wangzhen127 commented 4 months ago

CC @ndixita

upodroid commented 4 months ago

Potential solutions:

  1. Change the node e2e test harness to install NPD via the standalone method and make the NPD test part of the node e2e tests instead of the Kubernetes e2e tests.
  2. The Kubernetes e2e test installs NPD via a daemonset, which is reasonable. It would work the same way on every Kubernetes cluster.
  3. ~Create a kubeup job that installs NPD via standalone mode.~ kubeup is deprecated and scheduled for deletion soon.

wangzhen127 commented 4 months ago

Thanks for the suggestions!

I looked into the existing NPD tests and realized that we are using standalone mode already. https://github.com/kubernetes/node-problem-detector/blob/master/test/build.sh#L79C8-L79C41

And we have several test jobs using it, including https://testgrid.k8s.io/sig-node-node-problem-detector#ci-npd-e2e-kubernetes-gce-gci.

So we are good here already.