kubernetes / node-problem-detector

This is a place for various problem detectors running on the Kubernetes nodes.
Apache License 2.0

v0.8.15 image is missing log-counter binary #854

Closed plnordquist closed 3 months ago

plnordquist commented 6 months ago

When I attempt to use the node-problem-detector image at registry.k8s.io/node-problem-detector/node-problem-detector:v0.8.15 with the /config/kernel-monitor-counter.json config file, it fails to start with the following log entry:

F0125 11:50:51.375607       1 custom_plugin_monitor.go:77] Failed to validate custom plugin config {Plugin:custom PluginGlobalConfig:{InvokeIntervalString:0xc00036b640 TimeoutString:0xc00036b650 InvokeInterval:5m0s Timeout:1m0s MaxOutputLength:0xc00059fca0 Concurrency:0xc00059fcb0 EnableMessageChangeBasedConditionUpdate:0x2d0a80e SkipInitialStatus:0x2d0a80f} Source:kernel-monitor DefaultConditions:[{Type:FrequentUnregisterNetDevice Status: Transition:0001-01-01 00:00:00 +0000 UTC Reason:NoFrequentUnregisterNetDevice Message:node is functioning properly}] Rules:[0xc0004a11f0] EnableMetricsReporting:0xc00059fcb8}: rule path "/home/kubernetes/bin/log-counter" does not exist. Rule: &{Type:permanent Condition:FrequentUnregisterNetDevice Reason:UnregisterNetDevice Path:/home/kubernetes/bin/log-counter Args:[--journald-source=kernel --log-path=/var/log/journal --lookback=20m --count=3 --pattern=unregister_netdevice: waiting for \w+ to become free. Usage count = \d+] TimeoutString:0xc00036b670 Timeout:1m0s}
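For readers unfamiliar with the rule in that log entry, here is a minimal Python sketch of what its arguments appear to describe: count kernel log lines that match `--pattern` and compare the total with `--count`. This is an illustration only, not the log-counter implementation. The sample lines are invented, and the `>=` comparison against the count is an assumption about how the threshold is applied.

```python
import re

# Pattern copied from the --pattern argument shown in the log entry.
PATTERN = re.compile(
    r"unregister_netdevice: waiting for \w+ to become free. Usage count = \d+"
)


def count_matches(lines):
    """Count the log lines that contain the pattern."""
    return sum(1 for line in lines if PATTERN.search(line))


def reaches_threshold(lines, count=3):
    """Assumption: the rule reports a problem once matches reach --count."""
    return count_matches(lines) >= count


if __name__ == "__main__":
    # Hypothetical kernel log lines, for illustration only.
    sample = [
        "unregister_netdevice: waiting for eth0 to become free. Usage count = 1",
        "unregister_netdevice: waiting for veth12 to become free. Usage count = 2",
        "EXT4-fs (sda1): mounted filesystem with ordered data mode",
    ]
    print(count_matches(sample), reaches_threshold(sample))
```

The real binary also restricts matches to the `--lookback` window and reads from journald, which this sketch leaves out.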

I checked the v0.8.15 image for the /home/kubernetes/bin/log-counter binary and it does not exist. The binary does exist in the registry.k8s.io/node-problem-detector/node-problem-detector:v0.8.14 image. I deployed node-problem-detector with Helm via the Delivery Hero Helm chart; the versions and chart values are below.
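The same check can be scripted as a preflight step. The Python sketch below loads a custom plugin monitor config and lists every rule path that is missing from the filesystem it runs on. The `rules` and `path` key names are assumptions inferred from the `Rules` and `Path` fields in the log entry above, and `missing_rule_paths` is a hypothetical helper, not part of node-problem-detector.

```python
import json
import os
import tempfile


def missing_rule_paths(config_file):
    """Return the rule paths in a plugin monitor config that do not exist."""
    with open(config_file, encoding="utf-8") as f:
        config = json.load(f)
    # "rules" and "path" are assumed key names (see the lead-in).
    paths = [rule.get("path", "") for rule in config.get("rules", [])]
    return [p for p in paths if not os.path.exists(p)]


if __name__ == "__main__":
    # Throwaway config that mirrors the failing rule from the log entry.
    demo = {
        "plugin": "custom",
        "source": "kernel-monitor",
        "rules": [{"path": "/home/kubernetes/bin/log-counter"}],
    }
    with tempfile.NamedTemporaryFile("w", suffix=".json", delete=False) as f:
        json.dump(demo, f)
    print(missing_rule_paths(f.name))
    os.remove(f.name)
```

Run inside the container against /config/kernel-monitor-counter.json, a non-empty result would point at the same missing binary that the validation error reports.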

Environment

Kubernetes v1.26.12
Helm v3.13.3
Chart Repo: https://charts.deliveryhero.io/
Chart in repo: node-problem-detector
Chart Version: 2.3.12
Chart values:

metrics:
  enabled: true
  serviceMonitor:
    enabled: true
rbac:
  pspEnabled: false
resources:
  limits:
    cpu: 250m
    memory: 128Mi
  requests:
    cpu: 100m
    memory: 128Mi
settings:
  custom_plugin_monitors:
  - /config/kernel-monitor-counter.json
  - /config/systemd-monitor-counter.json
  log_monitors:
  - /config/kernel-monitor.json
  - /config/systemd-monitor.json

wangzhen127 commented 3 months ago

/close

Please try the newer versions. Feel free to file new issues if the new version does not work.

k8s-ci-robot commented 3 months ago

@wangzhen127: Closing this issue.

In response to [this](https://github.com/kubernetes/node-problem-detector/issues/854#issuecomment-2040283827):

>/close
>
>Please try the newer versions. Feel free to file new issues if the new version does not work.

Instructions for interacting with me using PR comments are available [here](https://git.k8s.io/community/contributors/guide/pull-requests.md). If you have questions or suggestions related to my behavior, please file an issue against the [kubernetes/test-infra](https://github.com/kubernetes/test-infra/issues/new?title=Prow%20issue:) repository.