kubernetes-sigs / kind

Kubernetes IN Docker - local clusters for testing Kubernetes
https://kind.sigs.k8s.io/
Apache License 2.0

kubelet does not export metric `kubelet_volume_stats_capacity_bytes` #3643

Open irizzant opened 1 month ago

irizzant commented 1 month ago

What happened:

What you expected to happen:

Kubelet should export metric kubelet_volume_stats_capacity_bytes but it looks like it doesn't.

I deployed kind with the following configuration:

kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
nodes:
- role: control-plane
  kubeadmConfigPatches:
  - |
    kind: InitConfiguration
    nodeRegistration:
      kubeletExtraArgs:
        node-labels: "ingress-ready=true"
  - |
    kind: ClusterConfiguration
    controllerManager:
      extraArgs:
        bind-address: 0.0.0.0
    etcd:
      local:
        extraArgs:
          listen-metrics-urls: http://0.0.0.0:2381
    scheduler:
      extraArgs:
        bind-address: 0.0.0.0
  - |
    kind: KubeProxyConfiguration
    metricsBindAddress: 0.0.0.0
  extraMounts:
  - containerPath: /var/lib/kubelet/config.json
    hostPath: "$HOME/.docker/config.json"
  extraPortMappings:
    - containerPort: 443
      hostPort: 443
    - containerPort: 80
      hostPort: 80
- role: worker
- role: worker
- role: worker

I checked with Prometheus and I see many other metrics exported by kubelet, but not this one.

How to reproduce it (as minimally and precisely as possible):

  1. deploy kind
  2. scrape kubelet metrics
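The "scrape kubelet metrics" step can be narrowed to a single check that does not depend on Prometheus or Grafana. The following is a minimal sketch, assuming the kubelet's metrics text has already been fetched (for example with `kubectl get --raw "/api/v1/nodes/<node-name>/proxy/metrics"`); the helper name `metric_samples` is hypothetical and not part of kind or Kubernetes.

```python
import re

METRIC = "kubelet_volume_stats_capacity_bytes"


def metric_samples(exposition_text: str, name: str) -> list[str]:
    """Return the sample lines for `name` from Prometheus text-format output.

    Hypothetical helper for illustration only.
    """
    # A sample line starts with the metric name, followed either by '{'
    # (labels) or by whitespace (no labels). '# HELP' and '# TYPE' comment
    # lines never match, and neither do longer metric names that merely
    # share the same prefix.
    pattern = re.compile(r"^" + re.escape(name) + r"(\{|\s)")
    return [line for line in exposition_text.splitlines() if pattern.match(line)]
```

Calling `metric_samples(text, METRIC)` on the fetched output returns an empty list when the kubelet does not report the metric, which is the symptom described here.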

Anything else we need to know?:

Environment:

Server:
 Containers: 4
  Running: 4
  Paused: 0
  Stopped: 0
 Images: 5
 Server Version: 26.1.3
 Storage Driver: overlay2
  Backing Filesystem: extfs
  Supports d_type: true
  Using metacopy: false
  Native Overlay Diff: true
  userxattr: false
 Logging Driver: json-file
 Cgroup Driver: systemd
 Cgroup Version: 2
 Plugins:
  Volume: local
  Network: bridge host ipvlan macvlan null overlay
  Log: awslogs fluentd gcplogs gelf journald json-file local splunk syslog
 Swarm: inactive
 Runtimes: io.containerd.runc.v2 runc
 Default Runtime: runc
 Init Binary: docker-init
 containerd version: 8b3b7ca2e5ce38e8f31a34f35b2b68ceb8470d89
 runc version: v1.1.12-0-g51d5e94
 init version: de40ad0
 Security Options:
  apparmor
  seccomp
   Profile: builtin
  cgroupns
 Kernel Version: 6.8.0-31-generic
 Operating System: Ubuntu 24.04 LTS
 OSType: linux
 Architecture: x86_64
 CPUs: 12
 Total Memory: 46.97GiB
 Name: ivan-desktop
 ID: dec72693-7f45-4497-a085-ecfcef4ea6fb
 Docker Root Dir: /var/lib/docker
 Debug Mode: false
 Username: xxx
 Experimental: false
 Insecure Registries:
  127.0.0.0/8
 Live Restore Enabled: false



- OS (e.g. from `/etc/os-release`):
PRETTY_NAME="Ubuntu 24.04 LTS"
NAME="Ubuntu"
VERSION_ID="24.04"
VERSION="24.04 LTS (Noble Numbat)"
VERSION_CODENAME=noble
ID=ubuntu
ID_LIKE=debian
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
UBUNTU_CODENAME=noble
LOGO=ubuntu-logo

- Kubernetes version: (use `kubectl version`): 
Client Version: v1.29.5
Kustomize Version: v5.0.4-0.20230601165947-6ce0bf390ce3
Server Version: v1.30.0
- Any proxies or other special environment settings?:

BenTheElder commented 1 month ago

This is probably due to #3360

irizzant commented 1 month ago

Thanks for the updates, can you please detail how?

BenTheElder commented 1 month ago

A feature gate is currently set to disable local storage isolation, because enabling it broke kind on some hosts.

There's more discussion in the linked PR and transitively linked issues.

You can try this config to see if that PR would solve it: https://github.com/kubernetes-sigs/kind/pull/3360#issuecomment-2103232360

irizzant commented 4 weeks ago

Thank you again for the updates. I see the localStorageCapacityIsolation option graduated to GA in k8s v1.25, so it's enabled by default.

As far as I understand, that option was disabled in kind because it broke Kubernetes in some scenarios. I still don't understand the connection with the missing metric, though: as far as I can see, the option is about limiting ephemeral storage in LimitRange and Pod specs.

Am I missing anything?

irizzant commented 4 weeks ago

I've just tried with this config

kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
nodes:
- role: control-plane
  kubeadmConfigPatches:
  - |
    kind: KubeletConfiguration
    localStorageCapacityIsolation: true
  - |
    kind: InitConfiguration
    nodeRegistration:
      kubeletExtraArgs:
        node-labels: "ingress-ready=true"
  - |
    kind: ClusterConfiguration
    controllerManager:
      extraArgs:
        bind-address: 0.0.0.0
    etcd:
      local:
        extraArgs:
          listen-metrics-urls: http://0.0.0.0:2381
    scheduler:
      extraArgs:
        bind-address: 0.0.0.0
  - |
    kind: KubeProxyConfiguration
    metricsBindAddress: 0.0.0.0
  extraMounts:
  - containerPath: /var/lib/kubelet/config.json
    hostPath: "$HOME/.docker/config.json"
  extraPortMappings:
    - containerPort: 443
      hostPort: 443
    - containerPort: 80
      hostPort: 80
- role: worker
- role: worker
- role: worker

still no metric available!

BenTheElder commented 4 weeks ago

kind ships kubelet built from upstream sources, and we're not modifying kubelet. I was guessing that this metric is gated behind the same code in kubelet, because it's related to tracking filesystem stats.

That config should be equivalent enough, because kubeadm currently cluster-scopes kubelet config. Preferably, though, it should be explicitly cluster scoped in case kind fixes this in the future (so kubeadmConfigPatches at the top level, not under one of the nodes).

It would also be helpful to test this with a minimal configuration for reproducing purposes.
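A minimal, explicitly cluster-scoped configuration might look like the sketch below. It is not taken from the linked PR comment; it only moves the `KubeletConfiguration` patch from the earlier config to the top level and drops everything else. The file name `kind-minimal.yaml` and the helper `write_minimal_config` are hypothetical, and the single worker node is an arbitrary choice.

```python
from pathlib import Path

# Minimal kind config: the KubeletConfiguration patch sits at the top level
# (cluster scoped) instead of under a single node entry.
MINIMAL_CONFIG = """\
kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
kubeadmConfigPatches:
- |
  kind: KubeletConfiguration
  localStorageCapacityIsolation: true
nodes:
- role: control-plane
- role: worker
"""


def write_minimal_config(path: str = "kind-minimal.yaml") -> Path:
    """Write the minimal config to `path` and return it (hypothetical helper)."""
    target = Path(path)
    target.write_text(MINIMAL_CONFIG)
    return target
```

After writing the file, a cluster can be created from it with `kind create cluster --config kind-minimal.yaml`.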

Note: I just looked up this metric and it's considered alpha, and related to PVs.

https://kubernetes.io/docs/reference/instrumentation/metrics/

The other thought is this may not work with PVs from https://github.com/rancher/local-path-provisioner

What's your use case?

irizzant commented 4 weeks ago

Here is the reproducer (reproducer.zip): just run the script and port-forward Grafana in the monitoring namespace to see that the metric is missing.

Using this reproducer I launched kubectl proxy and then ran this curl against a worker node:

curl -X GET http://127.0.0.1:8001/api/v1/nodes/<node-name>/proxy/configz | jq .

This verified that localStorageCapacityIsolation: true was set, and the metric is still missing.
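The configz check above can also be scripted rather than read by eye from `jq` output. This is a minimal sketch, assuming the response nests the kubelet settings under a top-level `kubeletconfig` key; the function name `local_storage_isolation` is hypothetical.

```python
import json


def local_storage_isolation(configz_json: str):
    """Return the kubelet's localStorageCapacityIsolation value, or None if absent.

    Hypothetical helper; assumes the settings live under "kubeletconfig".
    """
    doc = json.loads(configz_json)
    # .get() with defaults keeps the lookup safe if either key is missing.
    return doc.get("kubeletconfig", {}).get("localStorageCapacityIsolation")
```

Passing the body returned by the curl command to this function yields `True` when the setting is active on that node, and `None` when the field is not present in the response at all.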