google / cadvisor

Analyzes resource usage and performance characteristics of running containers.

cAdvisor metric labels contain empty strings as values (container="", image="", ...) #3336

Open Feederhigh5 opened 1 year ago

Feederhigh5 commented 1 year ago

The Issue

I want to analyze the resource consumption (CPU, network, memory, I/O) at the container level. As far as I know, these metrics are generated by cAdvisor and exposed at https://kubernetes.default.svc/api/v1/nodes/minikube/proxy/metrics/cadvisor.

I check them through the following command: kubectl get --raw /api/v1/nodes/minikube/proxy/metrics/cadvisor

My problem is that the necessary container label (along with others such as image and name) is an empty string. That prevents me from analyzing the resource consumption per container. For example:

container_network_receive_bytes_total{container="",id="/",image="",interface="bridge",name="",namespace="",pod=""} 2.90072711e+08 1687270944453
container_memory_working_set_bytes{container="",id="/kubepods/pod49686167-c619-4f80-bfd7-5bacc76e2e47",image="",name="",namespace="logging",pod="kibana-kibana-749869bc5c-4dgmt"} 4.0378368e+08 1687269135477
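
For reference, a quick way to check how many series are affected is to filter the raw endpoint output. This is only a sketch, assuming the same node name and endpoint as above:

  # Series where the container label is empty (root / unattributed cgroups)
  kubectl get --raw /api/v1/nodes/minikube/proxy/metrics/cadvisor | grep -c 'container=""'
  # Series that do carry a container name (what per-container analysis needs)
  kubectl get --raw /api/v1/nodes/minikube/proxy/metrics/cadvisor | grep 'container_memory_working_set_bytes{' | grep -vc 'container=""'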

My Setup

I am running Minikube on Windows (WSL2) with Docker as the driver.

Minikube

minikube version: v1.30.1
commit: 08896fd1dc362c097c925146c4a0d0dac715ace0

Docker

docker@minikube:~$ docker info
Client:
 Context:    default
 Debug Mode: false

Server:
 Containers: 187
  Running: 97
  Paused: 0
  Stopped: 90
 Images: 62
 Server Version: 23.0.2
 Storage Driver: overlay2
  Backing Filesystem: extfs
  Supports d_type: true
  Using metacopy: false
  Native Overlay Diff: true
  userxattr: false
 Logging Driver: json-file
 Cgroup Driver: cgroupfs
 Cgroup Version: 1
 Plugins:
  Volume: local
  Network: bridge host ipvlan macvlan null overlay
  Log: awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog
 Swarm: inactive
 Runtimes: io.containerd.runc.v2 runc
 Default Runtime: runc
 Init Binary: docker-init
 containerd version: 2806fc1057397dbaeefbea0e4e17bddfbd388f38
 runc version: v1.1.5-0-gf19387a
 init version: de40ad0
 Security Options:
  seccomp
   Profile: builtin
 Kernel Version: 5.15.90.1-microsoft-standard-WSL2
 Operating System: Ubuntu 20.04.5 LTS (containerized)
 OSType: linux
 Architecture: x86_64
 CPUs: 12
 Total Memory: 31.35GiB
 Name: minikube
 ID: e1849438-1c04-43cd-9d9a-8bb562535842
 Docker Root Dir: /var/lib/docker
 Debug Mode: false
 No Proxy: control-plane.minikube.internal
 Registry: https://index.docker.io/v1/
 Labels:
  provider=docker
 Experimental: false
 Insecure Registries:
  10.96.0.0/12
  127.0.0.0/8
 Live Restore Enabled: false

Questions

How do I configure cAdvisor (or check whether it is correctly configured) in minikube? Do you have a suggestion for how I could get these metrics?

Maybe related

Possibly related issue: #3323, but as far as I know I am not using CRI-O. Possibly related issue: #5782, but I am not using k3s.


I would highly appreciate any help! If you need more info, please let me know. Thank you!

Howie59 commented 1 year ago

I am facing the same problem, but my CRI is containerd. Do you have any way to solve it? https://github.com/google/cadvisor/issues/3323#issuecomment-1606542381

@Feederhigh5

Feederhigh5 commented 1 year ago

Hello @Howie59,

I tried a few things, maybe something is of help to you:

  1. At first, I manually deployed cAdvisor to the cluster myself (a helm chart makes this easier) and configured Prometheus to scrape it. This let me open the cAdvisor page and see memory and CPU usage, but the details and labels were strange. I think this is because of my setup: Windows with WSL, Docker Desktop, a containerized Ubuntu, and minikube with the docker driver. I'm not sure, but I think Docker inside minikube is using Docker Desktop on Windows, and that this is why the container and pod names look strange. Before adapting Prometheus and Grafana to work with these unusual labels, I tried something else, because I still couldn't see network usage.

  2. I set up minikube using Hyper-V as the driver and containerd as the runtime, which made things less complicated. Without the complex Docker setup, Kubernetes' built-in cAdvisor already provided memory and CPU metrics with the correct labels (see the command sketch after this list). However, I could still only see network usage for the k8s pause containers, and as far as I know these do not show the real container traffic. So now I have the same problem as issue #3323.

  3. Finally, as a workaround, I added a service mesh (linkerd). This injects a small, lightweight proxy into each pod, which handles all traffic and also provides detailed metrics. A service mesh comes with a lot of overhead, and introducing and configuring it in a production system is a large endeavour, but for my small use case it was sufficient.
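
For reference, a minimal sketch of the setup from point 2; the flags are standard minikube options, but the exact values will depend on your environment:

  # Fresh cluster with Hyper-V as the driver and containerd as the runtime
  minikube start --driver=hyperv --container-runtime=containerd
  # Check that the per-container labels are now populated
  kubectl get --raw /api/v1/nodes/minikube/proxy/metrics/cadvisor | grep 'container_cpu_usage_seconds_total{' | grep -v 'container=""' | head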

I hope this helps a bit.

nathanmcgarvey-modopayments commented 1 year ago

Possibly related: https://github.com/docker/for-mac/issues/6969

maxin93 commented 1 year ago

I have the same problem with Kubernetes 1.25.3. I found that the key point is: when cAdvisor starts and initializes its data, containerd is not yet available, so func (f *containerdFactory) CanHandleAndAccept(name string) (bool, bool, error) in container/containerd/factory.go fails to query containerd for the container info, and the container ends up being handled by the raw handler instead. The relevant code is in container/factory.go:

func NewContainerHandler(name string, watchType watcher.ContainerWatchSource, metadataEnvAllowList []string, inHostNamespace bool) (ContainerHandler, bool, error) {
    factoriesLock.RLock()
    defer factoriesLock.RUnlock()

    // Create the ContainerHandler with the first factory that supports it.
    // Note that since RawContainerHandler can support a wide range of paths,
    // it's evaluated last just to make sure if any other ContainerHandler
    // can support it.
    for _, factory := range GetReorderedFactoryList(watchType) {
        canHandle, canAccept, err := factory.CanHandleAndAccept(name)
        if err != nil {
            klog.V(4).Infof("Error trying to work out if we can handle %s: %v", name, err)
        }
        if canHandle {
            if !canAccept {
                klog.V(3).Infof("Factory %q can handle container %q, but ignoring.", factory, name)
                return nil, false, nil
            }
            klog.V(3).Infof("Using factory %q for container %q", factory, name)
            handle, err := factory.NewContainerHandler(name, metadataEnvAllowList, inHostNamespace)
            return handle, canAccept, err
        }
        klog.V(4).Infof("Factory %q was unable to handle container %q", factory, name)
    }

    return nil, false, fmt.Errorf("no known factory can handle creation of container")
}

However, once the handler has been created, the container info (PodName, Namespace, Image, ContainerName) is never refreshed afterwards.
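
If that is right, a crude workaround (just a sketch, assuming the kubelet, which embeds cAdvisor, and containerd are both managed by systemd on the node) would be to restart the kubelet once containerd is healthy, so the handlers are recreated while the runtime can be reached:

  # On the affected node
  sudo systemctl status containerd    # confirm the runtime is up
  sudo systemctl restart kubelet      # kubelet embeds cAdvisor; restarting rebuilds its container handlers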

AlbertoSoutullo commented 1 year ago

@Feederhigh5 I am facing the same issue for quite some time already. Did you manage to solve it?

nicolas-laduguie commented 1 month ago

Hi, any fix?

Icedcocon commented 3 weeks ago

Same problem, but I'm using Kubernetes v1.20.5.