hubblo-org / scaphandre

⚡ Energy consumption metrology agent. Let "scaph" dive and bring back the metrics that will help you make your systems and applications more sustainable !
Apache License 2.0

Include containerd-specific labels in data coming from the powercap_rapl sensor #84

Closed: bpetit closed this 3 years ago

bpetit commented 3 years ago

Problem

Making dashboards for kubernetes and docker-compose use cases may be a bit tedious, as we can only filter processes by exe or cmdline to get the power consumption of a project or of a given application (when it is composed of multiple containers).

Solution

We could talk to containerd to get information about the running containers and label them according to projects or docker-compose services.

Alternatives

Using exclusively containerd to do that would be great, as it is the core of both docker and kubernetes. However, we may have to talk to the docker daemon itself or to the kubernetes API to actually get the important information. Let's see how it goes.

Example: docker-compose-service=my_super_project or helm-app=my_awesome_project would be interesting. This FR is actually an exploratory one and may lead to more FRs to pin down each use case. Please feel free to jump on it and talk about the use case you'd be most interested in.

rossf7 commented 3 years ago

Hi @bpetit, I've been investigating this a bit. I'm new to containerd so I may be missing something but I think there is an issue with how to connect.

Do we need to mount the containerd / docker socket to access it? There can be security problems with that, as it can give access to the host.

For kubernetes an in-cluster connection restricted with RBAC may work better. We should only need read permission for pods and possibly nodes. The downside is we can't reuse code with docker-compose and we'd need a k8s client as a dependency.

But if we need to mount the socket for docker-compose then maybe a first iteration that also supports k8s makes sense. WDYT?

bpetit commented 3 years ago

I think you're right. Initially I was thinking we could limit the actions scaphandre can take on the containerd socket, as we have to compile proto files and could choose what to compile or not. But it was highly speculative, and asking to mount the containerd socket would look suspicious anyway (even if we don't embed write functions in the protobuf). I'm not even sure what I was thinking about would work anyway.

I'll dig into the data we can get from kubernetes then.

But if we need to mount the socket for docker-compose then maybe a first iteration that also supports k8s makes sense. WDYT?

That's the thing: for a more "simple" use case, like scaphandre running in a docker container on a clusterless bare metal machine, I think we would still need to mount either the docker socket or the containerd socket 🤔 ...

There are also projects and people talking about proxying access to either the docker or the containerd socket: https://github.com/fluencelabs/docker-socket-proxy. But it seems a bit heavy to me to ask to install a proxy in addition to scaphandre...

I'm interested in your thoughts about that. Not an obvious topic.

rossf7 commented 3 years ago

I'll dig into the data we can get from kubernetes then.

From a security POV I think accessing via the kubernetes API is better, but I'm a bit worried about how hard it will be to map the metrics to pods, whereas the docker containers have labels for the pod and namespace.

So now I'm doubting my first reply. 😅 You're right this is not an obvious topic!

That's the thing: for a more "simple" use case, like scaphandre running in a docker container on a clusterless bare metal machine, I think we would still need to mount either the docker socket or the containerd socket

Yes, I think it's important to target this use case too, and I agree we'll need to mount the socket. The socket proxy is an interesting approach and I haven't seen it before.

I think most people will be OK with trying out scaphandre by mounting the socket. But in the docs we could describe the more advanced use case with the proxy. WDYT?

rossf7 commented 3 years ago

I wanted to look into this some more and I think there is a solution based on https://github.com/heptiolabs/pid2pod

We can get the container ID from /proc/$PID/cgroup and from that find the pod by checking its container status. The container ID should also help for the containerd integration.

Here is an example.

scaph_process_power_consumption_microwatts{app_kubernetes_io_managed_by="Helm", app_kubernetes_io_name="scaphandre", cmdline="/usr/local/bin/scaphandreprometheus", exe="scaphandre", instance="192.168.101.130:8080", job="kubernetes-service-endpoints", kubernetes_name="scaphandre", kubernetes_namespace="default", kubernetes_node="da11-c3.small.x86-01", pid="14765"}

For k8s, any of the entries in the cgroup file containing /kubepods ends with the container ID.

cat /proc/14765/cgroup

1:name=systemd:/kubepods/burstable/pod73bcd979-be50-42db-ba9c-5c2076dfdf6a/7a80b721c1cfed28abee8a94324924be51390aadc066bfd1baa8f5f4a4c25042

For docker containers the prefix is /docker.

cat /proc/428478/cgroup

1:name=systemd:/docker/d83478f1d16a95c921f343ace65529a8969876a86d6150f251084ee823e3d428
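
Extracting the ID is just string handling on the cgroup line; here is a minimal sketch (plain std, the function name is mine) that covers both layouts:

```rust
/// Sketch only: extract a container ID from one line of /proc/<pid>/cgroup.
/// Handles both the kubernetes layout (.../kubepods/.../pod<uid>/<container-id>)
/// and the docker layout (/docker/<container-id>).
fn container_id_from_cgroup_line(line: &str) -> Option<String> {
    // A line looks like "1:name=systemd:/kubepods/burstable/pod<uid>/<id>".
    let path = line.splitn(3, ':').nth(2)?;
    if path.contains("/kubepods") || path.starts_with("/docker") {
        // The container ID is the last path component, a 64-char hex string.
        let id = path.rsplit('/').next()?;
        if id.len() == 64 && id.chars().all(|c| c.is_ascii_hexdigit()) {
            return Some(id.to_string());
        }
    }
    None
}

fn main() {
    let k8s = "1:name=systemd:/kubepods/burstable/pod73bcd979-be50-42db-ba9c-5c2076dfdf6a/7a80b721c1cfed28abee8a94324924be51390aadc066bfd1baa8f5f4a4c25042";
    let docker = "1:name=systemd:/docker/d83478f1d16a95c921f343ace65529a8969876a86d6150f251084ee823e3d428";
    println!("{:?}", container_id_from_cgroup_line(k8s));
    println!("{:?}", container_id_from_cgroup_line(docker));
}
```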

For k8s pods the container ID is docker://[container ID]

kubectl get pod scaphandre-4v5n8 -o yaml

apiVersion: v1
kind: Pod
metadata:
  name: scaphandre-4v5n8
  namespace: default
status:
  containerStatuses:
  - containerID: docker://7a80b721c1cfed28abee8a94324924be51390aadc066bfd1baa8f5f4a4c25042
    name: scaphandre

I think we could use it to set these labels on the metrics and maybe some others based on the pod labels.

container_name=scaphandre
kubernetes_namespace=default
kubernetes_pod=scaphandre-4v5n8

To enable the Kubernetes integration I think a --kubernetes-node param would be useful. We can get the node name via an env var from the Downward API, so the exporter only needs to list the pods on the same node.
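
As a sketch of what that could drive (the env var name is only an example, not a final choice), the node name from the Downward API would just scope the pods list request with a field selector:

```rust
use std::env;

fn main() {
    // Assumption: the daemonset injects the node name via the Downward API,
    // e.g. into an env var like KUBERNETES_NODE_NAME (name is illustrative);
    // a --kubernetes-node flag could override it.
    let node = env::var("KUBERNETES_NODE_NAME").unwrap_or_default();

    // Listing only the pods scheduled on this node keeps the scope small:
    // the pods list endpoint supports a spec.nodeName field selector
    // (the "=" inside the selector is percent-encoded as %3D).
    let path = format!("/api/v1/pods?fieldSelector=spec.nodeName%3D{}", node);
    println!("would GET {}", path);
}
```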

Hopefully this is useful and I'm interested to hear if you think it would work.

bpetit commented 3 years ago

It seems pretty accurate, thanks for that tip! As we already ask to mount /proc anyway, it doesn't require any extra volume, this is great!

bpetit commented 3 years ago

I've got something working in PR #109 for exposing docker container names as labels.

I get the container ID from procfs as you suggested, then ask for the container's name through the docker socket.
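
Conceptually the lookup is just an HTTP GET on the docker unix socket; a rough sketch of the idea (not the actual code from the PR, which goes through the rs-docker-sync crate) would be:

```rust
use std::io::{Read, Write};
use std::os::unix::net::UnixStream;

fn main() -> std::io::Result<()> {
    // Sketch only: ask the docker daemon which containers it knows about,
    // with a raw HTTP/1.0 request over its unix socket.
    let mut sock = UnixStream::connect("/var/run/docker.sock")?;
    sock.write_all(b"GET /containers/json HTTP/1.0\r\nHost: localhost\r\n\r\n")?;

    let mut response = String::new();
    sock.read_to_string(&mut response)?;

    // The JSON body contains, for each container, its "Id" (the same 64-char
    // hex string found in /proc/<pid>/cgroup) and its "Names", which is what
    // ends up attached to the metrics as labels.
    println!("{}", response);
    Ok(())
}
```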

I'm thinking about how to implement the same (getting the name of the pod once I have the container ID) for kubernetes, and I was wondering: isn't requiring access to a kubeconfig file to get that information too big a constraint? I mean, deploying a kubeconfig with the agent seems to add some complexity... What would be the best way to use some local privileges if we know we are running on a machine that hosts the kubelet and the APIs, without using a kubeconfig file? (open thoughts there, and maybe dumb ones, as I need to look at the documentation to refresh my memory)

rossf7 commented 3 years ago

This is great! I'm glad that getting the container ID worked out.

What would be the best way to use some local privileges if we know we are running on a machine that hosts the kubelet and the APIs, without using a kubeconfig file

I've been looking at the clux/kube-rs client and just using let client = Client::try_default().await? should work. See the example in https://docs.rs/kube/0.52.0/kube/

It first tries with an "in cluster" configuration which uses env vars that are injected by the kubelet. If not it falls back to using a kubeconfig.

The credentials added with the env vars are for the service account in the helm chart, so we will just need to add the permission to get pods to the RBAC rules: https://github.com/hubblo-org/scaphandre/blob/v0.3.0/helm/scaphandre/templates/rbac.yaml#L22
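
For reference, the pattern from the kube docs would look roughly like this (just a sketch to show the shape of it, not tested against our setup; kube needs a tokio async runtime, and the node name here is a placeholder):

```rust
use k8s_openapi::api::core::v1::Pod;
use kube::api::{Api, ListParams};
use kube::Client;

#[tokio::main]
async fn main() -> Result<(), kube::Error> {
    // Tries the in-cluster env vars first, then falls back to ~/.kube/config.
    let client = Client::try_default().await?;

    // Scope to the pods on this node only (the real node name would come from
    // the Downward API); spec.nodeName is a supported field selector for pods.
    let params = ListParams::default().fields("spec.nodeName=my-node");

    let pods: Api<Pod> = Api::all(client);
    for p in pods.list(&params).await? {
        println!(
            "pod {:?} in namespace {:?}",
            p.metadata.name, p.metadata.namespace
        );
    }
    Ok(())
}
```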

I've been learning some Rust and would like to help with integrating this, based on your changes in https://github.com/hubblo-org/scaphandre/pull/109.

Would that be OK? I have some time this weekend to give it a go.

bpetit commented 3 years ago

It first tries with an "in cluster" configuration which uses env vars that are injected by the kubelet. If not it falls back to using a kubeconfig.

So I guess we have to mount an additional volume when scaphandre runs in a container, right?

I've been learning some Rust and would like to help with integrating this. Based on your changes in #109

Would that be OK? I have some time this weekend to give it a go.

Of course, feel free to create a new PR based on this one!

rossf7 commented 3 years ago

So I guess we have to mount an additional volume when scaphandre runs in a container, right?

Yes, a volume gets mounted at /var/run/secrets/kubernetes.io/serviceaccount but this happens automatically if you specify a service account in the daemonset or deployment.

https://kubernetes.io/docs/reference/access-authn-authz/service-accounts-admin/#serviceaccount-admission-controller
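
Concretely, the "in cluster" configuration only relies on that mounted volume plus two env vars, something like this sketch:

```rust
use std::{env, fs};

fn main() -> std::io::Result<()> {
    // Sketch of what the "in cluster" configuration uses once the service
    // account volume is mounted: a bearer token, the cluster CA certificate,
    // the pod's namespace, and the API server address from env vars.
    let sa = "/var/run/secrets/kubernetes.io/serviceaccount";
    let token = fs::read_to_string(format!("{}/token", sa))?;
    let namespace = fs::read_to_string(format!("{}/namespace", sa))?;
    let ca_cert = format!("{}/ca.crt", sa);

    let host = env::var("KUBERNETES_SERVICE_HOST").unwrap_or_default();
    let port = env::var("KUBERNETES_SERVICE_PORT").unwrap_or_else(|_| "443".into());
    let api_server = format!("https://{}:{}", host, port);

    println!("pod namespace: {}", namespace.trim());
    println!("API server:    {} (CA at {})", api_server, ca_cert);
    println!("token length:  {}", token.trim().len());
    Ok(())
}
```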

I'll let you know how I get on with adding the k8s client.

bpetit commented 3 years ago

Hi @rossf7 !

I got some time to work on the k8s part. Did you start something? Could we merge efforts? I'm looking at the elements you wrote in this thread to see how we could do it. Feel free to interrupt me and share what you've done or discovered. Thanks!

rossf7 commented 3 years ago

Hi @bpetit that's great you have some time to look at the k8s part :)

I've been doing some research but sadly I haven't made progress with code. I'll write it up here.

Since scaph is running as a daemonset, like I mentioned before, my idea was to have a --kubernetes-node flag. We can set this via an env var using the Downward API. A --kubeconfig flag could also be useful for dev, but I think we can default to using the "in cluster" credentials.

I really like how you're using Docker Events and I think we can do something similar: list the pods on the node on startup and then watch for new pods to update the cache.
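
Something like this is the cache shape I had in mind (just a sketch, the names are made up):

```rust
use std::collections::HashMap;

/// Sketch: pod cache keyed by container ID, so the exporter can resolve the
/// pod for a PID without hitting the API server on every scrape. A list on
/// startup fills it; watch events would keep it up to date.
#[derive(Default)]
struct PodCache {
    // container ID (64-char hex, as found in /proc/<pid>/cgroup)
    //   -> (pod name, namespace)
    by_container_id: HashMap<String, (String, String)>,
}

impl PodCache {
    fn insert(&mut self, container_id: String, pod: String, namespace: String) {
        self.by_container_id.insert(container_id, (pod, namespace));
    }

    fn lookup(&self, container_id: &str) -> Option<&(String, String)> {
        self.by_container_id.get(container_id)
    }
}

fn main() {
    let mut cache = PodCache::default();
    cache.insert(
        "7a80b721c1cfed28abee8a94324924be51390aadc066bfd1baa8f5f4a4c25042".into(),
        "scaphandre-4v5n8".into(),
        "default".into(),
    );
    println!("{:?}", cache.lookup("7a80b721c1cfed28abee8a94324924be51390aadc066bfd1baa8f5f4a4c25042"));
}
```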

I got stuck with the dependencies. We'll need this crate https://docs.rs/k8s-openapi/0.12.0/k8s_openapi/index.html because it has the bindings to the k8s API. kube-rs has a nice example for watching pods https://github.com/clux/kube-rs/blob/master/examples/pod_watcher.rs but it's quite heavyweight, like Bollard was for Docker.

If you could add the k8s dependencies I can help with testing and the helm chart changes. Maybe I can have a go at the Rust changes for listing and watching pods but you will most likely be quicker. WDYT?

bpetit commented 3 years ago

Hi ! Great, thank you !

I'll give k8s-openapi a shot, with isahc for the requests (as is done in rs-docker-sync), and see whether it requires a lot of code or not. This would be very minimalistic and maybe a bit more manageable than needing an async runtime to query k8s (it's a bit the same topic as for docker, as you mentioned).

If it's not that good we could consider a higher-level crate, but a lot of them seem to not be maintained anymore. clux/kube-rs seems in good shape but requires an async runtime even for simple queries...
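
Roughly what I have in mind (just a sketch reusing the mounted service account credentials and hitting the raw API path with isahc; I still need to check what the k8s-openapi request builders give us exactly, and the node env var name is only an example):

```rust
use isahc::config::{CaCertificate, Configurable};
use isahc::prelude::*;
use std::{env, fs};

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Sketch only: list the pods running on this node using the in-cluster
    // service account credentials, with isahc as a synchronous HTTP client.
    let sa = "/var/run/secrets/kubernetes.io/serviceaccount";
    let token = fs::read_to_string(format!("{}/token", sa))?;
    let host = env::var("KUBERNETES_SERVICE_HOST")?;
    let port = env::var("KUBERNETES_SERVICE_PORT")?;
    // Assumption: the node name is injected via the Downward API (see above).
    let node = env::var("KUBERNETES_NODE_NAME").unwrap_or_default();

    let uri = format!(
        "https://{}:{}/api/v1/pods?fieldSelector=spec.nodeName%3D{}",
        host, port, node
    );

    let mut response = isahc::Request::get(uri)
        .header("Authorization", format!("Bearer {}", token.trim()))
        // Trust the cluster CA that is mounted alongside the token.
        .ssl_ca_certificate(CaCertificate::file(format!("{}/ca.crt", sa)))
        .body(())?
        .send()?;

    // The body is a PodList; k8s-openapi provides the types to deserialize it.
    println!("{}", response.text()?);
    Ok(())
}
```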

rossf7 commented 3 years ago

Yes, using k8s-openapi with isahc sounds great. A full k8s client for just listing and watching pods is probably overkill.

Let me know how it goes and I'll try and have a go at the helm chart changes. I think this feature will be really useful and I'd like to help where I can.

rossf7 commented 3 years ago

Hi @bpetit, an update on this.

I've managed to get k8s-openapi working with isahc using in-cluster credentials. I worked on it in a separate repo to keep things simple, and because I'm on Mac and usually use kind (kubernetes in docker) for local dev.

The code is here. It's still a bit rough but it's working. I'd like to try and integrate it into scaphandre but I have vacation next week. So if the code is useful and you have time to work on this feel free to use it. Thanks!

https://github.com/rossf7/rust-k8s-test/blob/main/src/main.rs

bpetit commented 3 years ago

Thanks a lot to you, I'll have a look at it! :)

EDIT: I wondered why I didn't have the /var/run/secrets/kubernetes folder, but I guess it's specific to kind. I'll try to find out if I can get the token that way from a kube node installed with kubespray (my test env).

I first tried to extract the auth data from /$USER/.kube/config, but it requires a bit more logic and using serde to get the data. Not a real issue, but if there is a simpler way to do it with a plain token file, that would be enough.

rossf7 commented 3 years ago

I wondered why I didn't have the /var/run/secrets/kubernetes folder, but I guess it's specific to kind. I'll try to find out if I can get the token that way from a kube node installed with kubespray (my test env).

It shouldn't be specific to kind but I think to have the /var/run/secrets/kubernetes folder the code needs to be running in a pod with a service account configured.

Can you try running it with the commands here? https://github.com/rossf7/rust-k8s-test#run-example

I think it would be useful to support both in-cluster and a kubeconfig file for local dev. There is an example of doing that here: https://github.com/ynqa/kubernetes-rust/blob/master/src/config/kube_config.rs
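
For the kubeconfig fallback, a minimal serde sketch could be enough for the simple token case (struct names are mine, and real kubeconfigs have more auth variants than this):

```rust
use serde::Deserialize;

// Sketch only: the few kubeconfig fields needed to reach the API server.
// Real kubeconfigs support more auth mechanisms (client certs, exec plugins,
// contexts...); this only covers the plain server + token case.
#[derive(Deserialize)]
struct KubeConfig {
    clusters: Vec<NamedCluster>,
    users: Vec<NamedUser>,
}

#[derive(Deserialize)]
struct NamedCluster {
    name: String,
    cluster: Cluster,
}

#[derive(Deserialize)]
struct Cluster {
    server: String,
}

#[derive(Deserialize)]
struct NamedUser {
    name: String,
    user: User,
}

#[derive(Deserialize)]
struct User {
    token: Option<String>,
}

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let home = std::env::var("HOME")?;
    let raw = std::fs::read_to_string(format!("{}/.kube/config", home))?;
    let config: KubeConfig = serde_yaml::from_str(&raw)?;

    if let (Some(cluster), Some(user)) = (config.clusters.first(), config.users.first()) {
        println!(
            "server: {} (cluster {}, user {})",
            cluster.cluster.server, cluster.name, user.name
        );
        println!("has token: {}", user.user.token.is_some());
    }
    Ok(())
}
```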