Add an example of usage with docker (and k8s)

jdrouet commented 3 years ago

Problem

I don't have access to the machines where my applications are running, they run in the "cloud", I can just deploy other containers

Solution

Deploy a container with the docker socket mounted to monitor the containers

Alternatives

No idea in mind

Additional context

This is a common production use case nowadays.

bpetit commented 3 years ago

Hi !

Thanks a lot for this FR. This is a deeper topic than it may sound. To get accurate power consumption metrics, we need access to the RAPL interface provided by the powercap and intel_rapl kernel modules. In short, we need access to the bare metal machine. Scaphandre can already provide metrics for a virtual machine if the hypervisor is qemu/kvm and runs scaphandre too. This behavior is described here and the implementation is here.
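For readers unfamiliar with the powercap interface, here is a minimal sketch of what "access to RAPL" means in practice. It is not scaphandre's implementation: the function names are hypothetical, and it assumes the usual Linux layout where each CPU package appears as /sys/class/powercap/intel-rapl:N with an energy_uj counter in microjoules.

```python
from pathlib import Path

# Usual powercap location on Linux. Pass another root to try the code
# on a machine without RAPL (for example a temporary directory).
POWERCAP_ROOT = Path("/sys/class/powercap")


def read_energy_uj(root=POWERCAP_ROOT):
    """Return {package domain name: energy counter in microjoules}.

    Only top-level package domains (intel-rapl:0, intel-rapl:1, ...) are
    read; sub-domains such as intel-rapl:0:0 are skipped.
    On recent kernels energy_uj is readable by root only.
    """
    readings = {}
    for domain in sorted(Path(root).glob("intel-rapl:[0-9]*")):
        if domain.name.count(":") != 1:
            continue  # skip sub-domains (core, uncore, dram)
        readings[domain.name] = int((domain / "energy_uj").read_text().strip())
    return readings


def average_power_watts(before_uj, after_uj, seconds, max_range_uj=None):
    """Average power between two counter readings taken `seconds` apart.

    The counter wraps around; if max_range_uj (the value of
    max_energy_range_uj) is given, a single wrap is compensated.
    """
    delta = after_uj - before_uj
    if delta < 0 and max_range_uj is not None:
        delta += max_range_uj
    return delta / 1_000_000 / seconds
```

Taking two readings with read_energy_uj() one second apart and passing each package's values to average_power_watts() gives a rough per-package power figure. On a cloud VM the directory is typically absent, which is the limitation described above.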

For a public cloud scenario, until the provider installs scaphandre on the bare metal hypervisors, you can't get access to those accurate metrics.

There is a feature request on the roadmap to implement a mode that estimates power consumption, based on the resources (cpu/ram/gpu...) consumed on the VM and the hardware information that the cloud provider may make available about the machines it uses for the different workloads.

Back to containers: measuring the power consumption of a whole (physical) machine with scaphandre running in a container should already work. We just need to mount the appropriate files in the container (/sys/class/powercap/...). I'm thinking about writing documentation for that. It may also require mounting some /proc files in the container to get data about other processes on the host; otherwise the tool loses some of its interest. (This may raise security concerns that should be investigated too.)

A more advanced scenario could be to run scaphandre as a daemonset in a kubernetes cluster to measure metrics on the nodes. This is directly correlated to the first scenario. You are definitely right that we need to produce some documentation/walkthrough about that.
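To make the daemonset idea concrete, here is a rough sketch that builds a DaemonSet manifest as a Python dictionary and serializes it to JSON, which kubectl accepts. This is not a tested configuration: the image name hubblo/scaphandre, the port 8080, the object names, and the privileged security context are all assumptions made for illustration.

```python
import json


def scaphandre_daemonset(image="hubblo/scaphandre", port=8080):
    """Build a DaemonSet manifest that runs one scaphandre pod per node.

    Assumptions: image name, port, and privileged mode are illustrative.
    The host paths are the ones discussed above.
    """
    labels = {"app": "scaphandre"}
    host_paths = {"powercap": "/sys/class/powercap", "proc": "/proc"}
    container = {
        "name": "scaphandre",
        "image": image,
        "args": ["prometheus"],  # run the prometheus exporter
        "ports": [{"name": "metrics", "containerPort": port}],
        # Assumption: privileged so the energy counters are readable.
        "securityContext": {"privileged": True},
        "volumeMounts": [
            {"name": name, "mountPath": path, "readOnly": True}
            for name, path in host_paths.items()
        ],
    }
    return {
        "apiVersion": "apps/v1",
        "kind": "DaemonSet",
        "metadata": {"name": "scaphandre", "labels": labels},
        "spec": {
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {
                    "containers": [container],
                    # hostPath volumes expose the node's own files.
                    "volumes": [
                        {"name": name, "hostPath": {"path": path}}
                        for name, path in host_paths.items()
                    ],
                },
            },
        },
    }


def to_json(manifest):
    """Serialize the manifest so it can be piped to `kubectl apply -f -`."""
    return json.dumps(manifest, indent=2)
```

Printing to_json(scaphandre_daemonset()) and piping it to kubectl apply -f - would be one way to try it on a bare metal test cluster. A real deployment would likely restrict permissions further.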

Deploy a container with the docker socket mounted to monitor the containers

Scaphandre currently lacks the logic to interpret data from the docker socket and provide metrics for each container. But it would be a perfect use case to extend the rapl sensor with a flag to enable grabbing data there too (or maybe a new sensor that doesn't rely on rapl, as this use case would fit in a lot of public cloud scenarii, cf the FR I linked above).
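As an illustration of how per-process data could be attributed to containers, here is a small sketch. Note that it swaps in a different technique from the docker socket: it parses /proc/PID/cgroup on the host and looks for the 64-character container ID that Docker puts in cgroup paths. The function names are hypothetical, and other container engines may use different path layouts.

```python
import re
from pathlib import Path

# A full container ID is 64 lowercase hexadecimal characters.
_CONTAINER_ID = re.compile(r"[0-9a-f]{64}")


def container_id_from_cgroup(cgroup_text):
    """Extract a container ID from the contents of /proc/<pid>/cgroup.

    Each line has the form hierarchy-ID:controller-list:cgroup-path,
    for example "12:memory:/docker/<id>" (cgroup v1) or
    "0::/system.slice/docker-<id>.scope" (cgroup v2 with systemd).
    Returns None when no ID is found (process not in a container).
    """
    for line in cgroup_text.splitlines():
        parts = line.split(":", 2)
        if len(parts) != 3:
            continue
        match = _CONTAINER_ID.search(parts[2])
        if match:
            return match.group(0)
    return None


def container_id_for_pid(pid, proc_root="/proc"):
    """Look up the container ID of a running process, or None."""
    try:
        text = (Path(proc_root) / str(pid) / "cgroup").read_text()
    except OSError:
        return None  # process exited or file not readable
    return container_id_from_cgroup(text)
```

Grouping per-process power figures by the returned ID would give a per-container total. Resolving IDs to container names would still need the docker daemon or another source.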

Let me know your thoughts about that, I'm thinking out loud as I write and may have gone too far in some directions and not enough in other directions :D

zwerdlds commented 3 years ago

Hi all, I am cold calling on the help-wanted label here. I have an initial workup of the docker and docker-compose solution in my forked repo. I am not yet submitting a PR because I do not think it fulfills the minimum requirements here, but I wanted to let you all review the initial pass and see where I should go from here. Some notes:

I am simply copying the compiled binary into the system. If we want to compile inside the dockerfile instead, let me know and I will make the change.
The docker-compose assumes a local build. If we want to host on the Hub someone in hubblo-org should set that up.
The program is launched in prometheus mode, as I wasn't sure if there was a need for the stdout or other modes.

You can view the code here: https://github.com/zwerdlds/scaphandre/tree/docker Please let me know if I am violating any spoken or unspoken rules - I am just starting to contribute.

bpetit commented 3 years ago

This is great, I'll have a look at that so that we can discuss it. Thanks a lot for the initial work !

bpetit commented 3 years ago

Hi ! I jump back to your contribution after a family vacation week !

I didn't manage to make it work with the alpine based image. It seems this is because rust executables are dynamically linked:

$ ldd target/release/scaphandre
    linux-vdso.so.1 (0x00007ffc79b0b000)
    libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1 (0x00007f786f02e000)
    libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f786ee3c000)
    libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007f786eced000)
    libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007f786ecca000)
    libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007f786ecc4000)
    /lib64/ld-linux-x86-64.so.2 (0x00007f786f450000)

So you need those libraries in the container image. I changed the dockerfile base image to rust:latest which includes those libraries as far as I can tell, and it works (or at least it runs and doesn't crash, but I didn't check the metrics).

The dockerfile that works for me to test a pre compiled version of scaphandre is:

FROM rust:latest

COPY ./target/release/scaphandre /usr/bin/scaphandre

CMD ["scaphandre", "prometheus"]

Another option seems to be building the project as a static binary with rust-musl-builder, but I read that it can cause some weird issues.

Going forward, I think a very useful scenario would be to have a dockerfile that allows us to build the image when a new satisfying version of the code is pushed or merged to master. So we would need a simple COPY . . to get the project into the image filesystem and then run the compilation. (I saw several blog posts about compiling rust projects in docker containers and it seems there are some tricks to cut the build time. Here is one)

With the powercaprapl sensor being the only sensor available so far, this will still require running the container on a bare metal machine and mounting /sys/class/powercap as a volume (as well as a subset of /proc). Despite those constraints, having a docker image ready to run would also be useful to quickly test scaphandre on a host, without having to compile the project.

I'm wondering about the docker-compose file. Did you have a particular use case in mind for it ? I imagine this would be great to easily build a testbed for scaphandre (having grafana and prometheus running on the host as well as scaphandre, and thus easily getting access to dashboards in grafana). I'm interested in any other use cases you may have in mind.

zwerdlds commented 3 years ago

@bpetit yeah, alpine has always been a pain for me with rust - I guess I was trying too hard to be hip. Sorry about that. rust:latest makes more sense.

I'll make the mentioned change wrt building in-container. I don't usually worry about the docker build times, and I have no experience with chef so I'll get a poc up and we can go from there. I've done some other stuff to reduce dependency compilation times on some of my personal images so I'll see if I can weave those in somehow.

The docker-compose was just intended as a quick-start point... I do think it'll be good to put together a prometheus stack, if that's how your users are typically using this project. That might be a little iffy from me but I'll do some checking on prometheus and see if I can rig it up. I might need some feedback on env args or other cfg.

I'll be in touch soon with these changes.

rossf7 commented 3 years ago

Hi @bpetit & @zwerdlds, I've been wanting to try this out since @mrchrisadams told me about Scaphandre in the climateaction.tech slack.

I'm new to Rust but I've got a multi-stage Docker build here: https://github.com/giantswarm/scaphandre/blob/helm-chart/Dockerfile It uses cargo to build the binary and ubuntu:20.04 for the final image, which is 138 MB uncompressed, so not too big.

I initially tried using debian:buster-slim but the security scanner for our container registry (Quay) found a vuln with glibc. TIL that the official Ubuntu docker images are designed to be minimal and it passed the scanner. https://ubuntu.com/blog/minimal-ubuntu-released

The Prometheus and QEMU support is awesome as we use both heavily at Giant Swarm. I tried the image on one of our Kubernetes test clusters but the kernel is too old. We use Flatcar Linux https://www.flatcar-linux.org/ and they support newer kernels but we need to re-install one of the physical nodes. One of our SREs is kindly going to help me with that, as I don't work with our on-prem clusters that much.

We use Helm so there is a WIP chart here https://github.com/giantswarm/scaphandre/tree/helm-chart/helm/scaphandre that will deploy scaphandre as a daemonset. I can also try and do a simpler example with just YAML manifests.

I hope the Dockerfile will be useful and I'll let you know when we've tested with K8s and the Helm chart.

bpetit commented 3 years ago

This is great. I'm about to merge the "new" documentation and I'd like to include what's related to running scaphandre in a container and on kubernetes if possible. So I'll wait a bit before merging, so that we end up with something satisfying here.

I'll try the image and jump back to the discussion. Thanks a lot for sharing this work.

bpetit commented 3 years ago

I built and ran your image successfully with the following command line:

docker build -t scaphandre .
docker run -p 8042:8080 -v /sys/class/powercap:/sys/class/powercap -v /proc:/proc --name scaphandre -tid scaphandre prometheus

I just preferred the 8042 port number so it doesn't collide with my local, non-containerized scaphandre. I added volumes for the required files and everything seems ok. I'll just wait a bit to see if the metrics seem accurate in the long run.

Mounting /proc in the container seems a bit careless though, but I don't know how we could do that more cleanly (we actually need /proc/stat and all the /proc/PID/stat files; is there some kind of "wildcard" mountpoint feature in docker ?)
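To show why those /proc files matter, here is a minimal sketch of how CPU time from /proc/PID/stat and /proc/stat could be used to attribute a share of the host's power to one process. This is one plausible approach under stated assumptions, not necessarily scaphandre's exact method, and the function names are hypothetical.

```python
def parse_pid_stat(text):
    """Return (comm, utime, stime) from the contents of /proc/<pid>/stat.

    The command name is wrapped in parentheses and may itself contain
    spaces or ')', so the line is split at the LAST ')'.
    utime and stime are fields 14 and 15, in clock ticks.
    """
    open_paren = text.index("(")
    close_paren = text.rindex(")")
    comm = text[open_paren + 1:close_paren]
    fields = text[close_paren + 1:].split()
    # fields[0] is field 3 (state), so field 14 is at index 11.
    return comm, int(fields[11]), int(fields[12])


def parse_total_jiffies(stat_text):
    """Return total CPU time of the host from the contents of /proc/stat."""
    for line in stat_text.splitlines():
        if line.startswith("cpu "):
            values = [int(v) for v in line.split()[1:]]
            # user, nice, system, idle, iowait, irq, softirq, steal.
            # guest and guest_nice are already counted in user and nice.
            return sum(values[:8])
    raise ValueError("no aggregate 'cpu' line found")


def process_power_microwatts(host_power_uw, proc_ticks_delta, total_ticks_delta):
    """Attribute host power to a process by its share of CPU time.

    Both deltas are differences between two samples of the same counters.
    """
    if total_ticks_delta <= 0:
        return 0.0
    return host_power_uw * proc_ticks_delta / total_ticks_delta
```

Without the host's /proc mounted, the container only sees its own processes, so a per-process breakdown of the host would not be possible.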

I look forward to reading your thoughts about that.

@zwerdlds and @rossf7 maybe you could team up to open a PR containing the dockerfile ? I'll try to integrate the building of the image and some tests around that in a CI pipeline next week.

I'm really thrilled about the helm chart too. I'll wait for feedback from your tests on the kubernetes cluster before moving forward on that topic.

Thanks a lot folks, all this work is great !

bpetit commented 3 years ago

It seems alright:

[Screenshot: 2021-01-07_16-59]

rossf7 commented 3 years ago

Hi @bpetit, we didn't get a chance to re-install the node this week. But I had some time today and I found a different way to test.

I'm using Equinix Metal, which I've used before back when they were Packet.net. Their c3.small.x86 servers can be rented for 50 cents an hour, have RAPL support, and work well for testing.

I've got the Helm chart working but I want to improve the permissions. I'm using the same mounts as in your example and running the pods as privileged so they run as root.

Instead of running as root the permissions can be restricted via the Pod Security Policy but I'll get some help with that as I don't work with PSPs much and they can be tricky.

But the Dockerfile was fine in my tests too. So I'll open a PR for that. @zwerdlds I hope that's OK with you and you don't mind if I add you as a reviewer?

bpetit commented 3 years ago

Dockerfile PR is merged. I'll add the relevant documentation in my PR and merge it too.

I'm available if needed to help on the kubernetes use case. Thanks for your work !

bpetit commented 3 years ago

What do you (all) think about that as a quickstart tutorial to run scaphandre using docker ?

Here is the image on docker hub. I've also added a note on container-based use cases: here.

bpetit commented 3 years ago

From the initial discussions I identified another use case around docker: making it easy to test scaphandre with dashboards and see what it provides at the end of the stack. I opened a new FR for that (it's about docker-compose, grafana and prometheus, plus some automation): https://github.com/hubblo-org/scaphandre/issues/53

bpetit commented 3 years ago

This may be useful for the initial definition of the FR (which has "evolved" a bit since). At some point scaphandre will need to track the state and power consumption of containers as such (and not only by filtering processes to get the right data). So it will need to talk to the docker daemon (even if that's only one use case among the galaxy of container "engines" available...)

PierreRust commented 3 years ago

I'm not sure it really belongs to that FR (which is actually pretty wide and covers, in my opinion, several different things), but I have in my fork a simple stack that could be used as an example in the doc: https://github.com/PierreRust/scaphandre/tree/feat/sample_stack_with_docker-compose

It simply uses docker-compose to run scaphandre, prometheus and grafana in a single step. Prometheus is already configured to scrape scaphandre and grafana's data source and dashboard are automatically provisioned (the current dashboard is rough though) , which makes it very easy to be up and running, even for users that are not familiar with these tools.

There are still some usability issues though: we don't really know which processes' consumption we should display on the dashboard, and displaying the consumption for all processes (>500 on my laptop ATM) is a bit overwhelming.
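One possible way to tame that list is to show only the top consumers. The sketch below parses text in the Prometheus exposition format, sums per-process power by the exe label, and keeps the n largest. It assumes the per-process metric is named scaph_process_power_consumption_microwatts and carries an exe label; the function name and the simplified parsing are illustrative only.

```python
import re
from collections import defaultdict

# Assumed name of the per-process metric exposed by the exporter.
METRIC = "scaph_process_power_consumption_microwatts"

# name{labels} value  (an optional trailing timestamp is ignored)
_SAMPLE = re.compile(r'^(\w+)\{(.*)\}\s+(\S+)')
# key="value" pairs, allowing backslash-escaped characters in the value
_LABEL = re.compile(r'(\w+)="((?:[^"\\]|\\.)*)"')


def top_consumers(exposition_text, n=10, metric=METRIC):
    """Return the n executables with the highest summed power.

    The result is a list of (exe, microwatts) tuples, largest first.
    Several processes with the same exe label are added together.
    """
    totals = defaultdict(float)
    for line in exposition_text.splitlines():
        if line.startswith("#"):
            continue  # HELP and TYPE comments
        sample = _SAMPLE.match(line)
        if not sample or sample.group(1) != metric:
            continue
        labels = dict(_LABEL.findall(sample.group(2)))
        totals[labels.get("exe", "unknown")] += float(sample.group(3))
    ranked = sorted(totals.items(), key=lambda item: item[1], reverse=True)
    return ranked[:n]
```

In a Grafana panel the same idea can be expressed directly in PromQL with the topk() aggregation operator, which avoids any client-side parsing.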

Let me know if that's something you had in mind.

bpetit commented 3 years ago

This is awesome. That's what I meant with the FR #53. Opening a pr and attaching it to #53 would be :heart_eyes:

rossf7 commented 3 years ago

Hi @bpetit it took longer than I'd hoped to get back to it. But I just opened a PR with a Helm chart and a tutorial for installing scaphandre as well as prometheus and grafana via helm.

https://github.com/hubblo-org/scaphandre/pull/72

I hope it's useful. There is also a k8s dashboard based on the one you have at https://metrics.hubblo.org/

bpetit commented 3 years ago

This is amazing, thanks a lot ! I'll have a look at that and merge once tested (can't wait to see it running on a k8s cluster 🥳 )

Was it successful on your side ? Did you have any particular challenge with scaphandre during your tests ?

rossf7 commented 3 years ago

Was it successful on your side ? Did you have any particular challenge with scaphandre during your tests ?

Thanks @bpetit my testing went well. The main problem I had was getting grafana to load the dashboard for the tutorial! 😄

I found that with the combination of the exe label from scaphandre and the kubernetes_node label from prometheus you could get a lot of useful info.

It might be useful to also have the kubernetes pod and namespace but I'm not sure how to get that without some k8s specific code in scaphandre.

bpetit commented 3 years ago

Very interesting, thanks. I think at some point we will add code for specific scenarii (we already did with the --qemu flag in the prometheus exporter for example.) So why not having some features to help properly tag and filter data coming from kubernetes cluster and its objects.

bpetit commented 3 years ago

Hi,

I just merged https://github.com/hubblo-org/scaphandre/pull/72 thanks a lot @rossf7 !!!

As this thread contains a lot of stuff, I'll sum it up here. We have basically 4 topics that have been discussed:

1. having a working docker image. This is done since https://github.com/hubblo-org/scaphandre/pull/48
2. having a helm chart/working configuration to install scaphandre on a kubernetes cluster's nodes, as a daemonset. This is done with this PR: https://github.com/hubblo-org/scaphandre/pull/72
3. having a docker-compose example to be able to set up the "classical" stack scaphandre+prometheus+grafana locally. This is ongoing with this issue and this PR
4. being able to measure on public cloud. This is a huge topic and the state of the discussions about it is here

I think we can say 1 and 2 are done. 3 is ongoing and has its own issue. 4 is beyond the scope of this issue in my opinion.

I guess we can close this issue and jump on the issues related to 3 and 4 for the remaining topics and open new issues for the improvements on 1 and 2 :)
