raintank / raintank-docker

raintank docker images and dev stack DEPRECATED / UNMAINTAINED
https://blog.raintank.io/docker-based-development-environment/

measure.sh does not work. #43

Open woodsaj opened 9 years ago

woodsaj commented 9 years ago

firstly nc -c localhost 2003 is invalid syntax for netcat.

from the man page

-c string    specify shell commands to exec after connect (use with caution).  The string is passed to /bin/sh -c for execution.  See the -e option if you don't  have  a  working
                    /bin/sh (Note that POSIX-conformant system must have one).

also, on my standard Ubuntu 14.04 machine, the rest of the command fails to collect any metrics.

instead of using top, i think we should use docker stats, which is designed for measuring the resource usage of containers.

This command will fetch the stats of all running containers.

docker stats $(docker ps|awk '{print $(NF)}'|grep -v NAMES|xargs echo)

woodsaj commented 9 years ago

we should also run this from inside a container to ensure consistent behavior. It is possible to have a container talk directly to docker by mounting the docker unixSocket. eg.

docker run -t -i -v /var/run/docker.sock:/docker.sock ubuntu bash

We could also simplify the monitoring by just using the docker API directly. https://docs.docker.com/reference/api/docker_remote_api_v1.17/#get-container-stats-based-on-resource-usage

eg.

echo -e "GET /containers/raintankdocker_graphiteApi_1/stats HTTP/1.0\r\n" | socat unix-connect:/var/run/docker.sock STDIO

Dieterbe commented 9 years ago

i'd love to use something portable that is guaranteed to work everywhere and doesn't depend on the nc/top/awk version. the issue is the graphiteApi container, where i want to get the stats of each individual process, because i want to see if any of the processes hits 100% cpu by itself.

firstly nc -c localhost 2003 is invalid syntax for netcat.

this depends on which netcat you have installed. on my system i have GNU netcat 0.7.1 which says

-c, --close                close connection on EOF from stdin

what version do you have?

we could try to detect the nc version and invoke it differently. (GNU netcat supports nc -V and nc -help to show the version, does yours too?)
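a rough sketch of what that detection could look like, keyed on the help text rather than the version string. the function name nc_eof_flag is made up for illustration. it assumes GNU netcat advertises --close (as in the help line quoted above) and that the Debian/Ubuntu netcat builds advertise a -q option for quitting after EOF; other builds may need something else entirely.

```shell
# Pick the flag that makes nc exit once stdin hits EOF, based on its help text.
# Assumptions: GNU netcat lists "--close"; Debian/Ubuntu builds list "-q secs".
# Prints an empty string when neither is recognised.
nc_eof_flag() {
    case $1 in
        *--close*) echo "-c" ;;
        *"-q "*)   echo "-q 0" ;;
        *)         echo "" ;;
    esac
}

# Usage (not executed here):
#   flag=$(nc_eof_flag "$(nc -h 2>&1)")
#   echo "test.metric 1 $(date +%s)" | nc $flag localhost 2003
```

an empty result would be the cue to fail loudly instead of guessing at flags.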

on my standard Ubuntu 14.04 machine, the rest of the command fails to collect any metrics.

distribution version doesn't say much. it depends on what tools are installed. my top is from procps-ng 3.3.11-2 and my awk is GNU Awk 4.1.3 (though the usage of awk in measure.sh should be compatible with most versions, see http://mywiki.wooledge.org/BashFAQ/009). you could comment out the part where it pipes into awk and see if the problem is the top part of the script or the awk part (or something else).

i'd love to ultimately collect data in the same way we would/do on dev/prod, so that our dashboards that show cpu/mem stats can be reused in all environments, from devstack to prod. but then that would involve running a monitoring agent like diamond/collectd/...

for now the key goal is to just get the stats i need quickly.

re your suggestions:

echo -e "GET /containers/raintankdocker_graphiteApi_1/stats HTTP/1.0\r\n" | socat unix-connect:/var/run/docker.sock STDIO
HTTP/1.0 200 OK
Server: Docker/1.8.3 (linux)
Date: Tue, 27 Oct 2015 12:31:27 GMT
Content-Type: text/plain; charset=utf-8

i basically don't get any content :( i use docker 1.8.3 with api version 1.20. according to https://docs.docker.com/reference/api/docker_remote_api_v1.20/ that command should return rich stats, but it doesn't for me. it does say "this functionality currently only works when using the libcontainer exec-driver", so maybe that's why.

that docker stats $(docker ps|awk '{print $(NF)}'|grep -v NAMES|xargs echo) command is nice. bonus points for it returning network io as well. We could call docker stats --no-stream $(docker ps|awk '{print $(NF)}'|grep -v NAMES|xargs echo) repeatedly in a loop, but:

1) not sure if you then get accurate readings of the resources used between each call. (top -b does this well)
2) i still don't have per-process statistics
3) this only replaces the top piece of the script. it would still involve awk (though we wouldn't need to disable buffering in this case, which means it should work with all awk versions), and it would still need nc as well.
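for reference, a minimal sketch of the awk piece of such a loop, turning per-container readings into Graphite plaintext lines ("path value timestamp"). the function name stats_to_graphite and the "devstack" prefix are made up for illustration. it assumes input lines of the form "name cpu% mem%", which newer docker versions can produce via docker stats --format; that flag is not available in the 1.8.3 mentioned above, so treat the usage comment as an assumption about a newer setup.

```shell
# Convert "name cpu% mem%" lines on stdin into Graphite plaintext lines.
# $1 = metric prefix (hypothetical), $2 = unix timestamp to stamp every line with.
stats_to_graphite() {
    awk -v prefix="$1" -v ts="$2" '{
        gsub(/%/, "")                     # drop the percent signs
        gsub(/[^A-Za-z0-9_-]/, "_", $1)   # keep the container name safe as a metric path
        print prefix "." $1 ".cpu_pct " $2 " " ts
        print prefix "." $1 ".mem_pct " $3 " " ts
    }'
}

# Usage (not executed here, assumes a docker version with --format):
#   while sleep 10; do
#     docker stats --no-stream --format '{{.Name}} {{.CPUPerc}} {{.MemPerc}}' \
#       | stats_to_graphite devstack "$(date +%s)"
#   done
```

this only uses plain awk features (no buffering tricks), but it does not address points 1 and 2.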

from the api docs i also saw there's a /top endpoint:

echo -e "GET /containers/raintankdocker_graphiteApi_1/top HTTP/1.0\r\n" | socat unix-connect:/var/run/docker.sock STDIO
HTTP/1.0 404 Not Found
Content-Type: text/plain; charset=utf-8
Server: Docker/1.8.3 (linux)
X-Content-Type-Options: nosniff
Date: Tue, 27 Oct 2015 12:43:42 GMT
Content-Length: 192

[8] System error: open /sys/fs/cgroup/cpu,cpuacct/init.scope/system.slice/docker-dd4e2b66d53737c211784e867e871db5cb03e8e1829a1b92550ecfb463327560.scope/cgroup.procs: no such file or directory

even if this worked, it probably wouldn't be great at being called iteratively and providing the stats between each call, especially since it actually seems to call ps instead of top, and ps shows the stats across the entire duration of the process.

woodsaj commented 9 years ago

Ubuntu provides 2 alternatives of netcat. netcat-openbsd and netcat-traditional, neither of these appear to be the version you are using. "GNU netcat" is a rewrite of netcat hosted on sourceforge :( http://netcat.sourceforge.net/

The measure.sh script seems to rely on features of netcat, awk and sed that are not supported on 99% of linux distributions by default.

But this problem is exactly why all tools in this stack need to run within a container.

Dieterbe commented 9 years ago

hmm i thought a GNU userland was fairly common across all linux distros.

again i'm totally open to do things in a more portable way, i'm just not seeing a solution that gives insight into individual processes, properly accounts for time during checks (which top does but not ps) and that i can implement in a reasonable timeframe.
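one possible direction, as a sketch only: sample the cumulative cpu ticks of a process from /proc/<pid>/stat twice and divide the difference by the interval, which is roughly what top does between refreshes. the function names are made up for illustration, and this assumes a Linux /proc where fields 14 and 15 are utime and stime in clock ticks. nothing here has been tried against measure.sh.

```shell
# Sum utime + stime (fields 14 and 15) from one line of /proc/<pid>/stat.
# The command name (field 2) may contain spaces, so strip everything up to
# the last ") " first; utime and stime are then the 12th and 13th words.
ticks_from_stat_line() {
    rest=${1##*') '}
    set -- $rest
    echo $(( ${12} + ${13} ))
}

# Percentage of one core used between two tick samples.
# $1 = earlier ticks, $2 = later ticks, $3 = ticks per second, $4 = seconds elapsed.
cpu_pct() {
    echo $(( ($2 - $1) * 100 / ($3 * $4) ))
}

# Usage (not executed here, $pid is a placeholder):
#   hz=$(getconf CLK_TCK)
#   a=$(ticks_from_stat_line "$(cat /proc/$pid/stat)"); sleep 10
#   b=$(ticks_from_stat_line "$(cat /proc/$pid/stat)")
#   cpu_pct "$a" "$b" "$hz" 10
```

the integer division rounds down, which should be fine for spotting a process pinned at 100% of a core.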

woodsaj commented 9 years ago

i would much rather have per-container metrics than no metrics at all. Having a metrics collector agent (measure.sh) that only works on Arch Linux seems fairly pointless.

woodsaj commented 9 years ago

how about this:

https://github.com/google/cadvisor