sflow / host-sflow

host-sflow agent
http://sflow.net

Missing docker vir_* metrics when using hsflowd as a container #15

Closed moss2k13 closed 7 years ago

moss2k13 commented 7 years ago

Hello, I'm trying to run hsflowd as a container on CoreOS using this Dockerfile.

The hsflowd package hsflowd-ubuntu16_2.0.5-7_amd64.deb was built (commit f7bcfac) by running:

sudo ./docker_build_on ubuntu16

I'm using the command below to start the containers:

/usr/bin/docker run --cap-add=NET_ADMIN --pid=host --uts=host --net=host \
-v /var/run/docker.sock:/var/run/docker.sock -v /sys/fs/cgroup/:/sys/fs/cgroup/:ro \
--name hsflowd hsflowd

The vir_* metrics are not available in sflow-rt from the hsflowd instances running as containers:

core@core-1 ~ $ curl http://localhost:8008/dump/10.x.x.81/ALL/json|grep metricName|grep vir
core@core-1 ~ $ curl http://localhost:8008/dump/10.x.x.82/ALL/json|grep metricName|grep vir
core@core-1 ~ $ curl http://localhost:8008/dump/10.x.x.83/ALL/json|grep metricName|grep vir

I do see them from hsflowd running on a regular Ubuntu VM:

core@core-1 ~ $ curl http://localhost:8008/dump/10.x.x.84/ALL/json|grep metricName|grep vir|wc -l
140
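
For anyone who wants to script this check rather than pipe curl through grep, here is a minimal Python sketch. It assumes the /dump/.../json output is a JSON array of objects that each carry a "metricName" key; that shape is inferred from the grep above and is not confirmed anywhere in this thread. The function name and the sample metric names are made up for illustration.

```python
import json


def count_matching_metrics(dump_text, needle="vir"):
    """Count entries whose metricName contains `needle`.

    Assumption: dump_text is a JSON array of objects with a "metricName" key.
    Substring matching is used to mirror `grep vir`.
    """
    entries = json.loads(dump_text)
    return sum(1 for entry in entries if needle in str(entry.get("metricName", "")))


# Hypothetical sample payload, not real sflow-rt output.
sample = '[{"metricName": "vir_a"}, {"metricName": "load_one"}, {"metricName": "vir_b"}]'
print(count_matching_metrics(sample))
```

A result of 0 for an agent corresponds to the empty grep output shown for the containerized nodes.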

I have attached the hsflowd logs from both (container and VM) below:

I have other containers running on nodes 10.x.x.81-83 where I use hsflowd as a container:

core@core-1 ~ $ dkc
CONTAINER ID        IMAGE                    COMMAND                  CREATED             STATUS              PORTS                                            NAMES
b0f75c9f95fa        nginx                    "nginx -g 'daemon off"   58 minutes ago      Up 58 minutes       80/tcp, 443/tcp                                  awesome_sammet
c7467a33eb0b        localhost:5000/hsflowd   "/bin/sh -c '/etc/ini"   About an hour ago   Up About an hour                                                     hsflowd
c88f6c3dca15        sflow/sflow-rt           "/sflow-rt/start.sh"     7 hours ago         Up 7 hours          0.0.0.0:6343->6343/udp, 0.0.0.0:8008->8008/tcp   fervent_brattain
c12de84bf3cb        registry:2               "/entrypoint.sh /etc/"   26 hours ago        Up 26 hours         127.0.0.1:5000->5000/tcp                         registry
core@core-1 ~ $ dke -it hsflowd bash

root@core-1:/# export DOCKER_API_VERSION=1.22

root@core-1:/# docker ps
CONTAINER ID        IMAGE                    COMMAND                  CREATED             STATUS              PORTS                                            NAMES
b0f75c9f95fa        nginx                    "nginx -g 'daemon off"   58 minutes ago      Up 58 minutes       80/tcp, 443/tcp                                  awesome_sammet
c7467a33eb0b        localhost:5000/hsflowd   "/bin/sh -c '/etc/ini"   About an hour ago   Up About an hour                                                     hsflowd
c88f6c3dca15        sflow/sflow-rt           "/sflow-rt/start.sh"     7 hours ago         Up 7 hours          0.0.0.0:6343->6343/udp, 0.0.0.0:8008->8008/tcp   fervent_brattain
c12de84bf3cb        registry:2               "/entrypoint.sh /etc/"   26 hours ago        Up 26 hours         127.0.0.1:5000->5000/tcp                         registry

I could have missed something along the way, though.

sflow commented 7 years ago

At first glance it looks like the initial call to get the list of current containers is successful in that it gets an answer back here: https://gist.github.com/moss2k13/735fca569864b9151ef0b0baaaff4f3f#file-hsflowd_from_container-log-L446-L457

but somehow that answer is never processed, like it is in the non-containerized version here: https://gist.github.com/moss2k13/7bf6ee39d58e41323670fee2064b2f75#file-hsflowd_from_standard_ubuntu_vm-log-L1023-L1223

Assuming you can't easily run it in the debugger and set breakpoints, I guess I would add print-statements in places like this to see what might be happening to that request: https://github.com/sflow/host-sflow/blob/master/src/Linux/mod_docker.c#L1199

Maybe mod_docker is expecting an EVSOCKETREAD_EOF status that never comes?

(You can see that GET /events answers are being processed OK, but those are handled on EVSOCKETREAD_STR. The GET /containers request is different. It will accumulate the result until it hits EOF, and then deliver the whole result at once).
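
To illustrate why a missing EOF would silently drop the container list, here is a small Python sketch of the accumulate-until-EOF pattern. It is not the mod_docker code: the function names are hypothetical, a local socket pair stands in for the Docker socket, and the timeout exists only so the demo terminates.

```python
import socket


def read_until_eof(sock, timeout=0.5):
    """Accumulate bytes until the peer closes; return None if EOF never arrives."""
    sock.settimeout(timeout)
    chunks = []
    while True:
        try:
            data = sock.recv(4096)
        except socket.timeout:
            # The peer kept the connection open, so the buffered
            # answer is never delivered to the caller.
            return None
        if data == b"":
            # EOF: deliver the whole accumulated result at once.
            return b"".join(chunks)
        chunks.append(data)


def demo(peer_closes):
    """Send one payload over a socket pair, optionally closing the sending side."""
    reader, writer = socket.socketpair()
    try:
        writer.sendall(b'[{"Id": "b0f75c9f95fa"}]')
        if peer_closes:
            writer.close()
        return read_until_eof(reader)
    finally:
        reader.close()
        writer.close()
```

In this sketch, demo(True) returns the full payload, while demo(False) returns None even though the same bytes arrived, which matches the symptom of an answer that is received but never processed.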

moss2k13 commented 7 years ago

I've confirmed that I do get vir_* metrics when I start the same hsflowd container under Ubuntu Xenial:

root@ubuntu:~# curl http://localhost:8008/dump/10.x.x.81/ALL/json|grep metricName|grep vir|wc -l
28

The hsflowd log is below. I created an nginx container, executed bash, and removed nginx:

I changed the privileges at container start from --cap-add=NET_ADMIN to --privileged.

When I start the same hsflowd container under CoreOS with the same privileges, the vir_* metrics are not there.

I'll ask the CoreOS maintainers for support and post any progress here.

moss2k13 commented 7 years ago

Vir_* metrics are available with hsflowd-ubuntu16_2.0.6.-1_amd64.deb (4c79bd3) under coreos stable 1185.3.0:

core@core-1 ~$ cat /etc/motd
CoreOS stable (1185.3.0)

core@core-1 ~ $ curl http://localhost:8008/dump/10.x.x.15/ALL/json|grep metricName|grep vir|wc -l
56

Thanks for the support!

sflow commented 7 years ago

Thanks! I still plan to experiment with a change where we close the connection from the hsflowd side instead of just assuming that the docker side will close it. Shouldn't do any harm, and it might solve the problem you saw.
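
One way a client can close the connection itself instead of waiting for the other side is to detect the end of the response from the HTTP framing. The Python sketch below is only an illustration of that idea, not the planned hsflowd change. It assumes the response carries a Content-Length header (chunked transfer encoding is not handled), and the function name is hypothetical.

```python
import socket


def read_http_body(sock, timeout=0.5):
    """Read one HTTP response body using Content-Length, then close from our side."""
    sock.settimeout(timeout)
    buf = b""
    # Read until the blank line that ends the header block.
    while b"\r\n\r\n" not in buf:
        data = sock.recv(4096)
        if not data:
            break
        buf += data
    head, _, body = buf.partition(b"\r\n\r\n")

    length = None
    for line in head.split(b"\r\n")[1:]:  # skip the status line
        name, _, value = line.partition(b":")
        if name.strip().lower() == b"content-length":
            length = int(value.strip())
    if length is None:
        raise ValueError("no Content-Length header; this sketch cannot find the end")

    # Keep reading until the announced number of body bytes has arrived.
    while len(body) < length:
        data = sock.recv(4096)
        if not data:
            break
        body += data

    # Close from the client side rather than assuming the peer will.
    sock.close()
    return body[:length]
```

With this approach the result is delivered as soon as the body is complete, whether or not the peer ever closes the connection.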