google / cadvisor

Analyzes resource usage and performance characteristics of running containers.

Grouping by App (marathon) #546

Closed samek closed 7 years ago

samek commented 9 years ago

Hi, now that I have cAdvisor up and running through Marathon, I'm getting data into InfluxDB.

The problem I'm facing is that I can't tell which app is behind a given Docker container name.

Let me try to explain: when Marathon starts a Docker container, it gives it a name. [screenshot, 2015-02-27: container names as assigned by Marathon]

To cAdvisor it's named mesos-3deefa59-6981-4069-8c74-911aead8b396.

As you can see in the screenshot, I also have 2 nginx containers running, and they have completely different names. In InfluxDB I cannot differentiate/group by app, since containers show up under their Mesos names.

Now, Marathon passes some environment variables into the started containers which could be used for grouping:

"MARATHON_APP_VERSION=2015-02-27T13:38:50.135Z", "HOST=10.0.0.193", "MESOS_TASK_ID=nginx2.f726a4c5-be85-11e4-82c2-56847afe9799", "PORT=31005", "PORTS=31005", "PORT_80=31005", "MARATHON_APP_ID=/nginx2", "PORT0=31005", "MESOS_SANDBOX=/mnt/mesos/sandbox", "PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin", "NGINX_VERSION=1.7.10-1~wheezy"

Those fields are located in config.json in the Docker directory and should be accessible to cAdvisor.

The question is: would it be wise to also send MARATHON_APP_ID to InfluxDB in order to group stats by app?

Or how would this be approached correctly?
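
To make the idea concrete, here is a rough sketch (not cAdvisor code; the function is made up and the env values are just taken from the list above) of pulling MARATHON_APP_ID out of a container's environment so it could be attached to the stats sent to InfluxDB:

```go
package main

import (
	"fmt"
	"strings"
)

// extractAppID pulls MARATHON_APP_ID out of a container's environment,
// which Marathon passes as the usual "KEY=VALUE" strings.
func extractAppID(env []string) (string, bool) {
	for _, kv := range env {
		parts := strings.SplitN(kv, "=", 2)
		if len(parts) == 2 && parts[0] == "MARATHON_APP_ID" {
			return parts[1], true
		}
	}
	return "", false
}

func main() {
	env := []string{
		"MESOS_TASK_ID=nginx2.f726a4c5-be85-11e4-82c2-56847afe9799",
		"MARATHON_APP_ID=/nginx2",
		"PORT=31005",
	}
	if appID, ok := extractAppID(env); ok {
		fmt.Println("group stats under:", appID) // prints "/nginx2"
	}
}
```

Grouping in InfluxDB would then just be a matter of querying on that extra column.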

vmarmol commented 9 years ago

Today we have a concept of "namespaces" for container names, and those come with a list of aliases used to address the container. Docker containers are under the "docker" namespace, and their aliases are the Docker ID and name. This feature could be served by a "marathon" namespace that has the app ID as one of the aliases; we would then push that to the InfluxDB backend.

We probably don't have the bandwidth to tackle this anytime soon, but we will gladly take PRs towards that goal if you'd like :)
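
To sketch what that might look like (the Name/Aliases/Namespace fields below match cAdvisor's v1 ContainerReference, but the "marathon" namespace and all the values are hypothetical):

```go
package main

import (
	"fmt"

	info "github.com/google/cadvisor/info/v1"
)

func main() {
	// Hypothetical: a Marathon-launched container addressed under a
	// "marathon" namespace, with the Marathon app ID and the Mesos task
	// name as aliases. Today cAdvisor only emits the "docker" namespace.
	ref := info.ContainerReference{
		Name:      "/docker/<full-docker-id>", // placeholder for the real container path
		Aliases:   []string{"/nginx2", "mesos-3deefa59-6981-4069-8c74-911aead8b396"},
		Namespace: "marathon",
	}
	fmt.Printf("%+v\n", ref)
}
```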

rjnagal commented 9 years ago

If we made a generic change to export aliases to the storage backends, wouldn't that resolve this problem too?


vmarmol commented 9 years ago

Yes, you need to do both (I think we export the aliases today, if not we should). Here the aliases they're referring to are not a Docker container name or ID, is that correct @samek?

rjnagal commented 9 years ago

I see, the request is for exposing an env variable. This seems better handled in heapster aggregation, where we can add arbitrary tags, or by exposing custom metrics hooks in cAdvisor.

@samek We do want custom hooks for adding extra columns, but we'll probably not get to it in the near future.

samek commented 9 years ago

@vmarmol I'm not 100% sure, but yes, the aliases are exported. The problem is that there's no link to the Marathon task ID/app, which is different. And if you want to autoscale based on the app that Marathon runs, you would also need at least MARATHON_APP_ID to go with it.

Are you guys parsing the JSON from /var/lib/docker/containers/XXXXX/config.json at all?

vmarmol commented 9 years ago

We don't read that file, but we do read the libcontainer config here. That config does have the env vars, so it should have the data you need.

martensson commented 9 years ago

Going to join the thread; I'm having the same issue when trying to use cAdvisor in combination with Marathon and Mesos.

BrickXu commented 9 years ago

+1

mhausenblas commented 9 years ago

@samek can you please share your Marathon app spec (JSON) file for cAdvisor?

samek commented 9 years ago

@mhausenblas We're not using cadvisor anymore :(

For Mesos monitoring and Marathon task monitoring we use https://github.com/bobrik/collectd-docker and https://github.com/bobrik/docker-collectd-mesos

It solved all our problems.

If you need those I can post them for sure.

mhausenblas commented 9 years ago

Awesome, thanks @samek — yes, the Marathon app spec would be appreciated!

scalp42 commented 8 years ago

@samek any chance?

samek commented 8 years ago

@scalp42 I've sent it directly to @mhausenblas since it's not related to cadvisor at all. But sure, I'll just resend the mail to you.

salimane commented 8 years ago

@samek the Marathon app spec could be put in a gist on GitHub somewhere; it would be appreciated :)

samek commented 8 years ago

@salimane It feels wrong to be posting a solution on the cAdvisor issue tracker since the solution doesn't use cAdvisor, but anyway.

So in order to use it: on each mesos-slave run docker run -d -v /var/run/docker.sock:/var/run/docker.sock -e GRAPHITE_HOST=IP_OF_GRAPHITE_HOST -e COLLECTD_HOST=IP_OF_MESOS_SLAVE_WITH_UNDERSCORES bobrik/collectd-docker

For example, if my Graphite host is 10.0.0.251 and the mesos-slave IP is 10.0.0.193, you would run:

docker run -d -v /var/run/docker.sock:/var/run/docker.sock -e GRAPHITE_HOST=10.0.0.251 -e COLLECTD_HOST=10_0_0_193 bobrik/collectd-docker

(I suggest you run the container with --restart=always.)

Then, when defining the app in Marathon, you have to add a couple of env vars which are picked up by that container: COLLECTD_DOCKER_APP, COLLECTD_DOCKER_TASK_ENV and COLLECTD_DOCKER_TASK_ENV_TRIM_PREFIX.

For example, one of our API projects looks like this:

```json
{
  "container": {
    "type": "DOCKER",
    "docker": {
      "image": "10.0.0.48:5000/spored-api:v6",
      "network": "BRIDGE",
      "portMappings": [
        { "containerPort": 80, "hostPort": 0, "servicePort": 8885, "protocol": "tcp" }
      ]
    },
    "volumes": [
      {
        "containerPath": "/var/log/nginx",
        "hostPath": "/var/log/dockerlogs/nginx",
        "mode": "RW"
      }
    ]
  },
  "id": "spored-api",
  "cpus": 0.5,
  "mem": 500,
  "env": {
    "COLLECTD_DOCKER_APP": "spored-api",
    "COLLECTD_DOCKER_TASK_ENV": "MESOS_TASK_ID",
    "COLLECTD_DOCKER_TASK_ENV_TRIM_PREFIX": "spored-api"
  },
  "constraints": [
    ["env", "CLUSTER", "live"]
  ],
  "upgradeStrategy": {
    "minimumHealthCapacity": 0.5,
    "maximumOverCapacity": 0.8
  },
  "healthChecks": [
    {
      "protocol": "HTTP",
      "portIndex": 0,
      "path": "/",
      "gracePeriodSeconds": 60,
      "intervalSeconds": 20,
      "maxConsecutiveFailures": 6
    }
  ]
}
```

Now, when you go to Grafana (load the template which is available on GitHub), you should be able to pick stats by app.

[screenshot, 2015-09-16: Grafana dashboard with stats selectable by app]

salimane commented 8 years ago

@samek thanks :+1:

kopinions commented 8 years ago

Any update? Is this feature supported now?

dashpole commented 7 years ago

I don't think anyone has plans to address this. Anyone who is interested can feel free to propose and implement a solution.

rikwasmus commented 7 years ago

We lazily fixed it in a fork by abusing the 'exposedenv' system, as I was not going to fix the difference between outputs getting either ContainerReference or ContainerInfo objects. This is most surely not the way to do it properly, hence no merge request, but it might help some folks.

diff --git a/container/docker/handler.go b/container/docker/handler.go
index dd0a2cd..11bcf91 100644
--- a/container/docker/handler.go
+++ b/container/docker/handler.go
@@ -257,6 +257,14 @@ func newDockerContainerHandler(
                                if len(splits) == 2 && splits[0] == exposedEnv {
                                        handler.envs[strings.ToLower(exposedEnv)] = splits[1]
                                }
+                               // Add exposed environments as labels to enable them in all outputs
+                               // Ideally, the outputs would handle it themselves, however, the 
+                               // difference between a propagated ContainerReference or ContainerInfo
+                               // is harder to fix, and this is easier for now. This means the outputs
+                               // do not differentiate between labels and exposed environmental vars,
+                               // and environmental vars with the same name can possibly overwrite 
+                               // labels: this can be seen as a feature.
+                               handler.labels[strings.ToLower(exposedEnv)] = splits[1]
                        }
                }
        }
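
If I'm reading the surrounding handler code right, the env vars iterated here are the ones whitelisted via cAdvisor's --docker_env_metadata_whitelist flag, so with this patch, starting cAdvisor with --docker_env_metadata_whitelist=MARATHON_APP_ID should make the app ID show up as a marathon_app_id label in every storage backend.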