humio / dcos2humio

Apache License 2.0

Use dcos metrics for metricbeat #17

Closed ssiahetiong closed 6 years ago

ssiahetiong commented 6 years ago

Total memory per container: regex("\"name\": \"mem\.total\",\s+\"value\": (?<value>\S+),") | timechart(series=http.dcos_metrics.dimensions.executor_id,function=avg(value))

screenshot from 2017-12-26 19-59-24

ssiahetiong commented 6 years ago

Load average per node: regex("\"name\": \"load\.1min\",\s+\"value\": (?<value>\S+),") | timechart(series=http.dcos_metrics.dimensions.hostname,function=avg(value))

screenshot from 2017-12-27 01-47-01

mwl commented 6 years ago

I tweaked your searches a little bit :)

"metricset.namespace"="container_metrics" | split(http.container_metrics.datapoints) \
| "http.container_metrics.datapoints.name"="mem.total" \
| timechart(mesos_slave_id, function=avg(http.container_metrics.datapoints.value), unit=bytes, span=1m)

metricset.namespace="node_metrics" | split(http.node_metrics.datapoints) \
| http.node_metrics.datapoints.name="load.1min" \
| timechart(mesos_slave_id, function=avg(http.node_metrics.datapoints.value))

mwl commented 6 years ago

Other than that, it looks really promising. I've found a bug when running on an Enterprise cluster, where the Metrics API requires authentication, so I'm trying to sort out a good way of fixing that.

mwl commented 6 years ago

As much as I would like to accept this, I've found an issue with Strict and Permissive clusters. Since the Metrics API is protected behind HTTP authentication, some more work is needed to get a meaningful response.

As of now, I've been able to create a new service user in the cluster, log in as "him" on my local machine, and get the auth token out:

dcos security org service-accounts create -p humio-agent-public.pem -d "Humio Agent service account" humio_agent
dcos security org users grant humio_agent "dcos:adminrouter:system:agent" full
dcos security org users grant humio_agent "dcos:adminrouter:ops:system-metrics" full
dcos auth login --username=humio_agent --private-key=humio-agent.pem
dcos config show core.dcos_acs_token

The tricky bit is that the token expires after some time.

ssiahetiong commented 6 years ago

There is a hacky way to generate a lifetime token.

  1. cat /opt/mesosphere/etc/bouncer-config.json

       {
         "SUPERUSER_PASSWORD_HASH": "xxxxx",
         "SUPERUSER_USERNAME": "xxxxx",
         "LOG_LEVEL_STDERR": "INFO",
         "AUTH_COOKIE_SECURE_FLAG": false,
         "EXPIRATION_AUTH_TOKEN_DAYS": 5,
         "EXPIRATION_AUTH_COOKIE_DAYS": 5,
         "EXPIRATION_INFO_COOKIE_DAYS": 5,
         "DATASTORE_ZK_BACKEND_HOSTS": "zk-1.zk:2181,zk-2.zk:2181,zk-3.zk:2181,zk-4.zk:2181,zk-5.zk:2181",
         "GUNICORN_WORKER_PROCESSES": 1,
         "GUNICORN_THREADS_PER_WORKER": 10,
         "GUNICORN_BIND_ADDRESS": "127.0.0.1:8101",
         "GUNICORN_WORKER_TIMEOUT_SECONDS": 10
       }

     Edit EXPIRATION_AUTH_TOKEN_DAYS to the equivalent of 100 years (about 36500 days).

  2. systemctl restart dcos-bouncer
  3. generate token
  4. change expiration back to default

mwl commented 6 years ago

Yeah, but I wouldn't want to ask my users to do that :)

ssiahetiong commented 6 years ago

According to the docs, access to the Metrics API is proxied through the Admin Router. What if we could bypass the Admin Router and access the dcos-metrics HTTP endpoint directly?

ssiahetiong commented 6 years ago

https://github.com/dcos/dcos/blob/08305a1e7f69fa073090b380cb6738d14e852e9c/packages/adminrouter/extra/src/includes/server/open/master.conf#L49

mwl commented 6 years ago

It'll be this one actually https://github.com/dcos/dcos/blob/08305a1e7f69fa073090b380cb6738d14e852e9c/packages/adminrouter/extra/src/includes/server/open/agent.conf#L29-L32.

But it doesn't look like the "metrics" host is resolvable.

ssiahetiong commented 6 years ago

We have to read from the unix socket: https://github.com/dcos/dcos/blob/08305a1e7f69fa073090b380cb6738d14e852e9c/packages/adminrouter/extra/src/includes/http/agent.conf#L18

mwl commented 6 years ago

Some investigation later.

location /system/v1/metrics/ {
    access_by_lua_block {
        auth.access_system_metrics_endpoint();
        util.clear_dcos_cookies();
    }

    include includes/proxy-headers.conf;
    proxy_pass http://metrics/;
}

This authenticates the request and forwards it to

upstream metrics {
    server unix:/run/dcos/dcos-metrics-agent.sock;
}

Which is a Unix socket. So unless Metricbeat can do that, we're a bit stuck.
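For reference, speaking HTTP over a Unix socket is not hard outside of Metricbeat. Below is a minimal Python sketch, assuming the process can read /run/dcos/dcos-metrics-agent.sock directly on the agent. The class and function names are made up for illustration, and the "/v0/node" path in the usage comment is an assumption about what the socket serves once the /system/v1/metrics/ prefix is stripped.

```python
import http.client
import json
import socket


class UnixHTTPConnection(http.client.HTTPConnection):
    """HTTPConnection that connects to a Unix domain socket instead of TCP."""

    def __init__(self, socket_path, timeout=10):
        # "localhost" only ends up in the Host header; it is never resolved.
        super().__init__("localhost", timeout=timeout)
        self.socket_path = socket_path

    def connect(self):
        # Replace the default TCP connect with an AF_UNIX stream socket.
        sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        sock.settimeout(self.timeout)
        sock.connect(self.socket_path)
        self.sock = sock


def get_json(socket_path, path):
    """GET `path` over the Unix socket and decode the JSON response body."""
    conn = UnixHTTPConnection(socket_path)
    try:
        conn.request("GET", path)
        response = conn.getresponse()
        body = response.read()
        if response.status != 200:
            raise RuntimeError(f"unexpected status {response.status}")
        return json.loads(body)
    finally:
        conn.close()


# Hypothetical usage on an agent node (the path is an assumption):
# get_json("/run/dcos/dcos-metrics-agent.sock", "/v0/node")
```

This skips Admin Router, and therefore authentication, entirely, so it only helps if whatever collects the metrics runs on the agent with access to the socket file.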

mwl commented 6 years ago

Got an answer from Mesosphere

"In general, the service is expected to auto-renew its JWT token periodically or on-demand, when the API responds with a 401."

Obviously that wouldn't work with Metricbeat.
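To make the "renew on 401" behaviour concrete, here is a rough Python sketch of what such a client could look like. It is not something Metricbeat offers. RenewingClient and fetch_token are hypothetical names, and fetch_token stands in for whatever actually performs the service-account login; the "token=" Authorization format is the one DC/OS uses for its auth tokens.

```python
import urllib.error
import urllib.request


class RenewingClient:
    """Fetches a token lazily and renews it once when the API answers 401."""

    def __init__(self, fetch_token, opener=urllib.request.urlopen):
        self._fetch_token = fetch_token  # hypothetical callable returning a token string
        self._opener = opener            # injectable so the client can be tested offline
        self._token = None

    def get(self, url):
        if self._token is None:
            self._token = self._fetch_token()
        try:
            return self._send(url)
        except urllib.error.HTTPError as err:
            if err.code != 401:
                raise
        # The token was rejected: renew it once and retry the request.
        self._token = self._fetch_token()
        return self._send(url)

    def _send(self, url):
        request = urllib.request.Request(
            url, headers={"Authorization": f"token={self._token}"}
        )
        with self._opener(request) as response:
            return response.read()
```

A second 401 after the renewal is raised to the caller rather than retried, so a misconfigured account does not loop forever.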

ssiahetiong commented 6 years ago

Would it be feasible to create a cron job that generates an auth token every day and then passes it to the Metricbeat config file?
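A rough sketch of the config-writing half of that idea, in Python. How the fresh token is obtained is left out. The template is illustrative only: the module layout, the host and port, the path, and the use of a headers option are assumptions that would need checking against the actual Metricbeat configuration in this repo.

```python
import os
import tempfile
from string import Template

# Illustrative module config; every value here is an assumption.
CONFIG_TEMPLATE = Template("""\
- module: http
  metricsets: ["json"]
  namespace: "node_metrics"
  hosts: ["localhost:61001"]
  path: "/system/v1/metrics/v0/node"
  headers:
    Authorization: "token=$token"
""")


def render_config(token):
    """Return the module config with the given auth token filled in."""
    return CONFIG_TEMPLATE.substitute(token=token)


def write_config(path, token):
    """Write the config atomically so a reader never sees a half-written file."""
    directory = os.path.dirname(os.path.abspath(path))
    # mkstemp creates the file readable only by its owner, which suits a secret.
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as handle:
        handle.write(render_config(token))
    os.replace(tmp_path, path)
```

The cron job would call write_config with a freshly generated token. Metricbeat would still need to pick up the change, for example through a restart.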

mwl commented 6 years ago

Possibly. I think I would put it behind a feature flag for now, so that we can support open clusters.

ssiahetiong commented 6 years ago

I'm not sure if we can use this: https://docs.mesosphere.com/1.10/security/ent/iam-api/#/login/post_auth_login

I don't have an enterprise cluster so I can't try the service account login.