Closed ssiahetiong closed 6 years ago
Load average per node:
regex("\"name\": \"load\.1min\",\s+\"value\": (?<value>\S+),") | timechart(series=http.dcos_metrics.dimensions.hostname,function=avg(value))
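To make it easier to see what that regex actually captures, here is a minimal Python sketch that runs the same pattern against a sample payload. The sample JSON is a hypothetical, pretty-printed excerpt and not a real Metrics API response; Python also spells named groups `(?P<value>...)` where the Humio query uses `(?<value>...)`.

```python
import re

# Same pattern as the Humio query, with Python's named-group syntax.
LOAD_1MIN = re.compile(r'"name": "load\.1min",\s+"value": (?P<value>\S+),')

# Hypothetical pretty-printed excerpt; the real payload may contain more fields.
SAMPLE = """{
  "datapoints": [
    {
      "name": "load.1min",
      "value": 0.25,
      "unit": "count"
    }
  ]
}"""


def extract_load(raw):
    """Return the 1-minute load as a float, or None if the pattern does not match."""
    match = LOAD_1MIN.search(raw)
    return float(match.group("value")) if match else None
```

Note that the pattern depends on the serialization: it needs whitespace between the two fields and a trailing comma after the value, so compact JSON would not match.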
I tweaked your searches a little bit :)
"metricset.namespace"="container_metrics"|split(http.container_metrics.datapoints) \
| "http.container_metrics.datapoints.name"="mem.total" \
| timechart(mesos_slave_id, function=avg(http.container_metrics.datapoints.value), unit=bytes, span=1m)
metricset.namespace="node_metrics" | split(http.node_metrics.datapoints) \
| http.node_metrics.datapoints.name="load.1min" \
| timechart(mesos_slave_id, function=avg(http.node_metrics.datapoints.value))
Other than that, it looks really promising. I've found a bug when running on an Enterprise cluster, where the Metrics API requires authentication, so I'm trying to sort out a good way of fixing that.
As much as I would like to accept this, I've found an issue with Strict and Permissive clusters. Since the Metrics API is protected behind HTTP authentication, some more work is needed to get a meaningful response.
So far I've been able to create a new service user in the cluster, log in as that user on my local machine, and get the auth token out:
dcos security org service-accounts create -p humio-agent-public.pem -d "Humio Agent service account" humio_agent
dcos security org users grant humio_agent "dcos:adminrouter:system:agent" full
dcos security org users grant humio_agent "dcos:adminrouter:ops:system-metrics" full
dcos auth login --username=humio_agent --private-key=humio-agent.pem
dcos config show core.dcos_acs_token
The tricky bit is the fact the token expires after some time.
There is a hacky way to generate a lifetime token.
cat /opt/mesosphere/etc/bouncer-config.json
{
"SUPERUSER_PASSWORD_HASH": "xxxxx",
"SUPERUSER_USERNAME": "xxxxx",
"LOG_LEVEL_STDERR": "INFO",
"AUTH_COOKIE_SECURE_FLAG": false,
"EXPIRATION_AUTH_TOKEN_DAYS": 5,
"EXPIRATION_AUTH_COOKIE_DAYS": 5,
"EXPIRATION_INFO_COOKIE_DAYS": 5,
"DATASTORE_ZK_BACKEND_HOSTS": "zk-1.zk:2181,zk-2.zk:2181,zk-3.zk:2181,zk-4.zk:2181,zk-5.zk:2181",
"GUNICORN_WORKER_PROCESSES": 1,
"GUNICORN_THREADS_PER_WORKER": 10,
"GUNICORN_BIND_ADDRESS": "127.0.0.1:8101",
"GUNICORN_WORKER_TIMEOUT_SECONDS": 10
}
Set EXPIRATION_AUTH_TOKEN_DAYS to the equivalent of 100 years (the value is in days), then restart the service:
systemctl restart dcos-bouncer
Yeah, but I wouldn't want to ask my users to do that :)
According to the docs, access to the Metrics API is proxied through the Admin Router. Could we just bypass the Admin Router and access the dcos-metrics HTTP endpoint directly?
It'll be this one actually https://github.com/dcos/dcos/blob/08305a1e7f69fa073090b380cb6738d14e852e9c/packages/adminrouter/extra/src/includes/server/open/agent.conf#L29-L32.
But it doesn't look like "metrics" is resolvable
We have to read from the unix socket: https://github.com/dcos/dcos/blob/08305a1e7f69fa073090b380cb6738d14e852e9c/packages/adminrouter/extra/src/includes/http/agent.conf#L18
Some investigation later.
location /system/v1/metrics/ {
access_by_lua_block {
auth.access_system_metrics_endpoint();
util.clear_dcos_cookies();
}
include includes/proxy-headers.conf;
proxy_pass http://metrics/;
}
That location block authenticates the request and forwards it to
upstream metrics {
server unix:/run/dcos/dcos-metrics-agent.sock;
}
Which is a Unix socket. So unless Metricbeat can do that, we're a bit stuck
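For illustration, talking HTTP over that Unix socket from a custom client is straightforward. The sketch below uses only the Python standard library; the socket path comes from the upstream definition above, while the class and function names are made up for this example and the request path is whatever the dcos-metrics agent serves (not verified here).

```python
import http.client
import json
import socket

# Path taken from the Admin Router upstream definition quoted above.
METRICS_SOCKET = "/run/dcos/dcos-metrics-agent.sock"


class UnixSocketHTTPConnection(http.client.HTTPConnection):
    """HTTPConnection that connects to a Unix domain socket instead of TCP."""

    def __init__(self, socket_path, timeout=5.0):
        # "localhost" only fills the Host header; no TCP connection is made.
        super().__init__("localhost", timeout=timeout)
        self.socket_path = socket_path

    def connect(self):
        sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        sock.settimeout(self.timeout)
        sock.connect(self.socket_path)
        self.sock = sock


def fetch_json(socket_path, path):
    """GET `path` over the Unix socket and decode the JSON body."""
    conn = UnixSocketHTTPConnection(socket_path)
    try:
        conn.request("GET", path)
        response = conn.getresponse()
        body = response.read()
        if response.status != 200:
            raise RuntimeError(f"unexpected status {response.status}")
        return json.loads(body)
    finally:
        conn.close()
```

A call would look like `fetch_json(METRICS_SOCKET, "/v0/node")`, assuming the agent exposes a `/v0/node` path and the calling process has permission to open the socket.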
Got an answer from Mesosphere
In general, the service is expected to auto-renew its JWT token periodically or on-demand, when the API responds with a 401
Obviously that wouldn't work with Metricbeat
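For reference, the renew-on-401 behaviour Mesosphere describes would look roughly like this in a client we control. This is only a sketch of the pattern: `login` and `send` are hypothetical callables standing in for the real service-account login and the real HTTP request, so nothing here is DC/OS or Metricbeat API.

```python
from typing import Callable, Optional, Tuple

Response = Tuple[int, bytes]  # (HTTP status code, response body)


class RenewingClient:
    """Caches an auth token and renews it once when a request returns 401."""

    def __init__(self, login: Callable[[], str], send: Callable[[str, str], Response]):
        self._login = login  # hypothetical: returns a fresh auth token
        self._send = send    # hypothetical: performs GET path with the given token
        self._token: Optional[str] = None

    def get(self, path: str) -> Response:
        if self._token is None:
            self._token = self._login()
        status, body = self._send(path, self._token)
        if status == 401:
            # Token expired or was revoked: log in again and retry exactly once.
            self._token = self._login()
            status, body = self._send(path, self._token)
        return status, body
```

Retrying only once keeps a genuinely unauthorized account from looping forever; a second 401 is returned to the caller as-is.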
Would it be feasible to create a cron job that generates an auth token every day and passes it to the Metricbeat config file?
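A rough sketch of what such a cron-driven script could do: read the current token from the CLI (the same `dcos config show core.dcos_acs_token` command as above, which assumes a prior `dcos auth login`) and write it atomically to a file. The function names and the target file are hypothetical, and how Metricbeat would pick up the new value (for example, a restart or a config reload) is left open.

```python
import os
import subprocess
import tempfile


def read_cli_token() -> str:
    """Return the token the dcos CLI currently holds (requires a prior login)."""
    result = subprocess.run(
        ["dcos", "config", "show", "core.dcos_acs_token"],
        check=True,
        capture_output=True,
        text=True,
    )
    return result.stdout.strip()


def write_token_file(token: str, path: str) -> None:
    """Write the token to `path` atomically, readable by the owner only."""
    directory = os.path.dirname(os.path.abspath(path))
    # Create the temp file in the target directory so os.replace stays atomic.
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "w") as handle:
            handle.write(token + "\n")
        os.chmod(tmp_path, 0o600)
        os.replace(tmp_path, path)
    except BaseException:
        # Clean up the temp file if anything failed before the rename.
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
```

The daily job would then call `write_token_file(read_cli_token(), "/path/to/token-file")`, with the path chosen to match wherever the Metricbeat configuration expects it.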
Possibly. I think I would put it behind a feature flag for now, so that we can support open clusters.
I'm not sure if we can use this: https://docs.mesosphere.com/1.10/security/ent/iam-api/#/login/post_auth_login
I don't have an enterprise cluster so I can't try the service account login.
Total memory per container:
regex("\"name\": \"mem\.total\",\s+\"value\": (?<value>\S+),") | timechart(series=http.dcos_metrics.dimensions.executor_id,function=avg(value))