Open rhowe opened 1 year ago
AWS feature request: https://github.com/aws/containers-roadmap/issues/1893
Hi,
Thanks for opening the upstream feature request.
Outside of that feature request landing, you mentioned the possibility of collecting this data via reading the task definition itself. What endpoint and APIs would need to be hit for this?
It would require calling the AWS ECS API's DescribeTaskDefinition method with the ARN of the task definition (which I think can be read from the ECS task metadata)
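For illustration, here is a minimal Python sketch of that flow, not how the plugin would necessarily implement it. It assumes the task metadata response carries Family and Revision fields, that DescribeTaskDefinition accepts a family:revision reference in place of the full ARN, and that the task role has the ecs:DescribeTaskDefinition permission. The helper names are hypothetical, and the output keys reuse the metric names proposed in this issue.

```python
from typing import Any, Dict

MIB = 1024 * 1024  # memoryReservation in a task definition is expressed in MiB


def task_definition_ref(task_metadata: Dict[str, Any]) -> str:
    """Build a 'family:revision' reference from a task metadata document.

    Assumption: the metadata response exposes 'Family' and 'Revision'.
    """
    return f"{task_metadata['Family']}:{task_metadata['Revision']}"


def fetch_task_definition(ref: str) -> Dict[str, Any]:
    """Call DescribeTaskDefinition (needs boto3, credentials and IAM access)."""
    import boto3  # third-party; imported lazily so the parser below has no dependencies

    client = boto3.client("ecs")
    return client.describe_task_definition(taskDefinition=ref)


def container_reservations(response: Dict[str, Any]) -> Dict[str, Dict[str, int]]:
    """Extract per-container soft limits from a DescribeTaskDefinition response.

    Missing values are reported as 0, which is one possible convention
    (the issue leaves this as an open question).
    """
    result = {}
    for container in response["taskDefinition"]["containerDefinitions"]:
        result[container["name"]] = {
            # Convert MiB to bytes to match the proposed metric unit.
            "ecs_container_mem_reservation": container.get("memoryReservation", 0) * MIB,
            # CPU units as declared on the container definition.
            "ecs_container_cpu_reservation": container.get("cpu", 0),
        }
    return result
```

Since task definitions are immutable, the result of fetch_task_definition could be fetched once at startup and cached for the lifetime of the task.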
Is it possible to expose the CPU limit of a Docker container? Currently, CPU usage as reported by inputs.docker differs from the ECS service metrics, which is confusing. The inputs.docker plugin would also need to be aware of cgroups. I did a small experiment: I ran a Docker container on my machine and read the docker inspect output of an ECS container.
In regular Docker containers, cpuquota, cpuset, cpuperiod and the other cpu* fields are populated from the docker run parameters. In ECS containers, CPU control is done through cgroups instead:
That means the docker input needs to be aware of all those variables and find where the cgroup filesystem is mounted, so that it can read this information and send it as metrics to the Telegraf outputs. I'm able to make the change and open a PR, but I need some advice on the correct place to make it.
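As a rough sketch of what reading the limit from the cgroup filesystem could look like (in Python for brevity, not as a proposed plugin change): the function below takes a cgroup directory and derives a CPU limit in cores from the CFS quota and period. The function name is hypothetical, and the directory passed in is an assumption, since the actual mount point and per-container path vary by host and would have to be discovered separately.

```python
from pathlib import Path
from typing import Optional


def read_cpu_limit_cores(cgroup_dir: str) -> Optional[float]:
    """Return the CPU limit in cores for a cgroup, or None if unlimited.

    cgroup_dir is assumed to be the directory holding this container's
    CPU controller files; locating it is outside the scope of this sketch.
    """
    base = Path(cgroup_dir)

    # cgroup v2: a single file "cpu.max" containing "<quota|max> <period>".
    v2_file = base / "cpu.max"
    if v2_file.exists():
        quota_text, period_text = v2_file.read_text().split()
        if quota_text == "max":
            return None  # no quota configured
        return int(quota_text) / int(period_text)

    # cgroup v1: separate quota and period files, in microseconds.
    quota = int((base / "cpu.cfs_quota_us").read_text())
    period = int((base / "cpu.cfs_period_us").read_text())
    if quota <= 0:
        return None  # -1 means no quota configured
    return quota / period
```

For example, a quota of 50000 with a period of 100000 corresponds to half a core. A limit set only through CPU shares would not show up here, because shares are a relative weight rather than a hard cap.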
BTW, I just found an ecs input that might work correctly. I'll take a look and see if it suffices.
Use Case
Currently the ecs input plugin exposes memory usage data and also the container/task memory limit. These are exposed via the docker stats endpoint. It would be extremely useful to also know about the memory reservation (soft limit) and (optional) CPU reservation associated with each container, as this is one of the primary drivers for resource allocation within ECS and can help detect over/underprovisioning. As far as I can tell this information is not available from the ECS task metadata, but could be parsed from the task definition on startup, given appropriate IAM permissions. Since task definitions are immutable, this would be a one-off operation. The soft limit is also available from 'docker inspect' under the MemoryReservation value, although I don't think that's easily accessible from within the task.
Expected behavior
An ecs_container_mem_reservation metric with the value in bytes of the container reservation, and an ecs_container_cpu_reservation metric with the number of CPU units reserved for the container (value of 0 if no reservation is set?)
Actual behavior
This data is not exposed currently
Additional info
I'm going to raise a feature request with AWS to request that these values are exposed in the container stats endpoint. That would obviously be the best solution, although if they do add it, it may require container metadata v4 support in telegraf.