prometheus-community / ecs_exporter

Prometheus exporter for Amazon Elastic Container Service (ECS)
Apache License 2.0

Explanation of ecs_memory_limit_bytes #35

Open jseiser opened 2 years ago

jseiser commented 2 years ago

Can anyone explain what this metric is actually returning the bytes of?

# HELP ecs_memory_limit_bytes Memory limit in bytes.
# TYPE ecs_memory_limit_bytes gauge
ecs_memory_limit_bytes{container="heartbeat"} 9.223372036854772e+18
ecs_memory_limit_bytes{container="log_router"} 9.223372036854772e+18
ecs_memory_limit_bytes{container="prom_exporter"} 9.223372036854772e+18

9223372036854772000 bytes is how Prometheus is reading that, which is something like 9,223,372,036.85 gigabytes.

The task definition itself is defined with

memory                   = "1024"

And the containers inside the task definition are defined with

    prom_exporter  memoryReservation: 100
    heartbeat      memoryReservation: 256
    log_router     memoryReservation: 100
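For context, a hypothetical task-definition fragment matching those numbers might look like this (field names follow the ECS task-definition schema; the values are taken from above, everything else is illustrative):

```json
{
  "memory": "1024",
  "containerDefinitions": [
    { "name": "prom_exporter", "memoryReservation": 100 },
    { "name": "heartbeat",     "memoryReservation": 256 },
    { "name": "log_router",    "memoryReservation": 100 }
  ]
}
```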

If i grab a performance metric log from ECS for the heartbeat container

ContainerName | heartbeat
CpuReserved | 0.0
CpuUtilized | 0.8063517761230469
MemoryReserved | 256
MemoryUtilized | 61

Would be great if anyone could help me understand what exactly I am looking at here. I really just want to be able to track the memory usage of my Fargate containers.

jseiser commented 2 years ago

OK, bumping up to the main container, I can get a total memory used that makes more sense.

sum by (ecs_task_id, container) (ecs_memory_bytes{} + ecs_memory_cache_usage{})

But I'm still not sure what ecs_memory_limit_bytes actually represents. Basically, I can tell the memory usage + cache for each container, but I have no way of saying it's using X% of its reservation, or X% of the total allocated memory.

I would expect to be able to get either MemoryReserved or the total memory available to the container, so you could determine if the container needs more or less memory allocated.

jseiser commented 2 years ago

So I assume it's coming from here:

https://github.com/moby/moby/blob/v20.10.17/api/types/stats.go#L59

    // number of times memory usage hits limits.
    Failcnt uint64 `json:"failcnt,omitempty"`
    Limit   uint64 `json:"limit,omitempty"`

This can't really be the memory limit though; the number is way too high, since we have already limited the entire task to 1 GB.

discordianfish commented 1 year ago

OK, I hit the same issue. The metadata v4 API the ecs-exporter scrapes is documented here: https://docs.aws.amazon.com/AmazonECS/latest/developerguide/task-metadata-endpoint-v4.html

This path returns Docker stats for the specific container. For more information about each of the returned stats, see ContainerStats in the Docker API documentation.

ContainerStats come from https://www.kernel.org/doc/Documentation/cgroup-v1/memory.txt

So the limit is the limit within a cgroup, and cgroups can be nested. If your task has a memory limit but your container doesn't, the container (sub-)cgroup has effectively no limit (in my case it's set to 8 EiB) but is still constrained by the limit in the parent, task-level cgroup. That parent limit isn't exposed in the task stats API, though, because Docker itself doesn't expose it: Docker doesn't deal with nested cgroups. Fortunately there is a hierarchical_memory_limit stat which should give us what we want, and it should be easy to add this to the exporter.

For now, though, #53 added ecs_svc_memory_limit_bytes (which IMO should be called ecs_task_memory_limit_bytes, but I think there is plenty of other renaming to be done).
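With a task-level limit metric in place, the "X% of total allocated memory" question from earlier in the thread could be answered along these lines (a sketch only: I'm assuming both series carry a matching ecs_task_id label, which may not match the exporter's actual label set):

```
100 *
  sum by (ecs_task_id) (ecs_memory_bytes + ecs_memory_cache_usage)
/ on (ecs_task_id)
  ecs_svc_memory_limit_bytes
```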

SuperQ commented 1 year ago

@discordianfish Feel free to send a renaming PR.