Open DingoEatingFuzz opened 3 years ago
I’ve only accomplished item 1 from the first list here and am working on 2 at the moment:
I’ll reopen but let me know if there’s some better way to track this?
Oops, sorry!
I’m leaning toward #10459 being an incorrect implementation, now that I understand this better. Or at least subpar, as I’m not sure how else to accomplish it…
When I run a Nomad dev agent without memory oversubscription enabled and submit a job with a `memory_max`-configured task, I get a warning that the configuration will be ignored because oversubscription isn’t enabled. But the API response for the job still returns the `memory_max` within the task’s `Resources`.
The task group details ribbon checks whether the sum of the provided `memory_max` values on its tasks is greater than the sum of the `memory` values, and shows the bracketed maximum if so. The maximum is shown regardless of whether oversubscription is actually in effect.
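A minimal sketch of the ribbon check described above. The `TaskResources` shape and the `ribbonMemory` helper are hypothetical names for illustration, not the Nomad UI's actual model. The sketch also assumes that a task without `memory_max` contributes its `memory` to the maximum sum, which the comment above does not confirm.

```typescript
// Hypothetical shape for a task's configured resources (not the real UI model).
interface TaskResources {
  memory: number; // soft limit from `memory`
  memoryMax?: number; // hard limit from `memory_max`, if configured
}

interface RibbonMemory {
  reserved: number;
  reservedMax: number | null; // null means "show no bracketed maximum"
}

function ribbonMemory(tasks: TaskResources[]): RibbonMemory {
  const reserved = tasks.reduce((sum, t) => sum + t.memory, 0);
  // Assumption: tasks without memory_max fall back to their soft limit.
  const max = tasks.reduce(
    (sum, t) => sum + (t.memoryMax !== undefined ? t.memoryMax : t.memory),
    0
  );
  return { reserved, reservedMax: max > reserved ? max : null };
}
```

Because this works only from the configured job, it cannot tell whether the cluster will honor `memory_max`, which is the limitation raised in this thread.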
I’ve subsequently understood that the allocation response is the place to determine the true situation as opposed to the configured one. In this screenshot, I have #10508 running against two different dev agents; the left has oversubscription enabled, the right does not. You can see that `AllocatedResources` and `Resources` in the allocation response reflect the true state of things. The primary metric chart only shows the oversubscription annotation on the left, as expected.
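As a rough sketch of reading the true state from an allocation response: the partial response shape below (`Memory.MemoryMB` and `Memory.MemoryMaxMB` under `AllocatedResources.Tasks`) is an assumption about the API payload, and `effectiveMemory` is a hypothetical helper. It also assumes the maximum is absent or zero when oversubscription is not in effect.

```typescript
// Assumed, partial shape of the allocation API response.
interface AllocatedTaskResources {
  Memory: { MemoryMB: number; MemoryMaxMB?: number };
}

interface AllocationResponse {
  AllocatedResources: { Tasks: Record<string, AllocatedTaskResources> };
}

// Hypothetical helper: sums the soft and hard limits actually granted.
function effectiveMemory(alloc: AllocationResponse) {
  const tasks = alloc.AllocatedResources.Tasks;
  let reservedMB = 0;
  let maxMB = 0;
  for (const name of Object.keys(tasks)) {
    const soft = tasks[name].Memory.MemoryMB;
    // Assumption: a missing or zero maximum means no oversubscription.
    const hard = tasks[name].Memory.MemoryMaxMB || 0;
    reservedMB += soft;
    maxMB += Math.max(soft, hard);
  }
  return { reservedMB, maxMB, oversubscribed: maxMB > reservedMB };
}
```

An annotation on the allocation chart could then be drawn only when `oversubscribed` is true.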
So… I’m not sure what to do about the task group details ribbon, as it seems incorrect to me to present the configured `memory_max` even when it’s ignored, but it’s also not possible to know whether it’s been ignored from the information available to it 🤔
The allocation metric annotation is correct now, at least, but I’m struggling with accessing `AllocatedResources.Tasks` to properly determine the task metric annotation 😢 ETA the answer is: task states, the data is already there 😆
#10247 introduces the ability to describe memory as both a soft and hard limit. The soft limit (`memory`) tells the scheduler how much memory needs to be set aside; the hard limit (`memory_max`) tells Nomad at what point a task should be OOMed. This nuance also needs to be communicated in the UI. There are three pieces to this, of varying scope.
Show this metadata in the task group details ribbon
This one is straightforward. Mimic the language and data used in the CLI updates on the task group detail page. The numbers in this ribbon are already an aggregate of individual task requirements.
If a task group has no `memory_max` set, then this ribbon should be unchanged.
Show both the soft and hard limit in the memory utilization graph for both allocations and tasks
First and foremost, this can be deferred. If we make no changes to this graph, it will naturally report utilization percentages above 100% and the y-axis will adjust, just like we do with CPU soft limits already. This is still pretty confusing though, since it's unclear if the percentage is based on the soft limit or the hard limit.
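To make the ambiguity concrete, here is a small sketch that reports utilization against both limits. The `memoryUtilization` helper and its parameter names are hypothetical, and the units are only assumed to match across the three arguments.

```typescript
// Hypothetical helper: the same usage expressed against each limit.
function memoryUtilization(used: number, soft: number, hard?: number) {
  const percentOfSoft = (used / soft) * 100;
  // Only report a hard-limit percentage when a larger hard limit exists.
  const percentOfHard =
    hard !== undefined && hard > soft ? (used / hard) * 100 : null;
  return { percentOfSoft, percentOfHard };
}

// A task using 384 with memory = 256 and memory_max = 512:
const example = memoryUtilization(384, 256, 512);
// example.percentOfSoft is 150, example.percentOfHard is 75
```

The same reading is 150% of the soft limit but 75% of the hard limit, which is why an unlabeled percentage above 100% is hard to interpret.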
We can improve this by doing the following:
If an allocation has no `memory_max` set, this graph should have no annotation.
Show oversubscription at a client level on both the client detail page and the topology visualization
There are no designs for this yet. Just wanted to mention it here to track the concept.