I disagree with your premise. I think vertical auto-sizing is the way to go for many users. But this doesn't invalidate your overall premise, which is that a non-negligible number of users will want to provide resource specifications (let's not call them limits, since that's only part of the specification) for pre-existing configurations.
I think you missed one reasonable option: The configuration creator documents in comments the likely resources required, and the configuration user simply copies and edits the configuration for their scenario. Simple, predictable, reproducible, version-able, diff-able when the user wants to "rebase" to an updated configuration.
Multiple copies isn't a bad idea, either. It has most of the same properties as copy-and-modify, potentially saving a step but degrading to the same result.
Issues I see with `ktemplate expand --set`:
The generator example is one way to generate application-specific configs in general. However, again, a domain-specific pass for this would be reasonable, IMO.
We could define a transformation pass that injected resource specifications. I'd even be ok with handling common language runtimes like Java in a first-class way, but I also think there are 2 better answers for setting Java heap size:
I think @dchen1107 and @rjnagal discussed limits with @brendandburns and had some ideas that Dawn was going to write up. Maybe this issue would be a good place to put them. Brendan described it to me but I wasn't at the original discussion so it's better if one of them writes up the summary.
With respect to your proposal, one somewhat futuristic thing that occurred to me is that it would be nice if the Dockerfile format could be parameterized along the lines of your template generator, so a Docker image could be shipped with information about a set of <command line, resource requirements> tuples that you could choose from when you deploy the container. I'm basing this suggestion on the assumption that in many cases the person who creates the Docker container has the best understanding of the resource requirements, and that may be someone who isn't even at the same organization as the person who is deploying it in Kubernetes.
@dchen1107 @rjnagal @brendandburns and I spoke a bit about limits on Thursday. Our POV was similar to @bgrant0607's in that we believe that in the long term most users will use some auto-scaler to set limits (such that from the node's perspective there are always limits), while a small subset will want to set their own. To get there, we thought that enforcing limits on all containers was too big a hammer for current users, so we came up with a way to bridge that gap.
The idea is that users who set their own limits today know what their containers require and want those resources to be guaranteed. Users who don't set limits don't know, or don't care, what their containers need. On the node, we will artificially create two classes of containers: those with limits and those without. Containers with limits will be guaranteed their resources, while those without will receive them on a best-effort basis. In out-of-resource scenarios we will throttle or kill the containers without limits in favor of those with limits. This encourages users to set limits on containers, but allows blank limits for the time being. The reasoning for doing this at the node level rather than at a higher level is to allow the future inclusion of things like the auto-scaler; once that component exists, the system continues to work without any changes on the node.
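A minimal sketch of the two classes, using today's pod-spec syntax purely for illustration (the names and image are invented, and the API looked different when this was written):

```yaml
# Hypothetical pod with one container in each class.
apiVersion: v1
kind: Pod
metadata:
  name: qos-example
spec:
  containers:
  - name: with-limits        # resources are guaranteed
    image: example/app       # hypothetical image
    resources:
      limits:
        cpu: "1"
        memory: 512Mi
  - name: without-limits     # best-effort: throttled or killed first under pressure
    image: example/app
```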
Our focus will initially be with CPU and memory. I think we can have most of this complete in the coming week since the changes are not extensive.
I think that what @erictune brings up here is separate from the above, though, as I see these templates still being useful for any jobs that set limits.
> With respect to your proposal, one somewhat futuristic thing that occurred to me is that it would be nice if the Dockerfile format could be parameterized along the lines of your template generator, so a Docker image could be shipped with information about a set of tuples that you could choose from when you deploy the container.
I guess markdown doesn't like angle brackets, as it ate part of my sentence. What I wrote was "a set of (command line flag, resource requirement) tuples"
In terms of software consumers, there is a set of useful information that isn't captured in #168: the ability of a pod template author to convey minimum requirements. Most application authors or image creators are likely to be capable of defining a minimum memory requirement for their app, or minimum disk space, or minimum network IO (although they may not start out doing so). The value is that it guards against guaranteed failure of pods packed onto nodes below that minimum. In a world of people generating and reusing images and pod templates, giving authors the tools to define minimums also seems valuable.
Eric and I talked through this briefly, which is what triggered the parameterization discussion.
I agree with @rjnagal's proposal to use the specification of limits to set the effective QoS level (#147). I was thinking of putting all limitless containers into a single set of cgroups, which would be dynamically resized to reserve capacity for the containers that set limits. That's not possible to do through Docker at the moment, sadly. (Note that if we could do that, I'd like to do something similar with individual pods.) We're also discussing what we can do with oom adjust and other mechanisms.
As for minimum requirements, I agree we should have them; that's called `request` in resources.md. I could imagine auto-tuning request values as well, in order to influence at least the placement of pods by the scheduler.
/cc @vishh
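For illustration, a hedged sketch of the request/limit distinction in today's syntax (the values and image are invented):

```yaml
# Hypothetical container spec: request conveys the minimum the app
# needs (used for placement); limit is the enforcement cap.
containers:
- name: app
  image: example/some-app    # hypothetical image
  resources:
    requests:
      memory: 256Mi          # scheduler won't place this on a node lacking 256Mi
    limits:
      memory: 512Mi          # hard cap enforced on the node
```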
Do we need this open still? I think we've defined most of the pieces of this, although we haven't captured the actual philosophy: use requests when writing your software; allow admins to enforce limits (hard or soft) via out-of-band processes; use auto-sizing to estimate in the absence of info; use rescheduling and cluster info to revise initial estimates; try to avoid over-specifying as an end user (unless you know for sure).
Fine with closing this.
There will be multiple templating systems; different ones may have different takes on how to default limits.
Admins may enforce upper bounds on limits. Users should set request=limit if they need the best QoS. Not sure whether Guaranteed or Burstable is the best default; right now there isn't enough pressure to pick one or the other.
Starting assumption: Kubernetes should start defaulting to having hard enforcement of pod memory and cpu limits, and requiring pods to make resource requests. That's a premise of this issue -- if you disagree with it, let's discuss it in a separate issue.
If you accept that premise, then pods will have to have resource limits set on them. But, who should set them?
I think that a lot of people will want to start with a pod spec written by someone else. I'll call that a template, but in this context I don't mean the pod template that a replication controller uses.
The person who writes the pod template is in the best position to know things like how the application's flags (e.g. Java's -Xmx) relate to its resource limits.
The person who instantiates the pod template is in the best position to know if his usage scenario is small, medium, large, etc.
How to split those responsibilities, then?
Multiple templates
One approach would be to come up with several templates for different use cases, like this:
Filename `some_java_app_1G.pod.template` contains: ... (a hypothetical sketch follows) and so on for 512M, 2G, and various sizes. However, this doesn't take advantage of the continuously adjustable resource limits provided by containers.
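For concreteness, a minimal hypothetical sketch of the 1G variant, assuming today's pod-spec syntax and an invented image:

```yaml
# some_java_app_1G.pod.template (hypothetical sketch)
apiVersion: v1
kind: Pod
metadata:
  name: some-java-app
spec:
  containers:
  - name: app
    image: example/some-java-app   # hypothetical image
    command: ["java", "-Xmx950m", "-jar", "/app.jar"]  # heap sized just under the limit
    resources:
      limits:
        memory: 1Gi
```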
Parameters to Templates
A template file could declare and document its parameters, perhaps inside comments. Something like this: file `some_java_app.pod.template` declares its parameters, with defaults and warnings, alongside the pod spec; then you could use a tool to expand the template, which would either use the default value or a value you set yourself, and could emit warnings for risky values. A sketch follows.
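A minimal hypothetical sketch, assuming a tool along the lines of the `ktemplate expand --set` mentioned above (the `$(MEM_LIMIT)` syntax, the default, the warning, and the image are all invented):

```yaml
# some_java_app.pod.template (hypothetical sketch)
#
# Parameters (documented in comments):
#   MEM_LIMIT - container memory limit in bytes. Default: 1073741824 (1G).
#               WARNING: values below 268435456 (256M) will likely OOM.
apiVersion: v1
kind: Pod
metadata:
  name: some-java-app
spec:
  containers:
  - name: app
    image: example/some-java-app   # hypothetical image
    resources:
      limits:
        memory: "$(MEM_LIMIT)"
```

Expanding with `ktemplate expand some_java_app.pod.template` would take the default, while `ktemplate expand --set MEM_LIMIT=2147483648 some_java_app.pod.template` would override it; the tool could print the documented warning when a value is out of range.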
However, it seems like it is a short step from just substituting variables to wanting to do computations (e.g. set `-Xmx` to 95% of the container limit).

Generator script
To allow for computations in templates, you could make up a DSL, or you could just let people use whatever language they want, like:
File `some_java_app_podmaker.go` contains the generator (sketched below) and runs like this:

go run some_java_app_podmaker.go -- --mem 2000000000 | kubectl createall
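A minimal hypothetical sketch of such a generator; only the file name and the 95%-of-limit idea come from this thread, and the emitted manifest uses today's syntax with an invented image:

```go
// some_java_app_podmaker.go (hypothetical sketch)
//
// Emits a pod manifest on stdout, sizing the JVM heap to 95% of the
// container's memory limit so the runtime has headroom for non-heap use.
package main

import (
	"flag"
	"fmt"
)

func main() {
	mem := flag.Int64("mem", 1<<30, "container memory limit in bytes")
	flag.Parse()

	heap := *mem * 95 / 100 // leave 5% for JVM overhead outside the heap

	fmt.Printf(`apiVersion: v1
kind: Pod
metadata:
  name: some-java-app
spec:
  containers:
  - name: app
    image: example/some-java-app  # hypothetical image
    command: ["java", "-Xmx%d", "-jar", "/app.jar"]
    resources:
      limits:
        memory: "%d"
`, heap, *mem)
}
```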
Complex systems
Real examples would have multiple pods and replication controllers, and services and such. How will people share knowledge about how to write more complex config? How would that integrate with horizontal scaling of pods? Automatic vertical scaling?