Closed by bgrant0607 8 years ago
The templating and parameterization should also be completely decoupled from the config reconciliation process (#1702). That means that there shouldn't be template functions that depend on kubernetes/kubectl/etc. unless it is done in a completely generic way. This way someone can use jinja2, jsonnet, yaml, or whatever other templating method they want and still be able to take full advantage of config reconciliation.
As a counter-example, see the templates for GCE's Cloud Deployment Manager at https://cloud.google.com/deployment-manager/step-by-step-guide/using-template-and-environment-variables:
Templates can also take advantage of environment variables that are automatically populated. Valid environment variables include the deployment name, the project ID, the name property of your resource, and the type of your configuration.
The problem with this is it now couples the templating language your configuration is written in with kubernetes' tooling. Instead, either dynamic variables shouldn't be required to properly form config and they can be added later, or the interface between the tooling should be more generic (e.g. set an annotation on an object to get a field auto-populated, or be able to specify an environment variable directly in a valid JSON struct which gets substituted after the struct is POST'ed to apiserver).
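The generic substitution idea can be sketched concretely. The following is a hypothetical helper, not an actual Kubernetes feature: it walks a parsed JSON object and replaces `${VAR}` placeholders in string values from a supplied mapping, which a tool could apply after the object is created.

```python
import re

def substitute(obj, variables):
    """Recursively replace ${VAR} placeholders in the string values of a
    parsed JSON object with entries from the `variables` mapping."""
    if isinstance(obj, dict):
        return {k: substitute(v, variables) for k, v in obj.items()}
    if isinstance(obj, list):
        return [substitute(v, variables) for v in obj]
    if isinstance(obj, str):
        return re.sub(r"\$\{(\w+)\}", lambda m: str(variables[m.group(1)]), obj)
    return obj

# Hypothetical pod fragment with placeholders to be filled in later:
pod = {"metadata": {"name": "web-${DEPLOYMENT}"}, "spec": {"nodeName": "${NODE}"}}
print(substitute(pod, {"DEPLOYMENT": "prod", "NODE": "node-1"}))
```

Because the substitution operates on plain JSON structure, it is independent of whatever templating language produced the object.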
@jackgr
An example from the community: https://github.com/UKHomeOffice/kb8or
@ghodss Not sure I'm following your point about environment variables tying the templating language to Kubernetes tooling. Environment variables are a very general purpose mechanism for injecting context into processes. Their widespread use suggests that this is a useful capability. A templating engine that didn't respect environment variables would probably need some other mechanism to serve the same purpose. What am I missing here?
@jackgr The concern isn't directly related to environment variables. Instead it's about the idea that the templating supports "special" function calls that need information only known at template application/reconciliation. e.g. if I want to pass the node's hostname as a special var into a pod, maybe there's a way to express that in the templating language. But that means the templating is now coupled to kubernetes itself. That may lead down a dangerous road of having only one "first-class" templating language, and if you want to use your own (e.g. jsonnet) you either have to duplicate functionality or do without it.
The GCE example I cited demonstrates that GCE's templating is not cleanly decoupled from the application of the templates because the templating depends on information only resolvable by the cluster. I.e. if I used my own templating language, I wouldn't be able to have access to things like the current deployment name or the project ID, because those are only known during application.
The simple litmus test for this concern is, is there anything this templating language can do that cannot be done with a completely external, independent textual templating system like Smarty, Jsonnet, ERB, etc. Does that make sense?
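To illustrate the litmus test: everything a decoupled parameterization scheme does should be reproducible with an ordinary textual template engine that knows nothing about Kubernetes. A minimal sketch using Python's stdlib `string.Template` (names and values here are illustrative):

```python
from string import Template

# A plain-text Kubernetes manifest with $-style parameters; no special
# cluster-side knowledge is required to expand it.
manifest = Template("""\
apiVersion: v1
kind: Service
metadata:
  name: $name
spec:
  ports:
  - port: $port
""")

print(manifest.substitute(name="redis-master", port=6379))
```

Any engine that can do this textual expansion (Smarty, Jsonnet, ERB, ...) passes the test; features that can only work with cluster-resolved information fail it.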
OK. It makes sense that the template language shouldn't need any special information. It should get everything it needs from its input parameters, which can come from environment variables, command line flags, input files, etc.
However, I don't see how that constraint is violated by the GCE template language. The deployment name and project ID are passed in as parameters through environment variables. The language itself doesn't have or require any special knowledge or special functions.
If you used your own templating language, your templates would have access to the same parameters from the same sources.
Is it a problem if the templating engine passes external information into the templates that it's expanding from other sources? For example, would it be reasonable for a templating engine to use a database or to call the api server to retrieve historical information, and to extract parameter values from it to pass into a template?
jsonnet, hjson, etc. (IIRC Terraform also has its own thing.) Superset pseudo-config languages?
I think that some of the confusion is when the template processing (and environment variables) are set up. Right now, I think we are all thinking about doing template evaluation "client" side. Or at least not built into the kube API server in a fundamental way. That means that if we do end up picking a "favorite" templating scheme, users can always substitute their own.
In the case where the template evaluation is happening server side there is the opportunity to inject information that wouldn't be available client side. I think that @ghodss is saying that this smells bad to him. I tend to agree -- I like the "isomorphic" ability to eval the templates either server-side or client-side.
Personally, I really like what I see in jsonnet. It'd be at the top of my list here. (It might be worth implementing a parser/evaluator in golang).
I'm not sure this meets all of Brian's criteria. However, I'm also not sure inventing something new here is the right way to go either. In any case, it isn't unreasonable to support a number of templating schemes. We can always swap something better in later.
Or at least not built into the kube API server in a fundamental way.
+1. Typical (expression/template) evaluations are done on the client, in scheduling, and in some cases on the node.
However, I'm also not sure inventing something new here is the right way to go either.
Sadly, cluster management is typically a special case where non-Turing-completeness needs to be enforced if evaluation is to occur on any master component; otherwise a bad evaluation could wedge a component. Perhaps a jsonnet-- or some time-based guards on evaluation.
Discussion in Helm: https://github.com/deis/helm/issues/108
Openshift Templates: https://docs.openshift.org/latest/dev_guide/templates.html
Deployment Manager https://github.com/kubernetes/deployment-manager runs as a service in the cluster.
It not only expands templates there, but tracks the resulting deployments, so that the user can discover which resources were created by a given template.
DM also supports higher order types. A type is defined by a simple Python script plus an optional JSON schema.
When a higher order type is used in a deployment, DM captures the hierarchical relationship between the type and the resources created to instantiate it. The hierarchy can be arbitrarily deep, with one type (e.g., redis) using other types (e.g., replicatedservice, which is a template that expands into an RC and a service with coordinated names and labels).
After one or more higher order types have been deployed in a cluster, DM can report which types are present in the cluster, and can list the instances of a given type.
Special case: a service and corresponding access secret(s) and client config.
Thoughts for the upcoming SIG discussion...
The primary goal is that parameterization should facilitate reuse of declarative configuration templates in different environments in a "significant number" of common cases without further expansion, substitution, or other static preprocessing.
However, configuration parameterization is not the best solution to all configuration-related problems. For instance, DNS is a better solution than statically substituting the IP address. Imported services and/or service aliases could be an alternative to parameterizing names/addresses targeted by clients. Namespaces are a superior scoping mechanism to static name, label, and selector substitution. Smart defaults (e.g., #12298) are often a better way to streamline common use cases. ConfigData would be a better way to factor out application-level configuration from infrastructure-level configuration, much as Secret is a better way to factor out private authentication data. Overriding image defaults at deployment time is usually preferable to parameterizing a Dockerfile. Operational concerns would be better controlled via configurable policies (e.g., #17097) and/or automation (e.g., autoscalers, continuous deployment systems). (Exercise for the reader to notice the common patterns among those approaches.)
We also don't need a single, one-size-fits-all solution to parameterization, as mentioned in the Kubecon thread: https://groups.google.com/d/msgid/kubernetes-dev/CAH16ShKVtH6KBmT%3DZWdXmVpk-t1LeSBr_FjrQOEdMz%2BWfRhx6g%40mail.gmail.com
Other configuration parameterization requirements:
I'd like to keep configuration generation, such as that performed by run or expose or the Deployment Manager replicated service, distinct.
One nice-to-have item I missed from my previous list (https://github.com/kubernetes/kubernetes/pull/14918#issuecomment-153926666) is optional parameters with default values.
And there's also the issue of whether templates should be usable by clients other than kubectl: #12143
Certainly with Jsonnet, and probably in similar systems that do not have up-front declaration of variables, one can wrap the Jsonnet config in a layer (yaml, or bazel build rule, etc.) that declares the variables and provides default values. That layer is interpreted by the code that invokes the Jsonnet library (or commandline utility). This also allows standardization between different templating approaches, all of which have some notion of key/value string parameterization.
This functionality is easy to add with a wrapper (but impossible to remove with one) so I left it out of Jsonnet core to avoid introducing unnecessary complexity for cases that don't need it.
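The wrapper idea can be sketched as follows. This is a hypothetical driver, not part of Jsonnet itself: the wrapper layer owns the parameter declarations (including defaults), validates the caller's arguments, and would then hand the resolved values to the template engine (e.g., via `--ext-str` flags for the `jsonnet` CLI).

```python
def resolve_params(declared, supplied):
    """Merge user-supplied args into declared parameters.

    `declared` maps parameter names to default values; a default of None
    marks the parameter as required. Raises on unknown or missing-required
    parameters.
    """
    unknown = set(supplied) - set(declared)
    if unknown:
        raise ValueError(f"unknown parameters: {sorted(unknown)}")
    resolved = {}
    for name, default in declared.items():
        if name in supplied:
            resolved[name] = supplied[name]
        elif default is not None:
            resolved[name] = default
        else:
            raise ValueError(f"missing required parameter: {name}")
    return resolved

# REPLICAS has a default; NUM_FOOZLES is required (hypothetical names).
params = resolve_params({"NUM_FOOZLES": None, "REPLICAS": 2}, {"NUM_FOOZLES": 3})
# Each entry would then be passed to the engine, e.g. --ext-str NAME=VALUE.
print(params)
```

The same wrapper works unchanged in front of any engine that accepts key/value string parameters, which is the standardization point made above.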
Concrete example (not k8s specific):
foozle.jsonnet:

```jsonnet
{
  foo: std.extVar("NUM_FOOZLES"),
  bar: std.extVar("NUM_FOOZLES") + 1,
}
```

foozle_template.yaml:

```yaml
template_kind: Jsonnet
file: foozle.jsonnet
params:
  NUM_FOOZLES: int
```

Use case:

```yaml
template: foozle.yaml
args:
  NUM_FOOZLES: 3
```

Post expansion:

```jsonnet
{
  foo: 3,
  bar: 4,
}
```
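A generic driver can type-check the use case's args against the template descriptor's declared params before invoking the engine. This is a sketch under the assumption that the descriptor's `params` map names to simple type names, as in the foozle example; nothing here is a real tool.

```python
# Map the descriptor's type names to Python types (illustrative subset).
TYPES = {"int": int, "string": str, "bool": bool}

def check_args(param_schema, args):
    """Validate a use case's args against a template descriptor's params."""
    for name, type_name in param_schema.items():
        if name not in args:
            raise ValueError(f"missing argument: {name}")
        if not isinstance(args[name], TYPES[type_name]):
            raise TypeError(f"{name} must be {type_name}")
    return True

# Matches the foozle descriptor above: NUM_FOOZLES must be an int.
check_args({"NUM_FOOZLES": "int"}, {"NUM_FOOZLES": 3})
```

Because the check reads only the descriptor, the same driver works for any registered template engine.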
I'd like to keep configuration generation, such as that performed by run or expose or the Deployment Manager replicated service, distinct.
The overall goal and many of the requirements listed above make this feature overlap heavily with Deployment Manager, and probably with other template driven solutions, as well.
Specifically, it is the goal of DM to
facilitate reuse of declarative configuration templates in different environments in a "significant number" of common cases without further expansion, substitution, or other static preprocessing.
It's entirely declarative and it expands templates recursively until there are no more templates left in the configuration.
The only requirement that doesn't overlap with DM's existing functionality is this one, and that's only because we haven't yet proposed to integrate the DM CLI w/kubectl:
Parameterization should work with all kubectl commands that accept --filename, and should work on templates comprised of multiple resources.
Some of the others are more problematic. Specifically:
Should not preclude the use of a different parameterization mechanism, it should be possible to use different mechanisms for different resources, and, ideally, the transformation should be composable with other substitution/decoration passes.
DM expects YAML with Jinja markup and/or Python include files. I don't foresee anyone wanting to decorate the Python. Also, the Jinja should not be a problem if DM goes last, though Jinja may choke on markup using the same notation injected by other mechanisms if it's embedded in the Jinja includes.
Also, all of the following are duplicates of functionality provided by DM. It may confuse users to have the same features expressed in different ways at different levels.
Specify template arguments (i.e., parameter values) declaratively, in a way that is "self-describing" (i.e., naming the parameters and the template to which they correspond). It should be possible to write generic commands to process templates.
Validate templates and template parameters, both values and the schema.
Validate and view the output of the substitution process.
Generate forms for parameterized templates, as discussed in #4210 and #6487.
Versioning and encapsulation should be encouraged, at least by convention.
Optional parameters with default values.
Lastly, DM templates can be treated as types, since the results of recursive template expansion are preserved, and can be inspected at any time by querying the cluster to discover the structure of the deployed application. If other mechanisms also presented a type abstraction, we could end up using multiple type systems in configurations.
A follow-up thought after the SIG discussion today, as I try to weigh templates looking more like Kubernetes objects against templates using something like Jinja: does that choice impact the ability to migrate templates across API versions in any way? Using something more structured like OpenShift templates, it appears that it would simplify automated migration of a template from v1 API objects to v2 API objects via a simple, upgrade-templates style command. Using something like Jinja to encode a template seems like it could make that process more difficult over time, since the language is far more open-ended. It may be a non-issue, but I raise it here because I think it impacts things around composition/recursion of templates themselves.
Good point @derekwaynecarr. I agree that convertibility is a desirable property. It would be problematic for any non-string template parameters given the way our conversion machinery currently works, but validation has the same issues.
I have been thinking about Server-side Template Expansion and Client-side Template Expansion as two very different use cases.
In this issue and related issues, it seems like people do not make this distinction.
Can someone explain?
@derekwaynecarr the problem with making templates look almost like Kubernetes objects is that it would limit customers to instantiating the built in primitive types (i.e., pod, rc, service, etc.).
With DM, we took a different approach: both primitive types and templates have the same simple set of properties: name, type, and parameters. Instantiating a type is therefore brutally simple. For example, here's the config for a Redis cluster:
```yaml
- name: redis
  type: github.com/kubernetes/application-dm-templates/storage/redis:v1
  properties: null
```
The equivalent Kubernetes objects are much larger:
```yaml
apiVersion: v1
kind: ReplicationController
metadata:
  name: redis-master
  labels:
    app: redis
    role: master
    tier: backend
spec:
  replicas: 1
  template:
    metadata:
      labels:
        app: redis
        role: master
        tier: backend
    spec:
      containers:
      - name: master
        image: redis
        resources:
          requests:
            cpu: 100m
            memory: 100Mi
        ports:
        - containerPort: 6379
---
apiVersion: v1
kind: Service
metadata:
  name: redis-master
  labels:
    app: redis
    role: master
    tier: backend
spec:
  ports:
  - port: 6379
    targetPort: 6379
  selector:
    app: redis
    role: master
    tier: backend
---
apiVersion: v1
kind: ReplicationController
metadata:
  name: redis-slave
  labels:
    app: redis
    role: slave
    tier: backend
spec:
  replicas: 2
  template:
    metadata:
      labels:
        app: redis
        role: slave
        tier: backend
    spec:
      containers:
      - name: slave
        image: gcr.io/google_samples/gb-redisslave:v1
        resources:
          requests:
            cpu: 100m
            memory: 100Mi
        env:
        - name: GET_HOSTS_FROM
          value: dns
        ports:
        - containerPort: 6379
---
apiVersion: v1
kind: Service
metadata:
  name: redis-slave
  labels:
    app: redis
    role: slave
    tier: backend
spec:
  ports:
  # the port that this service should serve on
  - port: 6379
  selector:
    app: redis
    role: slave
    tier: backend
```
As for automatic migration, the only templates that need to change across API versions are the ones that deal with primitive types. These are only a handful of templates, and a small fraction of the set of application templates that customers will develop.
It's also critical, in our view, to make versioning a first-class construct in templates. For that reason, DM templates are versioned, as you can see in the reference above to github.com/kubernetes/application-dm-templates/storage/redis:v1. You can read more about template versions here.
@erictune the problem with limiting rich template expansion to the client is that knowledge of what templates were used and how they expanded is lost when the resulting vanilla Kubernetes objects are deployed. By contrast, DM does rich server side expansion, and captures the resulting topology as metadata in the cluster that can be queried programmatically for things like visualization and signals aggregation. You can read more about this metadata, which we call layout, here.
Terraform expands templates (modules) on the client and preserves that structure in remote state.
So, yes, that's possible. However, it's not optimal, since the state must be marshalled, sent to the server, unmarshalled and stored, after the fact. All additional points of failure.
That said, there's another reason not to expand templates on the client. Client side expansion puts requirements on the client. Specifically, the client needs to be configured with whatever expansion technology the templates require. To your earlier point about supporting a plethora of languages, we now have a client configuration problem.
Also, expanding on the client puts the client at risk if the template expansion contains computation of any kind. On the server, where expansion will run in a container, the risk can be mitigated by reducing the privileges of the expansion container.
Bottom line, putting expansion on the server has multiple benefits, including less intrusion on the client.
On convertibility:
Let's assume there is a procedural way to convert resources from one API to another. (If this is not the case, there is no way forward).
In any template expansion language in which one can write that conversion function, it is easy to convert a config p(args) by composing it with the conversion function, like so: convert(p(args)). That covers Jsonnet, Python, Nix, Flabbergast, Lua, etc. This does not result in beautiful code, but it does result in code that works. For beautiful code, you then have to refactor convert(p(args)) to move the conversion logic inside p. In general, symbolic execution is hard, but if the transformation is simple, it may be tractable. For example, simply removing a key of a resource is an instance of live variable analysis, which every compiler does and is a solved problem. However, if you just need a config that works, and not a beautiful config, none of this matters.
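The composition argument can be made concrete. In this sketch, `p` is a hypothetical parameterized config and `convert` is a hypothetical procedural v1-to-v2 conversion; the shapes are invented for illustration only.

```python
def p(args):
    """A parameterized config emitting a v1-shaped resource (hypothetical)."""
    return {"apiVersion": "v1", "spec": {"address": args["address"]}}

def convert(cfg):
    """Procedural v1 -> v2 conversion: rename one field (hypothetical)."""
    out = dict(cfg, apiVersion="v2")
    out["spec"] = {"endpoint": cfg["spec"]["address"]}
    return out

# A working (if inelegant) v2 config is just the composition convert(p(args)):
v2 = convert(p({"address": "10.0.0.1:6379"}))
```

The composed form works immediately; refactoring the conversion into `p` itself is the harder "beautiful code" step described above.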
For languages that limit computation to string concatenation and variable lookup within object values, it should be possible to transform them automatically, as long as the needed transformations are restricted to structure and key names, and not changing the values. E.g. a change from a given host:port pair, to separate keys for host and port, would not be possible because you need to split on the : to achieve that. However, the converse would be possible because one would just concatenate the two values either side of the colon. The feasibility of automatic conversion therefore depends on what constructs for data manipulation are available, and what kind of transformation is needed. If the language allows computing keys, this becomes much harder. For a pathological case, consider a config where every key and every value is specified with its own parameter.
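The host:port asymmetry is easy to see in code. Both directions are written in Python here for illustration, but only the first is expressible in a language limited to concatenation and variable lookup; the second needs a split operation such languages lack.

```python
def join_host_port(cfg):
    """Expressible with concatenation alone, so a concatenate-and-lookup
    templating language could perform this conversion."""
    return {"address": cfg["host"] + ":" + cfg["port"]}

def split_address(cfg):
    """Requires splitting on ':', which concatenation-only languages
    cannot express; automatic conversion fails in this direction."""
    host, port = cfg["address"].split(":")
    return {"host": host, "port": port}
```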
For languages that expand plain text with no understanding of the underlying structure (like Jinja) it is not tractable to automatically convert to another config that generates a different shape of resource. These languages are really intended for XML where you have a lot of human-readable text intermingled with the structure. They typically don't have features to transform text blocks except as atomic units, and even if they did, these text blocks would be difficult to parse as they would not in general be valid syntax. Even doing this manually may require high-level thinking.
Conclusion: If you want to do non-trivial automatic conversions, you're much better off writing configs in a proper DSL or general purpose scripting language. Even then, it would likely be an addendum to the config rather than a true refactoring. Simpler languages do not necessarily make this easier, and structureless languages are a dead end.
Responding to Jack: I don't buy that marshalling / storing is a problem since you need to store anyway with server-side execution, and storing requires serialization. I buy the rest though.
If running in a container is sufficient for the security question, then that's very good.
There has to be a story for debugging server side execution when it fails (and sandboxing creates even more failure modes). That does not mean we have to do it client side generally, but it would be useful to have a client side tool that does the same thing the server does, and additionally allows debug output, attaching a debugger, or whatever. One answer to this might just be to run a local master?
@erictune - I agree with your distinction, and I view it with a similar perspective.
@jackgr - brevity is not the same as easily understood. I have worked with Kubernetes for the last year, and I get confused when I initially see your template because it obfuscates, and promotes a different syntax pattern than other API objects. I hope that we promote looking at templates as a way to easily learn the system and not just learn the template language.
If a template looked more like other Kube objects and supported nesting via an ObjectReference, is there no difference?
Then your pattern of doing the following to embed redis:
```yaml
- name: redis
  type: github.com/kubernetes/application-dm-templates/storage/redis:v1
  properties: null
```

is no different than

```yaml
- name: redis
  namespace: kube-templates
  kind: template
  apiVersion: v1
  resourceVersion: 123
```
I am not actually convinced on the merit of nesting, but I don't think nesting is a prerequisite to using another syntax.
/cc @bparees
@derekwaynecarr nice example.
However, it requires an api object of kind template in the kube-templates namespace. That seems to imply anointing a specific templating technology, and more importantly a specific markup format, as the one supported by the platform. This, to me, seems the biggest sticking point.
That said, perhaps an approach is possible where templating engines are registered and invoked as needed to expand specific pieces of content.
At the end of the day, I think it comes down to a type registration problem. It's easy enough to format the input to a template as an API object, as you've demonstrated. The hard part is binding the object types to templating engines.
Just thinking out loud here, namespaces provide a nice way to partition types. What if we added just enough to the api server to make it smart about delegating object expansion to a registered templating engine based on namespace?
For example, let's say we had an API to register a service as an object processor, something like this:
```yaml
- name: rhprocessor
  kind: ObjectProcessor
  apiVersion: v1
  spec:
    service:
      namespace: redhat
      name: rhprocessor
```
or this:
```yaml
- name: dmexpander
  kind: ObjectProcessor
  apiVersion: v1
  spec:
    service:
      namespace: dm
      name: dmexpander
```
Then, we let callers supply objects that can be processed by the registered processors, like this:
```yaml
- name: my-redis
  kind: redhat/redis
  apiVersion: v2.1
  spec:
    port: 26379
    sentinels: 2
```
or this:
```yaml
- name: my-redis
  kind: dm/redis
  apiVersion: v1.2
  spec:
    slaves: 2
```
Metadata supplied to the object processor when it's registered could furthermore be used to help it resolve template references without requiring the templates themselves to be registered. For example:
```yaml
- name: dmexpander
  kind: ObjectProcessor
  apiVersion: v1
  spec:
    service:
      namespace: dm
      name: dmexpander
  metadata:
    repository: github.com/kubernetes/application-dm-templates
```
or this:
```yaml
- name: helmserver
  kind: ObjectProcessor
  apiVersion: v1
  spec:
    service:
      namespace: deis
      name: helmserver
  metadata:
    repository: github.com/helm/charts
```
This proposal is somewhat analogous to the client side stream processor proposal from @brendandburns (#14993), but for the server side.
@jackgr Arbitrary transformations are out of scope for this issue. This is about template parameterization.
@bgrant0607 don't see why this mechanism can't be used for template parameterization. Also, don't see why we would want to constrain ourselves to that.
This proposal is intended to be analogous to #14993, which deals with stream transformation in a general way. That approach seems like a reasonable way to achieve both template parameterization and template expansion.
The primary difference between the two, from the object processor's POV, is that template parameterization doesn't recurse, but template expansion can. From the caller's POV, the primary difference is that template parameterization will never return more than one output object from a given input object, while template expansion can return a list.
Since we want lists to be first class citizens, what's not to like about allowing template expansion, instead of just template parameterization?
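The registration-and-dispatch idea in this proposal can be sketched in a few lines. This is a toy illustration of the routing logic only, not a real API server feature; the registered expander here is a stand-in lambda.

```python
class ProcessorRegistry:
    """Sketch: route objects to registered processors by kind prefix."""

    def __init__(self):
        self._processors = {}

    def register(self, namespace, expand_fn):
        """Register an expansion function for kinds like '<namespace>/<name>'."""
        self._processors[namespace] = expand_fn

    def process(self, obj):
        ns, _, kind = obj["kind"].partition("/")
        if kind and ns in self._processors:
            # Expansion may return a list of vanilla objects (or recurse).
            return self._processors[ns](obj)
        return [obj]  # plain primitive type: pass through unchanged

registry = ProcessorRegistry()
# Hypothetical expander: 'dm/redis' expands into an RC and a Service.
registry.register("dm", lambda o: [{"kind": "ReplicationController"},
                                   {"kind": "Service"}])
expanded = registry.process({"kind": "dm/redis", "apiVersion": "v1.2"})
```

Parameterization is then just the degenerate case where the registered function returns a single-element list.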
A parameterization solution from the community: http://code.haleby.se/2015/11/20/simple-templating-engine-in-bash/
Another community solution (written by me and in progress), based on the proposal from #18215: https://github.com/InQuicker/ktmpl
@jimmycuadra Would you like to present to the configuration SIG? If so, please join https://groups.google.com/forum/#!forum/kubernetes-sig-config and add an item to the Agenda document. Thanks.
Any updates?
Why not move towards a programming language repurposed as a config language? For example, the way Groovy is for Gradle, Clojure is for Riemann.io, and so on.
@kant111 Almost all scripting languages have heavyweight implementations and are non-hermetic (they can't be relied on for immutable infrastructure). They are also usually designed for specifying behavior instead of data, and therefore emphasize the wrong features, resulting in verbose configs. One nasty thing with imperative languages is that you can't substitute code with the data it generates (because you don't know what else the code might be changing). Referential transparency, on the other hand, is a perfect fit for config, because config is about defining data, never about doing things.
On the other hand there's little to be gained by adopting a standard scripting language, because the standard tooling often does not make as much sense in a config context. E.g. Bazel's Skylark has its own syntax highlighters, linter, reformatter, and interpreter even though it's a subset of Python.
So while it may be tempting and gets you over the hurdle in the short term, in the long term the cost benefit ratio just doesn't work out, especially when there are better approaches available that meet a large range of needs.
Ref #23896, #25293, #25622
Closing this issue in favor of #23896, which is the direction we've decided to go.
Forked from #1743
One of the critical features we're lacking on the configuration front is template parameterization. We should decide on what approach we want to take.
Working list of requirements: