Open bgrant0607 opened 2 years ago
One thing that Craig Box got me noodling about ... does yaml matter to kpt / to kubernetes? It clearly doesn't really matter; it's just a representation that we've decided upon.
Craig (jokingly?) suggested INI files as an alternative to yaml, and perhaps that is the path here. When we write configuration in INI or toml, we are actually setting values in a configuration object. That configuration object doesn't allow all keys, and has various restrictions on the values of those keys. In other words, even though we're writing in a different "expression" language, we could imagine writing an OpenAPI spec to describe the schema of the configuration.
This suggests we could think about writing a set of transformation functions from instances of CRDs to the various common configuration file formats. By doing so, we bring legacy configuration into the better-structured world of kubernetes and KRM.
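To make that concrete, here is a minimal sketch of one such transformation function, from a structured spec to INI. The `SCHEMA` table, the section and key names, and the function names are all hypothetical; a real version would presumably validate against an OpenAPI schema rather than this hand-rolled type table.

```python
import configparser
import io

# Hypothetical schema: allowed sections/keys and the Python type of each value.
SCHEMA = {
    "mysqld": {"port": int, "bind-address": str, "max_connections": int},
}


def validate(spec, schema=SCHEMA):
    """Reject unknown sections or keys and values of the wrong type."""
    for section, values in spec.items():
        if section not in schema:
            raise ValueError(f"unknown section: {section}")
        for key, value in values.items():
            expected = schema[section].get(key)
            if expected is None:
                raise ValueError(f"unknown key: {section}.{key}")
            if not isinstance(value, expected):
                raise ValueError(f"{section}.{key} must be {expected.__name__}")


def to_ini(spec):
    """Render a validated spec (a dict of dicts) as INI text."""
    validate(spec)
    parser = configparser.ConfigParser(interpolation=None)
    for section, values in spec.items():
        parser[section] = {key: str(value) for key, value in values.items()}
    out = io.StringIO()
    parser.write(out)
    return out.getvalue()
```

Calling `to_ini({"mysqld": {"port": 3306}})` yields an INI document with a `[mysqld]` section, while a misspelled key such as `prot` is rejected before anything is written.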
We could do so either as a client-side object or as a true CRD with an operator.
This doesn't obviously solve #3119, so I'd imagine we would start client-side.
An example of an application with lots of configuration is kafka: https://github.com/bitnami/charts/blob/master/bitnami/kafka/values.yaml#L93 https://github.com/mesosphere/dcos-kafka-service/blob/master/frameworks/kafka/universe/config.json
Similar to the overall approach to WYSIWYG configuration, I wouldn't want to abstract the application configuration. For instance, as a user or developer I'd expect it to match what I saw in the code or development environment or documentation: https://kafka.apache.org/documentation/#configuration
So, yes, some apps would express configuration in INI or TOML.
This is where something like Augeas is interesting. "Augeas is a configuration editing tool. It parses configuration files in their native formats and transforms them into a tree." Looking at http://augeas.net/docs/augeas.pdf, the idea sounds very close to what we would want. Like a pluggable source/sink for specific non-KRM file types.
With #3118, we wouldn't technically need a custom source/sink. We'd still need custom parsing, marshaling, and visualization, though.
As a concrete example that would address a segment of applications, we looked at Spring Boot config (application.properties) in the early days of the kpt project, but it looks like the demo video recordings don't exist any more. This post discusses it: https://www.springboottutorial.com/spring-boot-application-configuration
One specific category of application configuration is resource-dependent configuration: VM heap size, thread pool sizes, simultaneous connections, cache sizes, etc. Network- and disk-intensive applications often have a number of these tunable settings.
A number of legacy applications and even language runtimes are not container-aware. As an example, before Java was container-aware, additional automation was necessary that is no longer needed with more recent versions of the JDK.
Ideally these settings would be derived from container resource limits, either at run time, such as using an init container, or an application-specific function, which would be lighter weight than either an Operator or admission controller.
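A minimal sketch of what such an application-specific function could compute, assuming a simplified parser that handles only a few Kubernetes memory suffixes and a hypothetical default of giving the JVM 75% of the container limit. The function names are illustrative, not part of kpt.

```python
# Simplified subset of Kubernetes memory quantity suffixes (an assumption:
# the real quantity grammar supports more forms than this).
_SUFFIXES = {
    "Ki": 2**10, "Mi": 2**20, "Gi": 2**30,
    "k": 10**3, "M": 10**6, "G": 10**9,
}


def parse_memory(quantity):
    """Convert a quantity such as '512Mi' or '2Gi' to a number of bytes."""
    for suffix, factor in _SUFFIXES.items():
        if quantity.endswith(suffix):
            return int(quantity[: -len(suffix)]) * factor
    return int(quantity)  # plain byte count


def jvm_heap_flag(memory_limit, fraction=0.75):
    """Derive a -Xmx flag from a container memory limit.

    The 0.75 fraction is an illustrative default, not a recommendation.
    """
    heap_mib = int(parse_memory(memory_limit) * fraction) // 2**20
    return f"-Xmx{heap_mib}m"
```

With these assumptions, a limit of `2Gi` maps to `-Xmx1536m`. The same shape would work for thread pool or cache sizes derived from CPU or memory limits.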
cc @johnbelamaric
I like an app-specific function, especially if it is written in something like Starlark that does not require building and maintaining and coordinating versioning for a separate container image. An init container or custom Go function would require that.
This video discusses an in-pod templating approach using init containers, which is a variation on the entrypoint.sh script approach: https://youtu.be/eJmNSYvelSw?t=1087
I'm liking the Augeas idea, though. If we could convert lots of config formats to a canonical form in kpt fn source, we could manipulate the canonical form and write it back using kpt fn sink.
https://osquery.io/ apparently integrates with Augeas. https://www.uptycs.com/blog/using-augeas-with-osquery-how-to-access-configuration-files-from-hundreds-of-applications
That's read-only, for queries.
Puppet integrates it also, for setting values: https://puppet.com/docs/puppet/5.5/resources_augeas.html
And there's Go integration: https://dev.to/raphink/configuration-surgery-with-go-structure-tags-12a4
More examples: https://ghost.org/docs/config/ https://dev.mysql.com/doc/refman/8.0/en/server-configuration-defaults.html https://www.postgresql.org/docs/current/config-setting.html#CONFIG-SETTING-CONFIGURATION-FILE https://www.rabbitmq.com/configure.html https://redis.io/docs/manual/config/ https://www.nginx.com/resources/wiki/start/topics/examples/full/ https://prometheus.io/docs/prometheus/latest/configuration/configuration/ https://etcd.io/docs/v3.4/op-guide/configuration/ https://www.vaultproject.io/docs/configuration https://airflow.apache.org/docs/apache-airflow/stable/configurations-ref.html https://wpmudev.com/blog/wordpress-wp-config-file-guide/ (php may be too hard) https://www.drupal.org/docs/configuration-management/managing-your-sites-configuration
We can look through charts for more examples: https://github.com/bitnami/charts/tree/master/bitnami
Some discussion in our Kpt office hours
List of formats it looks like we need support for:
This is not a lot of formats. They all have Go implementations with permissive open-source licenses, though they may not preserve comments and whitespace.
I like what Augeas has done, but most of the 300 formats it supports are for system files, which we don't need, so it would probably be easiest for us to develop our own implementation and canonical representation. We would want the mechanism to be similarly pluggable.
We will want to be able to infer the format, such as from file extension and/or trying to parse the file, with a fallback for the user to be able to specify the format.
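A rough sketch of that inference order: explicit user override first, then file extension, then trial parsing. The extension table and the set of probed formats are assumptions for illustration; only JSON and INI are probed here because both have parsers in the Python standard library.

```python
import configparser
import json
import os

# Hypothetical extension table; a real one would cover more formats.
EXTENSIONS = {".json": "json", ".ini": "ini", ".cnf": "ini", ".properties": "properties"}


def _is_json(text):
    try:
        json.loads(text)
        return True
    except ValueError:
        return False


def _is_ini(text):
    parser = configparser.ConfigParser(interpolation=None)
    try:
        parser.read_string(text)
    except configparser.Error:
        return False
    return bool(parser.sections())


def infer_format(path, text, override=None):
    """Return a format name: user override, then extension, then trial parse."""
    if override:
        return override
    extension = os.path.splitext(path)[1].lower()
    if extension in EXTENSIONS:
        return EXTENSIONS[extension]
    for name, probe in (("json", _is_json), ("ini", _is_ini)):
        if probe(text):
            return name
    raise ValueError(f"cannot infer format of {path}; please specify it")
```

Trial parsing is inherently ambiguous for some inputs, which is why the explicit override takes precedence over everything else.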
Because https://github.com/kubernetes/kubernetes/issues/831 was never done, the configuration needs to be in a ConfigMap in order for it to be injected into the application in a straightforward manner.
Options we discussed today for how to represent app config:
The advantage of the application's native format as the source of truth (option 1 or 3) is easier compatibility with the existing application ecosystem(s), without frequent format migrations: reference documentation, tutorials, samples, generators, editors, IDE plugins, Augeas plugins, etc. For instance, here's a mariadb config I could copy/paste: https://www.ibm.com/docs/en/ztpf/1.1.0.15?topic=collection-mariadb-configuration-file-example
I personally don't have a problem with option 3, but it would be useful to get feedback from actual users.
For all the options, our tooling would manipulate our canonical representation.
The problem of a lack of a schema exists for all the options. We'd design the schema to match our canonical format regardless of which option we picked.
Option 1 requires more conversions back and forth by kpt. Option 2 requires more conversions back and forth by the user. Option 3 is the simplest and most flexible, but possibly harder to understand.
An example of toml embedded in helm chart values: https://github.com/influxdata/helm-charts/blob/master/charts/telegraf/templates/configmap.yaml https://github.com/influxdata/telegraf/tree/master/plugins/ and one opinion on that experience: https://youtu.be/LBCmMTofNxw?t=1937
I suspect all three will be needed, but from a preferred order, I find Option 3 more aligned with the vision, for a couple reasons:
For Option 3, we can make it more palatable with a convention to identify the generated ConfigMaps. We have also discussed management of historical ConfigMaps so this fits in pretty well with that concept. For example, a particular annotation or even storing them in a special directory. A couple other considerations, that perhaps should be discussed on #3119 are: 1) how to combine multiple non-KRM files into a single ConfigMap; 2) how to name, annotate, label, etc. the ConfigMap. I am imagining a "stub" ConfigMap such that functions take in that CM, the raw file resource, and a key name.
I think what @yuwenma demoed was essentially option 2: represent the configuration in a canonical KRM format in the package. But instead of adapting the format in the apply step, used a ConfigMap with granular key-value pairs as the canonical format and added an init container to convert that to INI for the application.
Agreed. What I missed in your description of 3 above was that we would store in the canonical format - I was reading it as representing the native format and the generated ConfigMap(s), treating the canonical format as an intermediate in-memory representation. So we actually have three different formats: native, canonical, and generated ConfigMap. Which expands the options a bit, as to storing which subset of these three.
The other point we need to consider is the source of truth. Clearly the generated ConfigMaps are not it. So it leaves the native and canonical formats. If we store the canonical format, then we will have some confusion as to which is SoT.
Another way to think about SoT is to make it an opinionated pipeline of overrides. The native format - the one most easily edited by humans - is the input to the pipeline, which then may override values in that input to produce the final ConfigMap. This works pretty well for the simple case of an independent file and is straightforward: I edit the native file, but my fn render pipeline may tweak it further and rewrite the file. If we store the canonical format too, I think it muddies these waters.
This method doesn't preclude us being smart about the updates to the native files by internally parsing them to the canonical format, nor does it preclude us using that canonical format to present edits in the UI. Those updates and UI-based edits are subject to being overridden by the pipeline, of course.
It gets tricky when we have inputs that are interrelated between the config file and other resources, though. For example, if we change the port in the native file, does that propagate through to the Service port? Or vice-versa? While the "input with pipeline overrides" doesn't solve this problem, I think that's OK. This is actually the same problem we have for any other resources wrt SoT; the input just happens to be in a different format.
Ooh, I like the idea of storing generated objects in a subdirectory. That might be a useful pattern for generators more generally, especially in the case that post-generation edits aren't feasible: #2528.
Something I proposed in slack: Any applications that can specify config via environment variables should probably do so for now. The ConfigMap with granular key-value pairs could serve as the canonical format. Though it's not quite the native env file format that could be sourced by the shell (added to list above), it should be familiar to Kubernetes users.
Regarding 3 formats: fair point.
This PR has an example possible canonical format using granular, flattened key-value pairs (similar to Augeas's internal format) in a ConfigMap: https://github.com/GoogleContainerTools/kpt-samples/pull/11/files
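For illustration only (this is not the code from that PR), a flattened key-value form for INI could be produced and reversed roughly as follows. The `section.key` naming is a hypothetical convention that assumes section names contain no dots; it was picked because ConfigMap keys may not contain slashes.

```python
import configparser
import io


def ini_to_flat(text):
    """Flatten INI text into {'section.key': 'value'} pairs."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(text)
    return {
        f"{section}.{key}": value
        for section in parser.sections()
        for key, value in parser[section].items()
    }


def flat_to_ini(flat):
    """Rebuild INI text from flattened pairs (splits on the first dot)."""
    parser = configparser.ConfigParser(interpolation=None)
    for path, value in flat.items():
        section, key = path.split(".", 1)
        if not parser.has_section(section):
            parser.add_section(section)
        parser.set(section, key, value)
    out = io.StringIO()
    parser.write(out)
    return out.getvalue()
```

The flattened dict could be used directly as the `data` of a ConfigMap. Note that `configparser` drops comments and original whitespace on the round trip, which is the comment-preservation concern raised earlier in the thread.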
The python program that converts the corresponding env vars to the app's native INI format, which runs as an init container, is in that PR also. Presumably there's also a program that converts INI to the canonical format. Here there are only 2 formats because the canonical format is fed directly to the init container as opposed to generating a ConfigMap with an embedded INI file.
One big advantage for option 3: Once users accept the idea of using a canonical format to represent their non-KRM app config, they can build logic between the non-KRM files and their k8s resources directly, and this will give them more flexibility to mutate and validate the package as a whole.
For example, by writing a simple KRM validator function, the platform developer can guarantee the MariaDB port number in INI file is the same as the Ghost deployment database port number. Right now, the most feasible way to do this is to use multi-line setters (not sure if it still works or not), which is the opposite of what we want.
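A simplified sketch of such a validator follows. It works on plain Python dicts and strings rather than the KRM function ResourceList wire format, and the `[mysqld]` section, the `port` key, and the `database__connection__port` env var name are assumptions about how the two sides are configured.

```python
import configparser


def mariadb_port(ini_text):
    """Read the port from the [mysqld] section of a MariaDB INI file."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(ini_text)
    return parser.get("mysqld", "port")


def deployment_env(deployment, name):
    """Return the literal value of an env var in any container, or None."""
    for container in deployment["spec"]["template"]["spec"]["containers"]:
        for env in container.get("env", []):
            if env["name"] == name:
                return env.get("value")
    return None


def validate_ports(ini_text, deployment, env_name="database__connection__port"):
    """Return a list of error messages; an empty list means the ports agree."""
    db_port = mariadb_port(ini_text)
    app_port = deployment_env(deployment, env_name)
    if app_port != db_port:
        return [f"port mismatch: MariaDB uses {db_port}, {env_name}={app_port}"]
    return []
```

The same check could run against the canonical representation instead of the raw INI text, which would avoid re-parsing the native file inside every validator.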
I really like the summary that "Option 1 requires more conversions back and forth by kpt. Option 2 requires more conversions back and forth by the user. Option 3 is the simplest and most flexible, but possibly harder to understand." For now, I'm leaning towards Option 1 because it gives the best user experience for getting started. Only once users get started can we "get feedback from actual users."
There's also still the configmap rollout issue. https://github.com/kubernetes/kubernetes/issues/22368
This is related to #3119, but deserves its own issue.
In application-related resources, application configuration often constitutes a large proportion of the overall configuration size.
Application configuration is special in multiple ways:
Command-line flags are evil, so I'll punt on them for now, other than using env var substitution to define their values.
Env vars are about the best case. Kustomize has support for generating ConfigMaps from env files, and Kubernetes can inject them as envvars. And, if represented natively in a ConfigMap or in a pod template, then they are KRM and could be edited as such. There's still no native schema though (https://github.com/kubernetes/kubernetes/issues/4210). A command for editing env vars would also be nice.
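As a rough illustration of the env-file case (not Kustomize's actual implementation), the conversion from an env file to a ConfigMap is small. This sketch assumes simple `KEY=VALUE` lines and `#` comments, with no quoting or shell expansion.

```python
def parse_env_file(text):
    """Parse KEY=VALUE lines, skipping blank lines and # comments."""
    data = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if not sep:
            raise ValueError(f"expected KEY=VALUE, got: {line!r}")
        data[key.strip()] = value.strip()
    return data


def config_map(name, data):
    """Wrap key-value pairs in a ConfigMap object (as a plain dict)."""
    return {
        "apiVersion": "v1",
        "kind": "ConfigMap",
        "metadata": {"name": name},
        "data": data,
    }
```

Because the result is ordinary KRM, each key can be edited, diffed, and merged like any other resource field.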
I haven't looked for any kind of data, but presumably there are some relatively common file formats, such as INI, TOML, Spring Boot properties, etc.
A common, rational instinct is to normalize such formats into a universal, simpler structured form, generally a simple map or nested map. The most common approach is templating and template parameters, with all the consequences that implies. It's less terrible than other uses of templating if one views config files of unknown formats as just unstructured text, but does feel suboptimal. For instance, anyone familiar with how an application is configured would then need to learn the new representation and how it maps to the application-native one, since often syntax, capitalization, etc. are different. It also frequently requires insertion of conditional logic to handle the presence or absence of properties. For some formats, such as JSON, it is particularly challenging to ensure that the output is valid.
For a variety of reasons, we rejected several proposals to support templating in Kubernetes itself (e.g., https://github.com/kubernetes/kubernetes/issues/30716, https://github.com/kubernetes/kubernetes/issues/89738, https://github.com/kubernetes/kubernetes/issues/96346).
We investigated this issue some when we were designing ConfigMap (https://github.com/kubernetes/kubernetes/issues/1553, https://github.com/kubernetes/kubernetes/issues/2068).
I wonder if we could do something with http://augeas.net/index.html "Augeas is a configuration editing tool. It parses configuration files in their native formats and transforms them into a tree. Configuration changes are made by manipulating this tree and saving it back into native config files."
We would like to provide a similar WYSIWYG transformation and editing experience for application configuration as for KRM resources, at least for a subset of common formats. We could even recommend an automation-friendly format for people writing their own applications.
This affects ~all the functionality of kpt: update merging, diffs, source and sink, function SDKs, the UI.
For example, we also need to be able to do granular merging during updates in the original non-KRM config file, and then ensure that any ConfigMaps the file is embedded in are updated (#3119).