Allow .tmpl suffix for chezmoidata

Problem

There should be a way to dynamically generate chezmoidata for use in templates. I will use the simple example outlined in this section for the rest of this proposal.

Suppose the chezmoi.toml contains:

# chezmoi.toml
[data]
fontSize = 12

Based on this font size, we want to derive two more values automatically: fontSizeSmall at 75% of fontSize and fontSizeLarge at 150% of fontSize. Currently this cannot be done dynamically without advanced use of templates, template variables, and scripts (see the existing solutions below for how this can be achieved under the current 2.46 version).

Proposed Solution

Files in .chezmoidata should be treated as templates if they opt-in by having the .tmpl suffix. This would allow the following to be created in the .chezmoidata directory:

# .chezmoidata/fontsize.toml.tmpl
fontSizeSmall = {{ int (round (mulf 0.75 .fontSize) 0) }}
fontSizeLarge = {{ int (round (mulf 1.50 .fontSize) 0) }}

Now .fontSize, .fontSizeSmall, and .fontSizeLarge can all be used in templates, for example:

font:
  small:   {{ .fontSizeSmall }}
  regular: {{ .fontSize }}
  large:   {{ .fontSizeLarge }}
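
For concreteness, the arithmetic those template expressions are meant to perform can be sketched in Python. This is only an illustration: the helper name `scaled_size` is hypothetical, and rounding half up is an assumption about how `round ... 0` treats positive values.

```python
import math


def scaled_size(font_size: float, factor: float) -> int:
    """Scale a font size and round half up to an integer (assumed rounding mode)."""
    return int(math.floor(font_size * factor + 0.5))


font_size = 12
derived = {
    "fontSize": font_size,
    "fontSizeSmall": scaled_size(font_size, 0.75),  # 75% of the base size
    "fontSizeLarge": scaled_size(font_size, 1.50),  # 150% of the base size
}
print(derived)  # prints {'fontSize': 12, 'fontSizeSmall': 9, 'fontSizeLarge': 18}
```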

Existing Solution 1: Generate data files

This is how I currently solve the problem:

Use a script such as .chezmoiscripts/run_before_fontSize.sh to generate a file such as fontSize.json which contains keys for small and large.
Put {{- $fontSize := include "fontSize.json" | mustFromJson -}} at the top of any template files where I want to use this data.
Access the variables through $fontSize.

For example, the following can then be done:

{{- $fontSize := include "fontSize.json" | mustFromJson -}}

font:
  small:   {{ $fontSize.small }}
  regular: {{ .fontSize }}
  large:   {{ $fontSize.large }}

This has drawbacks:

There are two separate syntaxes: one for “real” variables and one for derived variables.
It requires a separate script which can be avoided for simple cases such as this font size example that can be computed entirely with templates.
It requires the creation of an intermediate file.
It requires “importing” the generated data into a template variable.
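
As a rough sketch of the generator in step one of this solution, here is a Python equivalent of the shell script. The function names are hypothetical, the base size is hard-coded as an assumption, and the script prints to stdout so that the caller decides where fontSize.json is written.

```python
import json
import math


def build_font_sizes(base: int) -> dict:
    """Derive the small (75%) and large (150%) sizes from the base size."""
    return {
        "small": math.floor(base * 0.75 + 0.5),
        "large": math.floor(base * 1.50 + 0.5),
    }


def render_font_sizes(base: int) -> str:
    """Serialize the derived sizes as JSON for a template to load later."""
    return json.dumps(build_font_sizes(base), indent=2)


if __name__ == "__main__":
    # Assumption: base size 12; a real script would read it from configuration
    print(render_font_sizes(12))
```

Redirecting the output to fontSize.json gives a file with the small and large keys that the template above reads through mustFromJson.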

Existing Solution 2: Use chezmoitemplates

There are multiple ways this can be done using chezmoitemplates; I will show the one below that I feel is the cleanest.

Create two files, .chezmoitemplates/fontSizeSmall and .chezmoitemplates/fontSizeLarge, with the following contents:

# .chezmoitemplates/fontSizeSmall
{{- int (round (mulf 0.75 .fontSize) 0) -}}

# .chezmoitemplates/fontSizeLarge
{{- int (round (mulf 1.50 .fontSize) 0) -}}

Now these can be used to get the small and large font sizes, for example:

font:
  small:   {{ template "fontSizeSmall" . }}
  regular: {{ .fontSize }}
  large:   {{ template "fontSizeLarge" . }}

This has drawbacks:

There are two separate syntaxes: one for “real” variables and one for derived variables.
This is not scalable and requires one file per variable.

The final drawback can be overcome with a hybrid of Existing Solutions 1 and 2: a chezmoitemplate generates a large amount of data in a parseable format (e.g. JSON), which is then loaded into a template variable with {{- $fontSize := includeTemplate "fontSize" . | mustFromJson -}}. However, this hybrid reintroduces several drawbacks of Existing Solution 1.

Discussion

The proposed solution is the most ergonomic of the three: it allows variables to be created dynamically with the existing template system and gives all variables a single, unified syntax. The only change required is to allow the .tmpl suffix on chezmoidata files as an opt-in to template processing. Since the suffix is opt-in, this should not break existing configurations. Based on my understanding of how chezmoi loads chezmoidata, I do not believe there is a technical limitation in implementing this, as there is a well-defined loading order.

Thank you for this proposal. This proposal has a circular dependency.

Consider the two following files:

# ~/.local/share/chezmoi/.chezmoidata/a.yaml.tmpl
a: {{ .b }}

# ~/.local/share/chezmoi/.chezmoidata/b.yaml.tmpl
b: {{ .a }}

What should the resulting template data be?

More generally, .chezmoidata.$FORMAT contains data that is available to templates. Therefore, .chezmoidata.$FORMAT must be read before any templates are executed.

When .chezmoidata.$FORMAT.tmpl is executed it needs all template data to execute, and the result of executing the template is itself template data, needed for template execution.
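
The circularity can be made concrete with a small dependency-graph check. This is an illustrative sketch, not chezmoi code: the `deps` mapping (each key mapped to the keys its template reads) and the `find_cycle` helper are hypothetical.

```python
from typing import Dict, List, Optional, Set


def find_cycle(deps: Dict[str, Set[str]]) -> Optional[List[str]]:
    """Return one dependency cycle as a list of keys, or None if there is none."""
    visiting: List[str] = []  # keys on the current depth-first path
    done: Set[str] = set()    # keys already proven cycle-free

    def visit(key: str) -> Optional[List[str]]:
        if key in done:
            return None
        if key in visiting:
            # Reached a key that is already on the current path: a cycle
            return visiting[visiting.index(key):] + [key]
        visiting.append(key)
        for dep in sorted(deps.get(key, ())):
            cycle = visit(dep)
            if cycle:
                return cycle
        visiting.pop()
        done.add(key)
        return None

    for key in sorted(deps):
        cycle = visit(key)
        if cycle:
            return cycle
    return None


# a.yaml.tmpl defines "a" from .b, and b.yaml.tmpl defines "b" from .a
deps = {"a": {"b"}, "b": {"a"}}
print(find_cycle(deps))  # prints ['a', 'b', 'a']
```

No evaluation order satisfies both files, because each one needs the other's output before it can run.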

Stepping back and looking at the fundamental problem that you want to solve, i.e. having font size vary from machine to machine, I would instead set data.fontSize = 12 in my configuration file and copy-and-paste {{ int (round (mulf 0.75 .fontSize) 0) }} into my templates that need the small font size.

If I had a lot of templates that all needed the small font size, then I would also set data.smallFontScale = 0.75 in my config file and copy-and-paste {{ int (round (mulf .smallFontScale .fontSize) 0) }} into my templates.

In extreme cases, I would use a config file template containing:

{{ $fontSize := promptIntOnce . "fontSize" "font size" 12 }}
[data]
    fontSize = {{ $fontSize }}
    fontSizeSmall = {{ int (round (mulf 0.75 $fontSize) 0) }}
    fontSizeLarge = {{ int (round (mulf 1.5 $fontSize) 0) }}

and then use .fontSize, .fontSizeSmall, and .fontSizeLarge in my templates.

> Consider the two following files:
>
> # ~/.local/share/chezmoi/.chezmoidata/a.yaml.tmpl
> a: {{ .b }}
>
> # ~/.local/share/chezmoi/.chezmoidata/b.yaml.tmpl
> b: {{ .a }}
>
> What should the resulting template data be?

Maybe this is my lack of understanding of the chezmoidata loading process, but I assumed it would be similar to how scripts are run in lexicographical order. In this particular example, a.yaml.tmpl would be processed first and then b.yaml.tmpl. This would give an error when processing a.yaml.tmpl, since the key .b is not yet defined. Clearly there is a circular dependency problem if all of these chezmoidata templates are processed simultaneously or in an arbitrary order, but if they are processed lexicographically like scripts, then there is a well-defined and non-circular order that is straightforward to reason about.
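
The ordering assumed in this comment can be modelled in a few lines of Python. This is a sketch of the assumed behaviour only, not of how chezmoi actually loads data: each file is represented by a hypothetical function that maps the data loaded so far to the keys it defines.

```python
def load_in_order(files):
    """Evaluate data files in lexicographic name order; each sees only earlier results."""
    data = {}
    for name in sorted(files):
        try:
            data.update(files[name](data))
        except KeyError as err:
            raise RuntimeError(f"{name}: key {err} is not defined yet") from err
    return data


# The two files from the example: each reads the key the other defines
files = {
    "a.yaml.tmpl": lambda d: {"a": d["b"]},
    "b.yaml.tmpl": lambda d: {"b": d["a"]},
}
try:
    load_in_order(files)
except RuntimeError as err:
    print(err)  # prints a.yaml.tmpl: key 'b' is not defined yet
```

Under this model the cycle becomes a deterministic error at the first file, at the cost of making results depend on file names.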

I appreciate the suggestions and they would be useful for someone having this problem with less data. The font size example was something simple I came up with to illustrate the problem. I actually have a lot more data that I generate and most of it is too complicated for the template system itself. For example, I generate color themes on the fly in several different formats that different applications expect and this results in a lot of data. Most of the things I want to put in templates in chezmoidata would look like this:

{{- output "python3" "script-that-generates-data-in-json-format.py" -}}

where the file in chezmoidata would simply contain the output from the script that does work that is too complex for templates to do directly. Since I figured that there wouldn't be a problem adding templates to chezmoidata (based on the reasoning earlier in this comment), I thought that this might be a useful feature.

I may be misremembering, but I believe that includeTemplate results are cached, so you can have a template that does {{- output "python3" "script-that-generates-data-in-json-format.py" -}} in .chezmoitemplates/scripted.tmpl, and {{- $scripted := includeTemplate "scripted.tmpl" | fromJson -}} in your targets. You can then access the JSON data from there, and the script will only be executed once. See my .chezmoitemplates/programs.tmpl and a file that uses it for an example that I build up within the template itself.

@halostatue that is a good point and is a performance improvement over "Existing Solution 1" in my original comment. Unfortunately, it does not really fix what I am after, which is a matter of ergonomics. It still requires "importing" the data and the use of two different syntaxes (e.g. .name.of.data for chezmoidata and $name.of.data for template variables loaded in this way).

By allowing .tmpl files in .chezmoidata/, all data is unified under a single namespace, the root context (.). This is very nice when you want to pass all data to other templates as outlined in https://www.chezmoi.io/user-guide/templating/#passing-multiple-arguments and greatly simplifies cases where you want all the data to be available with {{- template "some-template" . -}}. It also lets you factor out common decisions, such as using different data on different operating systems, so you can do the operating system check once in chezmoidata instead of in each file that needs it. That would keep configurations cleaner and prevent accidentally forgetting to update a condition in a file. These are just some of the reasons why this feature would be useful.

I believe that what I am proposing should work without issue. Perhaps @twpayne could comment whether this is a technical limitation and chezmoi does not work how I think it does or that my understanding of the chezmoidata loading order is correct and this is feasible. If it is the former then that is likely the end of this idea, but if it is the latter then this in theory could be implemented and I think it would be a powerful feature.

I am not speaking for @twpayne on this, but I mostly agree with him on the advisability of this.

Technically, I believe it is possible to do what you suggest. I suggested something similar in a comment on https://github.com/twpayne/chezmoi/discussions/2673#discussioncomment-5117068, and at the bottom of that comment I linked to this issue as a similar request.

Practically, however, I believe that it will be problematic from a support perspective. As I understand it, .chezmoidata.$FORMAT or .chezmoidata/* can appear anywhere in the chezmoi root (this code search suggests that I am correct). .chezmoidata in the root affects the entire run, but .chezmoidata in a subdirectory only affects that subdirectory. Someone is bound to have .chezmoidata.toml.tmpl that ends up trying to reference something defined in config/.chezmoidata.toml.tmpl and it will fail with a template variable reference error.

I think that there may be a possible alternative approach for this. This is a bit of a riff without a lot of deep consideration, so there are likely to be a bunch of flaws in this suggestion. Let’s add a new file: .chezmoidatascript.$FORMAT. For the moment, we only support that file, and do not plan to support datascript files in .chezmoidata/. (If we did, it might be $filename.datascript.$FORMAT, but that feels "too magic", which is why I reject it.) The contents of .chezmoidatascript.$FORMAT must be a simple command line that chezmoi can execute. Let's say that we have .chezmoidatascript.json defined with the contents:

python3 script-that-generates-data-in-json-format.py

The script accepts no input and no arguments beyond those on the command line, but runs in the chezmoi context (so ${CHEZMOI_*} variables are available), and it must output the expected format. For simplicity, chezmoi can read the first line of the file and ignore anything else. That is, we do not support continuation characters:

python3 \
  script-that-generates-data-in-json-format.py

The above would execute the equivalent of python3 \\.

If the script exits with a non-zero return code, writes anything to stderr, or produces data in any format other than the expected one (that is, parsing of stdout fails), chezmoi simply panics.
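
The rules described above can be sketched as a small Python runner. Everything here is hypothetical: .chezmoidatascript.json is only a suggestion in this thread, `run_data_script` is an illustrative name, and raising an exception stands in for chezmoi panicking.

```python
import json
import shlex
import subprocess
import sys


def run_data_script(script_text: str) -> dict:
    """Run the first line of a hypothetical .chezmoidatascript.json and parse its stdout."""
    lines = script_text.splitlines()
    if not lines:
        raise ValueError("empty data script")
    # Only the first line is used; continuation lines are deliberately ignored
    argv = shlex.split(lines[0])
    result = subprocess.run(argv, capture_output=True, text=True)
    if result.returncode != 0:
        raise RuntimeError(f"data script exited with status {result.returncode}")
    if result.stderr:
        raise RuntimeError("data script wrote to stderr")
    # json.loads raises ValueError if stdout is not valid JSON
    data = json.loads(result.stdout)
    if not isinstance(data, dict):
        raise ValueError("data script must output a JSON object")
    return data


if __name__ == "__main__":
    # Use the current interpreter so the example does not depend on PATH
    command = shlex.join([sys.executable, "-c", "print('{\"fontSize\": 12}')"])
    print(run_data_script(command))
```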

> Maybe this is my lack of understanding the chezmoidata loading process, but I assumed it would be similar to how scripts are run in lexicographical order.

I don't want to introduce any new dependency order on .chezmoidata files. In the future I would like to make chezmoi's execution engine more concurrent for performance reasons, and dependency orders force serialization and prevent concurrency.

If you want to run a Python script that generates .chezmoidata files then you can use a read-source-state hook.
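
A script run from such a hook could be as small as the following sketch. The function names, the placeholder data, and the convention of passing the destination path as the first argument are all assumptions for illustration; how the hook itself is configured is not shown here.

```python
import json
import sys
from pathlib import Path


def generate_data() -> dict:
    """Stand-in for the real generation logic (for example, color themes)."""
    return {"fontSize": 12, "fontSizeSmall": 9, "fontSizeLarge": 18}


def write_data_file(target: Path) -> None:
    """Write the generated data as JSON, creating parent directories as needed."""
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text(json.dumps(generate_data(), indent=2) + "\n")


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical invocation: the hook passes the destination path as argv[1]
    write_data_file(Path(sys.argv[1]))
```

Because the file exists before any template runs, the generated keys are plain data by the time templates are executed, which avoids the ordering problem discussed above.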

So, thanks for the suggestion, but given that there are multiple good workarounds available, this will not be worked on.

This is sensible and it is clear that the idea complicates the future plans for chezmoi. @twpayne and @halostatue thank you both for the feedback and information.
