Make Jinja templates less cryptic

xhernandez commented 7 months ago

The current Jinja templates used by sit-environment, especially the one that transforms settings.yml into config.yml, are becoming too cryptic due to the limitations of the templating language itself (the language is not really designed for complex transformations).

In Ansible it's possible to implement new components in Python that extend the capabilities of the templating language. One kind of extension is called a "filter", which is especially useful for implementing data transformations. However, I don't want to move all the logic into a Python program. This could easily lead to Python code that needs to be modified constantly to adapt to new changes. The Ansible playbooks should remain the core components, containing the main logic of the installation. For this reason, I propose creating some generic new filters that can be reused in several places, simplifying the templates without extracting the logic from them.
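For readers unfamiliar with the mechanism, here is a minimal sketch of what a filter plugin looks like. Ansible loads filters from Python files (for example in a filter_plugins directory next to the playbook) that define a FilterModule class whose filters() method returns a mapping from filter names to callables. The file name and the with_suffix filter are hypothetical, chosen only for illustration; they are not part of the proposal.

```python
# filter_plugins/sit_filters.py  (hypothetical file name)


def with_suffix(names, suffix=".yml"):
    """Hypothetical example filter: append a suffix to every name in a list."""
    return [f"{name}{suffix}" for name in names]


class FilterModule(object):
    """Ansible looks for a class with this name in filter plugin files."""

    def filters(self):
        # Maps the name used in templates to the Python callable.
        return {"with_suffix": with_suffix}
```

In a template this would be used as {{ ['centos8', 'centos'] | with_suffix }}.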

I will add the definition of some of the main filters as an example of how it could work.

The transform filter

This filter takes some data (normally a dict or a list) and transforms it into a new structure described explicitly by its first argument. That argument is normally a dict or a list whose values can be Jinja expressions, which can use information from the original data to build the transformed data.

{{
    data | transform({
        ...
    })
}}

Definition of dicts

The generic structure of a dict transform is the following:

{{
    data | transform({
        "key1": value1,
        "key2": value2,
        ...
    })
}}

The keys (key1 and key2 in this case) can be:

- String: The string is the name of the key that will be added to the transformed result. If the original data was also a dict and it contained the same key, that part of the data will be recursively transformed using the structure defined as the value (value1 and value2 in this case).
- Jinja expression: If the key contains a Jinja expression, it's evaluated and the result can be a string, a list or a dict:
  - String: It will be used as the key of the transformed dict.
  - List of strings: The same structure defined as the value will be used to transform each of the keys in the list.
  - Dict: The same structure defined as the value will be used to transform each of the keys in the dict (the values of the dict are ignored; this is just for convenience).
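The key rules above can be sketched as a small helper. This is only an illustration of the three cases, assuming the key expression has already been evaluated; the function name expand_key is hypothetical.

```python
def expand_key(evaluated):
    """Return the output keys produced by an already evaluated key expression."""
    # Check str first: strings are iterable, but they name a single key.
    if isinstance(evaluated, str):
        return [evaluated]
    # For a dict only the keys matter; the values are ignored.
    if isinstance(evaluated, dict):
        return list(evaluated)
    if isinstance(evaluated, (list, tuple)):
        return list(evaluated)
    raise TypeError(f"unsupported key type: {type(evaluated).__name__}")
```

For example, expand_key({"centos8": {}, "centos9": {}}) yields the two keys centos8 and centos9, each of which would then be transformed with the same value structure.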

The values can be anything, including nested dicts, lists or Jinja expressions to recursively define the transformation. The type of the transformed result will be the same as the type of the value (i.e. a dict will return a dict, and a list will return a list). The only exception is a text value containing a Jinja expression. In this case, the type of the result depends on what the Jinja expression evaluates to, which can be a string, a dict or a list.

Special variables inside transform

In the keys and values it's possible to use special variables inside Jinja expressions to reference the original data or previously processed data, as well as some context information that can be useful to do the transformations.

The variables are:

- this: Contains the part of the original data that corresponds to the current part of the transformation, or None if there's no correspondence.
- obj: A reference to the latest defined object (the one being created). This can be used to access the original corresponding data (if it was also an object) or the already modified fields.
- parent: A reference to the parent object of the current one.
- parents: A list of parents. parents[0] is equivalent to parent. It can be used to easily reference any parent object.

Example

Suppose we have this data:

settings:
  os:
    centos8:
      family: redhat
      distro: centos
      version: 8
    centos9:
      family: redhat
      distro: centos
      version: 9

Then, this transformation:

config:
{{
    settings | transform({
        "os": {
            "{{ this }}": {
                "includes": [
                    '{{ obj.distro }}{{ obj.version }}.yml',
                    '{{ obj.distro }}.yml',
                    '{{ obj.family }}.yml'
                ]
            }
        }
    })
}}

Will return this:

config:
  os:
    centos8:
      includes:
        - centos8.yml
        - centos.yml
        - redhat.yml
    centos9:
      includes:
        - centos9.yml
        - centos.yml
        - redhat.yml

In this case, this inside the key points to settings.os, so it returns the keys centos8 and centos9. Then, inside the includes list, obj points to the corresponding settings.os.centos8 and settings.os.centos9.

Note that only explicitly defined keys are present in the transformed result. Inside os we have both keys because we used this to reference all the original data, but inside os.centos8 and os.centos9 there's only an includes key, which is the one that was explicitly referenced (even though it didn't exist in the original data). To keep the remaining original data, additional filters need to be used (see below).
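To make the behaviour of the example concrete, here is a toy model of transform in plain Python. It is a sketch under several assumptions, not the proposed implementation: Python callables taking (this, obj) stand in for Jinja expressions, obj only exposes the original data (not already modified fields), and parent/parents are left out. The helper this_keys is a hypothetical stand-in for the key expression "{{ this }}".

```python
def this_keys(this, obj):
    """Stand-in for the key expression "{{ this }}": expands to the source keys."""
    return this


def transform(this, spec, obj=None):
    # A callable plays the role of a Jinja expression evaluated with this/obj.
    if callable(spec):
        return spec(this, obj)
    if isinstance(spec, list):
        return [transform(this, item, obj) for item in spec]
    if isinstance(spec, dict):
        source = this if isinstance(this, dict) else {}
        result = {}
        for key, value in spec.items():
            names = key(this, obj) if callable(key) else key
            names = [names] if isinstance(names, str) else list(names or [])
            for name in names:
                # Values see the source of the dict being built as `obj`.
                result[name] = transform(source.get(name), value, source)
        return result
    return spec  # literal value, copied as-is


settings = {"os": {
    "centos8": {"family": "redhat", "distro": "centos", "version": 8},
    "centos9": {"family": "redhat", "distro": "centos", "version": 9},
}}

spec = {"os": {this_keys: {"includes": [
    lambda this, obj: f"{obj['distro']}{obj['version']}.yml",
    lambda this, obj: f"{obj['distro']}.yml",
    lambda this, obj: f"{obj['family']}.yml",
]}}}

config = transform(settings, spec)
```

As in the example, config["os"]["centos8"] contains only the includes key, because only explicitly referenced keys reach the result.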

The merge filter

This filter, applied to a dict or list, causes the elements of the original data that have not been explicitly referenced to be copied to the result. The filter can accept a parameter that specifies the merging method, which is especially relevant for lists (like "add_after", "add_before", "replace", ...).

Example

This transformation:

config:
{{
    settings | transform({
        "os": {
            "{{ this }}": {
                "includes": [
                    '{{ obj.distro }}{{ obj.version }}.yml',
                    '{{ obj.distro }}.yml',
                    '{{ obj.family }}.yml'
                ]
            } | merge
        }
    })
}}

Will return this:

config:
  os:
    centos8:
      family: redhat
      distro: centos
      version: 8
      includes:
        - centos8.yml
        - centos.yml
        - redhat.yml
    centos9:
      family: redhat
      distro: centos
      version: 9
      includes:
        - centos9.yml
        - centos.yml
        - redhat.yml
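The merging behaviour can be sketched as follows. This is a simplified model with assumptions stated up front: the original data is passed explicitly as a second argument (in the proposal the filter would obtain it from the transform context), and the meaning given to "add_after", "add_before" and "replace" for lists is a guess, since the issue only names these methods.

```python
def merge(result, original, method="replace"):
    """Copy unreferenced elements of `original` into `result` (toy model)."""
    if isinstance(result, dict) and isinstance(original, dict):
        merged = dict(original)   # start from the original keys
        merged.update(result)     # explicitly defined keys win
        return merged
    if isinstance(result, list) and isinstance(original, list):
        # Assumed semantics: where the new items go relative to the old ones.
        if method == "add_after":
            return original + result
        if method == "add_before":
            return result + original
        if method == "replace":
            return list(result)
        raise ValueError(f"unknown merge method: {method}")
    return result


centos8 = {"family": "redhat", "distro": "centos", "version": 8}
merged = merge({"includes": ["centos8.yml", "centos.yml", "redhat.yml"]}, centos8)
```

Here merged keeps family, distro and version and gains the includes list, matching the centos8 entry in the output above.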

The instantiate filter

This filter replicates a single data element into many almost identical copies using information defined in the source data itself. It requires the source object to contain an instances field with data that determines how many instances will be created and how their names are composed.

Example

Suppose we have this data:

settings:
  accounts:
    default:
      groups:
        test:
          gid: 10001
          instances:
            count: 3
            base: 1
        demo:
          gid: 20001
          instances:
            count: 2
            base: 0
      users:
        test:
          uid: 1001
          password: x
          groups: ['test']
          instances:
            count: 2
            base: 1

Then, this transformation:

accounts:
{{
    settings.accounts | transform({
        "{{ this }}": {
            "groups": {
                "{{ this }}": {
                    "gid": "{{ this + instance }}"
                } | instantiate
            }
        }
    })
}}

Will return this:

accounts:
  default:
    groups:
      test1:
        gid: 10001
      test2:
        gid: 10002
      test3:
        gid: 10003
      demo0:
        gid: 20001
      demo1:
        gid: 20002

The instances field is automatically removed from the transformed object. This filter defines two variables:

- instance: Contains the 0-based index number of the current instance being generated.
- name: Contains the name of the current instance.
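The instantiation step can be modelled in a few lines of Python. This is a sketch, not the proposed filter: the naming rule (original name followed by base + instance) is inferred from the example output, and the function signature is hypothetical.

```python
def instantiate(name, definition):
    """Return (instance_name, instance_index, body) tuples for one definition."""
    spec = definition["instances"]
    # The instances field is dropped from every generated copy.
    body = {k: v for k, v in definition.items() if k != "instances"}
    return [
        (f"{name}{spec['base'] + index}", index, dict(body))
        for index in range(spec["count"])
    ]


source_groups = {
    "test": {"gid": 10001, "instances": {"count": 3, "base": 1}},
    "demo": {"gid": 20001, "instances": {"count": 2, "base": 0}},
}

groups = {}
for group_name, definition in source_groups.items():
    for name, instance, body in instantiate(group_name, definition):
        # Equivalent of "gid": "{{ this + instance }}" in the template.
        groups[name] = {"gid": body["gid"] + instance}
```

With the data from the example, groups ends up with the keys test1, test2, test3, demo0 and demo1 and the same gid values as in the output above.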

The references filter

This filter gets a list of unprocessed instance names and an object containing all the instance definitions, and maps them into the explicit names of the instances as they would be once instantiated.

Example

Using the same data as the previous case, the following transformation:

accounts:
{{
    settings.accounts | transform({
        "{{ this }}": {
            "users": {
                "{{ this }}": {
                    "uid": "{{ this + instance }}",
                    "groups": "{{ this | references(parents[2].groups) }}"
                } | instantiate | merge
            }
        }
    })
}}

Will result in:

accounts:
  default:
    users:
      test1:
        uid: 1001
        password: x
        groups: ['test1', 'test2', 'test3']
      test2:
        uid: 1002
        password: x
        groups: ['test1', 'test2', 'test3']
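A possible model of references, under the same assumed naming rule as in the instantiate example (name followed by base + index). Leaving names without an instances field unchanged is also an assumption; the issue does not describe that case.

```python
def references(names, definitions):
    """Expand unprocessed names into the names of their instances."""
    expanded = []
    for name in names:
        spec = definitions[name].get("instances")
        if spec is None:
            # Assumption: a definition without instances keeps its name.
            expanded.append(name)
            continue
        expanded.extend(
            f"{name}{spec['base'] + index}" for index in range(spec["count"])
        )
    return expanded


group_definitions = {
    "test": {"gid": 10001, "instances": {"count": 3, "base": 1}},
    "demo": {"gid": 20001, "instances": {"count": 2, "base": 0}},
}

user_groups = references(["test"], group_definitions)
```

Here user_groups is ['test1', 'test2', 'test3'], which is the groups value shown for both users above.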

xhernandez commented 7 months ago

@anoopcs9 @spuiuk @phlogistonjohn @obnoxxx @Shwetha-Acharya @synarete are you OK with this approach? It's not a full definition, but I hope you get the idea of the kind of filters I would like to implement.

phlogistonjohn commented 7 months ago

It seems very general. When you last spoke about it, I expected to see something more specific to the needs of the project. Given that it is very general, how does it compare to something like JMESPath (see here and here)?

phlogistonjohn commented 7 months ago

To be clear - I am not rejecting a general approach. I just think the bar is higher for a general API, and that API will need to be very well documented so that others working on the project want to use it and can figure out how.

xhernandez commented 7 months ago

It seems very general. When you last spoke about it, I expected to see something more specific to the needs of the project.

Yes. Given the feedback I received, I didn't want to create something that would require editing Python code instead of Ansible tasks/templates whenever changes are made. Keeping it generic makes it possible to make changes without touching Python code most of the time.

There are some project-specific filters, though: the one that instantiates multiple copies of an object with different names, the one that assigns shared resources, like CPUs and memory, proportionally to each node (not described in the initial comment), and some other minor ones. The main benefit is that these filters integrate easily and very well with the transform filter. That kind of integration is probably not possible with other methods. It's also very easy to create new filters in the future.

Given that it is very general, how does it compare to something like JMESPath (see here and here)?

I'll take a deeper look at JMESPath, but I'm not sure it can be used to implement things like instantiation and contextual information without ending up with cryptic lines of text like the ones we have now (and that's assuming it can be done with JMESPath alone; otherwise there will be a mix of Jinja loops and queries that will make it even less understandable).

The approach I've proposed aims to be cleaner and visually understandable, but I'll investigate JMESPath further.
