Declaring preprocessor functions

karlkfi commented 3 years ago

I want to be able to use a function in a pipeline to perform processing of values AFTER they have been set by a parent package's apply-setters.

The following cluster package works intuitively as a single package, but it doesn't work when it it's inside a parent package that also uses apply-setters, because the parent package mutators are executed after the child package's mutators...

Parent setters:

apiVersion: kpt.dev/v1
kind: Kptfile
metadata:
  name: cluster
pipeline:
  mutators:
    - image: gcr.io/kpt-fn/apply-setters:v0.1
      configPath: setters.yaml

Child setters:

apiVersion: kpt.dev/v1
kind: Kptfile
metadata:
  name: nodepool
pipeline:
  mutators:
    - image: gcr.io/kpt-fn/apply-setters:v0.1
      configPath: setters.yaml
    - image: gcr.io/kpt-fn/starlark:v0.1
      configPath: truncate-service-accounts.yaml

The workaround I know of is to copy the function to the parent package kptfile, but if I do that, it will affect all resources in the parent package, not just the resources in the child package. And even if I could scope it to the right package, it would be a hassle to have to copy function config from child packages to parent packages.

One way to solve this might be to have early and late mutators.

Order of execution:

parent "early mutators"
child "early mutators"
child "late mutators"
parent "late mutators"

This would kill two birds with one stone, because it would also allow for parent packages to modify the setter values of child packages. I could put apply-setters as an "early mutator" and the starlark function in a "late mutator" and it would affect the setter values as passed from parent to child without affecting the parent or sibling packages.

For example...

Parent setters:

apiVersion: v1
kind: ConfigMap
metadata:
  name: setters
  annotations:
    config.kubernetes.io/local-config: "true"
data:
  name: cluster-1
  project-id: example-1234

Child setters:

apiVersion: v1
kind: ConfigMap
metadata:
  name: setters
  annotations:
    config.kubernetes.io/local-config: "true"
data:
  name: pool-1
  cluster-name: cluster-1 # kpt-set: ${name}
  project-id: example-1234 # kpt-set: ${project-id}

Child resource:

apiVersion: iam.cnrm.cloud.google.com/v1beta1
kind: IAMServiceAccount
metadata:
  name: gke-cluster-1-pool-1 # kpt-set: gke-${cluster-name}-${name}

Service account names are frequently too long when concatenating other names together. With this configuration the starlark function could truncate it after the setters have been applied. And with this pattern, the name setter means "node pool name" in the child package and "cluster name" in the parent package, allowing for hierarchical setter namespacing.

frankfarzan commented 3 years ago

This is a use case that has been discussed, referred to as "pre-processors" as opposed to current execution order where parent package functions run as "post-processors". This is something that can be added incrementally as it is purely additive (i.e. via a preprocessors field in the pipeline section) so was left out of v1 scope. We should look at this in v1.1 time frame.

morgante commented 3 years ago

Please prioritize this. The lack of this basically makes setter inheritance unworkable as currently implemented. Let's say I want to apply a setter (ex. organization-id) but some functions in lower packages depend on that setter value. The functions in subpackages will always run before the parent apply-setters function, meaning my only option is to manually copy the setter value down into my subpackage.

phanimarupaka commented 3 years ago

@karlkfi Not proposing a solution here, but trying to understand the problem better, does it help if we make the order of hydration configurable ? Currently, the default order if bottom up, where children are hydrated before parents. What if we make it configurable, and hydrate parents first and then children?(top down)

morgante commented 3 years ago

@phanimarupaka That would not solve the problem, as we still have cases where parents need to be applied first (ex. setter inheritance, the one I mentioned). Shifting the burden to users via configuration isn't sufficient.

phanimarupaka commented 3 years ago

So for this case, having early-mutators, early-validators which are applied in top-down fashion(additive change) and mutators, validators which apply in bottom-up fashion(current behavior) should help I guess. Advanced users can pick and choose where to include the functions.

phanimarupaka commented 3 years ago

@morgante Another way is by using target selector for functions. The starlark function can be moved to parent package and specify target functions as mentioned in this https://github.com/GoogleContainerTools/kpt/issues/2015.

morgante commented 3 years ago

@phanimarupaka No, that is not a solution. The entire point is that the parent package should not need to know the details of child packages. Once you start tightly coupling them like that, you've lost the package boundary.

We need either a form of pre and post hooks, or a way to assign weights to functions.

karlkfi commented 3 years ago

Morgante is right. The point is to be able to re-use packages by wrapping them in a parent package and have the parent be able to mutate the inputs and outputs of the child package, without needing the user to manually modify the child package directly.

In the common case, the setters are inputs and the resulting yaml is the output. Early mutators allow modifying the inputs. Late mutators allow modifying the outputs.

I prefer this to weights because any new parent package can be added as a wrapper without the child being able to “escape” the parent’s control. It’s encapsulation, like a function calling another function and being able to modify arguments and return values.

droot commented 3 years ago

@phanimarupaka about: does it help if we make the order of hydration configurable ?

That won't help technically because we don't allow operating on meta resources during render, while in this case, primary objective for top-down processing is to customize the input to the pipeline.

The point is to be able to re-use packages by wrapping them in a parent package and have the parent be able to mutate the inputs and outputs of the child package, without needing the user to manually modify the child package directly. In the common case, the setters are inputs and the resulting yaml is the output. Early mutators allow modifying the inputs. Late mutators allow modifying the outputs.

Thanks @karlkfi This articulates the use-case so well conceptually.

@bgrant0607 also touched on this in the comment https://github.com/GoogleContainerTools/kpt/issues/1280#issuecomment-877557069

I think we should design this with filter functionality together because they seem to have an interplay here.

phanimarupaka commented 3 years ago

Morgante is right. The point is to be able to re-use packages by wrapping them in a parent package and have the parent be able to mutate the inputs and outputs of the child package, without needing the user to manually modify the child package directly.

So the early mutators should act only on the inputs, and early validators should validate the inputs to solve this problem. The functions declared in pre-processors section (which holds early mutators and validators) should act only on the Kptfile/function config (meta resources) files and not actual resources. I feel that this is pretty powerful in terms of input values mutations and validations. This would also solve issues like https://github.com/GoogleContainerTools/kpt/issues/2041.

Some prior art in this space is PersistentPreRun hook in Cobra https://github.com/spf13/cobra/blob/v1.2.1/user_guide.md#prerun-and-postrun-hooks

In the common case, the setters are inputs and the resulting yaml is the output. Early mutators allow modifying the inputs. Late mutators allow modifying the outputs.

I prefer this to weights because any new parent package can be added as a wrapper without the child being able to “escape” the parent’s control. It’s encapsulation, like a function calling another function and being able to modify arguments and return values.

droot commented 3 years ago

The functions declared in pre-processors section (which holds early mutators and validators) should act only on the Kptfile/function config (meta resources) files and not actual resources.

Not sure, if we need to have such restriction. Being able to modify a resource in pre-processor is also a way to configure input to the pipeline of a package. For ex, changing the replica count of kafka cluster in the resource yaml and then reconcile function changing the dns names on the basis of new replica count is a simple use-case.

phanimarupaka commented 3 years ago

The functions declared in pre-processors section (which holds early mutators and validators) should act only on the Kptfile/function config (meta resources) files and not actual resources.

Not sure, if we need to have such restriction. Being able to modify a resource in pre-processor is also a way to configure input to the pipeline of a package. For ex, changing the replica count of kafka cluster in the resource yaml and then reconcile function changing the dns names on the basis of new replica count is a simple use-case.

Agree. I think we got good inputs regarding the problem statement. We can switch to internal design doc to finalize the design.

karlkfi commented 3 years ago

Not sure, if we need to have such restriction. Being able to modify a resource in pre-processor is also a way to configure input to the pipeline of a package.

Agreed. Because kpt uses in-place hydration, all of the outputs may be used as inputs.

RafalMaleska commented 1 year ago

stumbled about this as well. makes in our case the usage of child-parent kptfiles difficult

current workaround: wrapper script

kptdev / kpt

Declaring preprocessor functions #2385