Open karlkfi opened 3 years ago
This is a use case that has been discussed, referred to as "pre-processors" as opposed to current execution order where parent package functions run as "post-processors". This is something that can be added incrementally as it is purely additive (i.e. via a preprocessors
field in the pipeline
section) so was left out of v1 scope. We should look at this in v1.1 time frame.
Please prioritize this. The lack of this basically makes setter inheritance unworkable as currently implemented. Let's say I want to apply a setter (ex. organization-id
) but some functions in lower packages depend on that setter value. The functions in subpackages will always run before the parent apply-setters
function, meaning my only option is to manually copy the setter value down into my subpackage.
@karlkfi Not proposing a solution here, but trying to understand the problem better, does it help if we make the order of hydration configurable ? Currently, the default order if bottom up, where children are hydrated before parents. What if we make it configurable, and hydrate parents first and then children?(top down)
@phanimarupaka That would not solve the problem, as we still have cases where parents need to be applied first (ex. setter inheritance, the one I mentioned). Shifting the burden to users via configuration isn't sufficient.
So for this case, having early-mutators
, early-validators
which are applied in top-down fashion(additive change) and mutators
, validators
which apply in bottom-up fashion(current behavior) should help I guess. Advanced users can pick and choose where to include the functions.
@morgante Another way is by using target selector for functions. The starlark function can be moved to parent package and specify target functions as mentioned in this https://github.com/GoogleContainerTools/kpt/issues/2015.
@phanimarupaka No, that is not a solution. The entire point is that the parent package should not need to know the details of child packages. Once you start tightly coupling them like that, you've lost the package boundary.
We need either a form of pre and post hooks, or a way to assign weights to functions.
Morgante is right. The point is to be able to re-use packages by wrapping them in a parent package and have the parent be able to mutate the inputs and outputs of the child package, without needing the user to manually modify the child package directly.
In the common case, the setters are inputs and the resulting yaml is the output. Early mutators allow modifying the inputs. Late mutators allow modifying the outputs.
I prefer this to weights because any new parent package can be added as a wrapper without the child being able to “escape” the parent’s control. It’s encapsulation, like a function calling another function and being able to modify arguments and return values.
@phanimarupaka about: does it help if we make the order of hydration configurable ?
That won't help technically because we don't allow operating on meta resources during render
, while in this case, primary objective for top-down processing is to customize the input to the pipeline.
The point is to be able to re-use packages by wrapping them in a parent package and have the parent be able to mutate the inputs and outputs of the child package, without needing the user to manually modify the child package directly. In the common case, the setters are inputs and the resulting yaml is the output. Early mutators allow modifying the inputs. Late mutators allow modifying the outputs.
Thanks @karlkfi This articulates the use-case so well conceptually.
@bgrant0607 also touched on this in the comment https://github.com/GoogleContainerTools/kpt/issues/1280#issuecomment-877557069
I think we should design this with filter functionality together because they seem to have an interplay here.
Morgante is right. The point is to be able to re-use packages by wrapping them in a parent package and have the parent be able to mutate the inputs and outputs of the child package, without needing the user to manually modify the child package directly.
So the early mutators
should act only on the inputs, and early validators
should validate the inputs to solve this problem. The functions declared in pre-processors section (which holds early mutators and validators) should act only on the Kptfile/function config (meta resources) files and not actual resources. I feel that this is pretty powerful in terms of input values mutations and validations. This would also solve issues like https://github.com/GoogleContainerTools/kpt/issues/2041.
Some prior art in this space is PersistentPreRun hook in Cobra https://github.com/spf13/cobra/blob/v1.2.1/user_guide.md#prerun-and-postrun-hooks
In the common case, the setters are inputs and the resulting yaml is the output. Early mutators allow modifying the inputs. Late mutators allow modifying the outputs.
I prefer this to weights because any new parent package can be added as a wrapper without the child being able to “escape” the parent’s control. It’s encapsulation, like a function calling another function and being able to modify arguments and return values.
The functions declared in pre-processors section (which holds early mutators and validators) should act only on the Kptfile/function config (meta resources) files and not actual resources.
Not sure, if we need to have such restriction. Being able to modify a resource in pre-processor is also a way to configure input to the pipeline of a package. For ex, changing the replica count of kafka cluster in the resource yaml and then reconcile function changing the dns names on the basis of new replica count is a simple use-case.
The functions declared in pre-processors section (which holds early mutators and validators) should act only on the Kptfile/function config (meta resources) files and not actual resources.
Not sure, if we need to have such restriction. Being able to modify a resource in pre-processor is also a way to configure input to the pipeline of a package. For ex, changing the replica count of kafka cluster in the resource yaml and then reconcile function changing the dns names on the basis of new replica count is a simple use-case.
Agree. I think we got good inputs regarding the problem statement. We can switch to internal design doc to finalize the design.
Not sure, if we need to have such restriction. Being able to modify a resource in pre-processor is also a way to configure input to the pipeline of a package.
Agreed. Because kpt uses in-place hydration, all of the outputs may be used as inputs.
stumbled about this as well. makes in our case the usage of child-parent kptfiles difficult
current workaround: wrapper script
I want to be able to use a function in a pipeline to perform processing of values AFTER they have been set by a parent package's apply-setters.
The following cluster package works intuitively as a single package, but it doesn't work when it it's inside a parent package that also uses apply-setters, because the parent package mutators are executed after the child package's mutators...
Parent setters:
Child setters:
The workaround I know of is to copy the function to the parent package kptfile, but if I do that, it will affect all resources in the parent package, not just the resources in the child package. And even if I could scope it to the right package, it would be a hassle to have to copy function config from child packages to parent packages.
One way to solve this might be to have early and late mutators.
Order of execution:
This would kill two birds with one stone, because it would also allow for parent packages to modify the setter values of child packages. I could put apply-setters as an "early mutator" and the starlark function in a "late mutator" and it would affect the setter values as passed from parent to child without affecting the parent or sibling packages.
For example...
Parent setters:
Child setters:
Child resource:
Service account names are frequently too long when concatenating other names together. With this configuration the starlark function could truncate it after the setters have been applied. And with this pattern, the
name
setter means "node pool name" in the child package and "cluster name" in the parent package, allowing for hierarchical setter namespacing.