Clarity for the pipeline contract, as defined in "Writing a Promise"

osowski commented 2 years ago

In looking for detail around what the Kratix Pipeline is supposed to provide/accomplish, it took me a while to find this documented at the bottom of https://github.com/syntasso/kratix-docs/blob/main/docs/_partials/_writing-a-promise.md / https://kratix.io/docs/workshop/writing-a-promise. If you're doing some restructuring of the docs or adding new sections, that is one I would recommend elevating for newcomers.

With that information found, I did have two specific questions that seem to be gaps in the existing documentation:

What are the inputs to the intermediate (non-first) containers in the pipeline list? The pipeline contract specifies that the first container receives the user-input resource document but doesn't specify anything after that.
How can the containers share information between them that isn't a Kubernetes YAML document? Via /output or a different pattern? Is that supported/expected?

Based on the existing documentation, it's hard for me to tell whether I should create containers that form a true pipeline (with output from one container flowing into the next as input) OR whether the pipeline containers should work more like a MapReduce/fan-out design, where every container takes the same input file but generates its own specific output file.

If any of this is already captured somewhere and I missed it, please point me in the right direction. I love what is there so far and the simplicity in the usability of the framework. I am digging in with the hopes that I will be able to help scale this to my team and want to make sure I understand all the points of configuration as best I can.

abangser commented 2 years ago

If you're doing some restructuring of the docs or adding new sections, that is one I would recommend elevating for newcomers.

Agreed. We are introducing a new section called Reference and will build it out as we go. Pipelines will be prioritised there, and that content can largely be based on the details in this response.

Before answering your questions directly, it is worth being explicit that the pipeline feature right now is very bare bones. It is mostly implemented to a level of PoC at this stage and we expect a lot more exploration in this space around both business use cases and technical implementation. I will try and capture both current state and proposed state here. And it would be immensely helpful if you could share any surprises, hopes, or concerns so that we can capture your needs and build them into our discussions.

What are the inputs to the intermediate (non-first) containers in the pipeline list? The pipeline contract specifies that the first container receives the user-input resource document but doesn't specify anything after that.

Right now the pipeline runs as a single pod with three containers, run in the following order:

An init container that takes the initiating Resource Request and adds it to the input volume (code). Note that the input volume is mounted at /output at this stage, to reinforce the idea that the results of a given container should be written as outputs.
A second init container, which is the first container defined by the Promise (example definition, code).

This container has access to the input volume populated in the first init container, and also has an output and metadata volume mounted.

Files in the output volume at the end of the pipeline will be written to the defined GitOps repository and in turn applied to the defined worker clusters. Therefore, files in this directory are expected to be valid Kubernetes format.

Files in the metadata directory can be considered pass-through. At this time Kratix uses this directory to pass cluster selectors from the Promise to any subsequent Resource Requests to decide where to deploy resources (example). Note: this is currently hard-coded as a single container, but the pattern is ready to be extended to any number of containers provided in the Promise CR.
A container that runs the Kratix Work Creator application (container code, Work Creator code). At the most basic level, this is the code responsible for transferring any pipeline output files into the defined GitOps repository.

Since the pipeline is currently hard-coded to run only the first Docker image defined in the xaasRequestPipeline list, its inputs and outputs are fairly limited. However, the expectation is that the pattern for the xaas-request-pipeline-stage-1 init container will be replicated for any additional containers defined. So at this stage we expect that any image in the Request Pipeline can read from the three mounted volumes (input, output, and metadata).
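To make the three-volume pattern concrete, here is a minimal sketch of what a single pipeline image might do. It is not taken from the Kratix codebase: the /input mount path, the object.yaml file name, the run_stage function, and the ConfigMap contents are all assumptions chosen for illustration.

```python
from pathlib import Path

# Placeholder manifest; the ConfigMap name and key are illustrative only.
MANIFEST = """\
apiVersion: v1
kind: ConfigMap
metadata:
  name: example-from-pipeline
data:
  requestLength: "{size}"
"""


def run_stage(input_dir="/input", output_dir="/output", metadata_dir="/metadata"):
    """Hypothetical stage: read the request, write one manifest, list metadata."""
    input_dir, output_dir, metadata_dir = map(
        Path, (input_dir, output_dir, metadata_dir)
    )

    # Assumption: the Resource Request is a single file named object.yaml.
    request = (input_dir / "object.yaml").read_text()

    # Whatever is left in the output volume is later written to the GitOps
    # repository, so only Kubernetes documents belong here.
    manifest_path = output_dir / "configmap.yaml"
    manifest_path.write_text(MANIFEST.format(size=len(request)))

    # The metadata volume is pass-through; this sketch only reports its contents.
    metadata_files = sorted(p.name for p in metadata_dir.iterdir())
    return manifest_path, metadata_files
```

Inside a container the function would be called with its defaults; passing explicit directories makes it easy to try out locally against temporary folders.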

How can the containers share information between them that isn't a Kubernetes YAML document? Via /output or a different pattern? Is that supported/expected?

As referenced above, the only directory that must comply with Kubernetes apply standards is /output. Right now the only use case we support is Kratix-owned data sharing. Specifically, we use the /metadata directory to share cluster selectors from the Promise to the Request Pipeline. We have left the permissions on this /metadata directory intentionally open for now, though it would not be crazy to think we end up with some read-only space and some user-writable space.
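As a rough illustration of what the open permissions make possible today, the sketch below has one container leave a small JSON file in the metadata directory for a later container to pick up. This is not a supported contract, only something the current permissions allow; the stage-notes.json file name and both helper functions are hypothetical.

```python
import json
from pathlib import Path

# Hypothetical file name; Kratix does not define or reserve it.
NOTES_FILE = "stage-notes.json"


def write_notes(metadata_dir, notes):
    """Earlier container: store non-Kubernetes data outside the output volume."""
    path = Path(metadata_dir) / NOTES_FILE
    path.write_text(json.dumps(notes, indent=2))
    return path


def read_notes(metadata_dir):
    """Later container: load the notes, or an empty dict if none were written."""
    path = Path(metadata_dir) / NOTES_FILE
    return json.loads(path.read_text()) if path.exists() else {}
```

Keeping this kind of data out of /output matters because everything left in that directory is treated as a Kubernetes document to be applied.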

abangser commented 1 year ago

Just coming back around to this, as the docs have since been updated to hopefully cover these ideas. I will leave this open for a bit to confirm, but then look to close it.

Your suggestions for helpful bits to add were:

What are the inputs to the intermediate (non-first) containers in the pipeline list? The pipeline contract specifies that the first container receives the user-input resource document but doesn't specify anything after that.

How can the containers share information between them that isn't a Kubernetes YAML document? Via /output or a different pattern? Is that supported/expected?

I believe both of these have been answered by "Passing data between pipeline steps".

Note that the docs neither take a stance on what is expected nor directly answer the question about pipeline versus map/reduce-style design. At this stage both are possible, and we have experience using both. Given the current wide read/write permissions, we will learn which patterns to encourage or enforce as use increases.
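For anyone weighing the two styles, here is a small sketch of how they differ in terms of the shared volumes. It only simulates the steps as plain functions against local directories; the object.yaml input name, the function names, and the placeholder file contents are assumptions, and real steps would write valid Kubernetes documents instead.

```python
from pathlib import Path


def run_chained(input_dir, output_dir):
    """Pipeline style: the second step refines the first step's file."""
    inp, out = Path(input_dir), Path(output_dir)
    request = (inp / "object.yaml").read_text()  # assumed input file name

    # Step 1 writes a document derived from the request.
    doc = out / "app.yaml"
    doc.write_text(f"# placeholder derived from {len(request)} chars of input\n")

    # Step 2 never looks at the input; it reads and extends step 1's output.
    doc.write_text(doc.read_text() + "# refined by step 2\n")
    return sorted(p.name for p in out.iterdir())


def run_fan_out(input_dir, output_dir, step_names):
    """Fan-out style: every step reads the same input and owns one file."""
    inp, out = Path(input_dir), Path(output_dir)
    request = (inp / "object.yaml").read_text()
    for name in step_names:
        target = out / f"{name}.yaml"
        target.write_text(f"# placeholder from {name}, {len(request)} chars of input\n")
    return sorted(p.name for p in out.iterdir())
```

In the chained style the ordering of steps matters and later steps depend on earlier file names; in the fan-out style steps stay independent, at the cost of each one re-reading the request.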

abangser commented 1 year ago

Closing for now as fixed based on doc updates. Feel free to comment here or open a new issue if there are still open questions.

osowski commented 1 year ago

Yep, the updated docs definitely cover what I was expecting with my questions above. Thanks for the updates!

syntasso / kratix-docs

Clarity for the pipeline contract, as defined in "Writing a Promise" #5