Closed huang195 closed 2 years ago
The trouble with putting this in etc3 is: how will the "readiness check" be implemented for different Iter8 domains? For KFServing, Knative, and K8s deployments, readiness checks look different.
Even within a specific domain, different experiments might need different readiness checks. We want to keep etc3 domain agnostic, and have gone to great lengths to ensure this is the case.
K8s controllers are not intended to be complicated; simple, idempotent reconcile loops are what they're intended for. etc3 is already well past this point, and tacking more and more onto it is a recipe for creating an unwieldy and unstable controller at the heart of Iter8.
I also agree that users should not be exposed to needless complexity.
I believe one way to accomplish both of the above is to create Helm charts which install experiments -- the complexity of the experiment is abstracted within the Helm chart (which includes readiness checks, etc.).
What the user gets to see is only the values file, which can have a few (really few) parameters (for example, just the service name / namespace).
We can create as many Helm charts / templates to bake in as many scenarios as we wish to expose.
And we can also document how to create the Helm chart so that the user has the option of doing so themselves.
Experimentation does require templating in some form, and Helm is the top templating solution for K8s.
We can certainly explore other options like Kustomize / Jsonnet after some experience with Helming.
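As a concrete sketch of the values-file idea (the chart name and value names below are purely illustrative, not an existing Iter8 chart), the entire user-visible surface might be just:

```yaml
# values.yaml for a hypothetical "canary-experiment" chart.
# Everything else -- the Experiment spec, readiness checks, handler
# tasks -- would live inside the chart's templates.
service:
  name: productpage
  namespace: bookinfo-iter8
testingPattern: canary
```

The user would then run something like `helm install my-experiment <repo>/canary-experiment -f values.yaml`, without ever touching the underlying Experiment CR.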
In environments like IBM Code engine and IBM DevOps toolchains, Iter8 is essentially going to run in a separate K8s environment -- altogether different from the one where the target application is running (at least to begin with).
Specifically, in CE, Iter8 will be packaged as a containerized K8s-in-Docker (Kind) image which can be run as a serverless application.
In toolchains, Iter8 will be packaged as a Tekton task which can be imported into a pipeline and used for various acceptance checks.
Readiness checks are required here as well... and can be accomplished through well crafted tasks intended for those purposes.
I'm more concerned about the complexities in the Experiment CRD, and less about how Iter8 is deployed to different environments like CE and IBM Toolchains, for which I'm certain the use of Helm or Kustomize can improve the user experience.
Looking at the experiment CR files we're using in various quickstart tutorials, they are about 60 lines long (https://github.com/iter8-tools/iter8/blob/master/samples/kfserving/quickstart/experiment.yaml). Compared to Flagger's (https://docs.flagger.app/usage/how-it-works), which is about 30 lines long (and that includes a webhook to start a workload generator), ours is much more complicated. Even the previous version of Iter8 used a much simpler Experiment CR in its quickstart: only 20 lines (https://github.com/iter8-tools/iter8-istio/blob/master/docs/tutorials/canary.md). Without any prior knowledge about online experimentation, a user could read Iter8 v1's and Flagger's CR files and get a rough idea of what the experiment will do. However, the same user will get lost reading Iter8 v2's Experiment CR file, even the one used in the quickstart guide.
Is the intent to have users not interact with Experiment CR files directly, but rather use Helm as the main user interface? My understanding is that templating is good for configuration items, e.g., port number, backend DB address, etc. -- the things you put in a configuration file that's read by a service when it starts up. The content of an Experiment CR is much more complicated. Is it even possible to facade it in the same way?
So I want to (i) understand if you also view this as a problem, and (ii) if so, how the use of Helm will solve it. If the intent is to keep adding to the handler task section of the Experiment CR to perform various mundane tasks (like checking whether the candidate is already deployed before starting the experiment), the Experiment CR files will get much more complicated than the ones we use in the quickstart guide.
Here's an idea: is it possible to create higher-level CRDs using the existing Experiment CRD as a base, e.g., `Experiment-KFServing` and `Experiment-Istio`? The higher-level CRDs would have a cleaner interface, hiding a lot of details from users; e.g., handler tasks could be completely removed from these higher-level CRDs. When a higher-level CR is created, it always instantiates a base Experiment CR under the covers with all the gory details. This allows new users to not get intimidated by all the details. At the same time, more advanced users can use the "raw" Experiment CRD directly to do a lot of customization.
@huang195 I do think Iter8's experiment CRs are getting pretty complex... but IMO, the solution is not to extend the experiment CR (and consequently, etc3) functionality for every feature we want to implement... If at all possible, I would in fact like to remove some of the etc3 / experiment CR functionality to make etc3 leaner.
If we want to extend the behavior supported in an experiment, tasks are definitely the first option to consider in general; this is especially true for this task under consideration.
Tasks are expressly meant for the purpose of extending etc3 behaviors in ways like the above.
Iter8's experiment and Flagger's canary resource are completely different abstractions.
The latter is associated with an application; the former is associated with a one-off experiment (i.e., with an individual version).
I believe Iter8 is sorely missing this application-level abstraction. I completely agree with you that we need to find ways to simplify Iter8's experimentation.
I think your "higher-level CRD" idea is a really good one. I used to think of it as a lifecycle CR earlier, and more recently as an Iter8 `Candidate` CR, directly inspired by Flagger.
I have some thoughts on how this higher level CRD could in fact use helm and helm values to accomplish what you are suggesting...
The idea being, you don't have to create 10 different CRDs (Experiment-KFServing, Experiment-Istio, Experiment-Linkerd, Experiment-OpenHorizon, etc., all of which we definitely want to go after).
You can still create a single `Candidate` CR -- which watches for the resource of your choice (could be a deployment for Istio, a ksvc for Knative, or an isvc for KFServing) -- and uses a Helm chart to install the experiment.
Actually, this is an excellent thread for us to start imagining what that higher-level CR might look like... we don't have to restrict ourselves at this early design stage; we can (and should) go crazy putting out various possible higher-level CRs. Let's assess pros and cons and figure out how to move forward with this idea -- which I certainly think is a high-value one (in terms of simplification).
Yes, there are disadvantages to the higher-level CRD idea, but now I understand what you mean by "templating" at the conceptual level. The thing that still bothers me with templating is imagining that once all this work is done, our quickstart guide will say: run this `helm install ...` command, and here's the Experiment CR file that was deployed. Then the user will be hit with a very long and cryptic YAML file with all its gory details. That is something the higher-level CRD idea is trying to avoid, as we only need to show the user that the entity deployed to their cluster is this nice, clean higher-level CR (in the GitOps case, this is also the entity that gets stored in the Env repo). Yes, there's an ugly base Experiment CR that gets generated somewhere, but that's not something the user needs to be exposed to, especially new users.
I'd like to iterate on our various designs with you here with pros and cons, and flesh out the details once we can agree on something.
@huang195 absolutely agree with your above comment.
Here's how I'm thinking of Helm playing into your higher-level CRD idea. This combines elements of Flagger's canary resource (watch) with RedHat's Helm Operator (helm-chart + values).
Candidate resource 1:

```yaml
apiVersion: iter8.tools/v2alpha2
kind: Candidate
metadata:
  name: productpage-app
spec:
  # target identifies the service under experimentation using its fully qualified name
  target: bookinfo-iter8/productpage
  watch: # watch and do something whenever the resource below is updated in-cluster
  - apiVersion: apps/v1
    kind: Deployment
    namespace: bookinfo-iter8
    name: productpage-candidate
  pre-empt: true
  chart: istio/deployment # helm repo/chart that templates Iter8 experiments for deployments over Istio
  values: # optional; target and resource being watched are always passed in as values.
    testingPattern: canary
```
Candidate resource 2:

```yaml
apiVersion: iter8.tools/v2alpha2
kind: Candidate
metadata:
  name: flowers-app
spec:
  # target identifies the model under experimentation using its fully qualified name
  target: default/flowers
  watch: # watch and do something whenever the resource below is updated in-cluster
  - apiVersion: serving.kubeflow.org/v1beta1
    kind: InferenceService
    namespace: default
    name: flowers-candidate
  pre-empt: true
  chart: kfserving/isvc # helm repo/chart that templates Iter8 experiments for InferenceServices over KFServing
```
Main differences between the `Candidate` and `Experiment` resources:
- `Candidate` is created once per application, and will automate the launch of multiple experiments during the lifetime of the application (think simple).
- `Experiment` is created once per experiment, tied to the specific baseline and candidate versions under experimentation (think gory details).
From a GitOps perspective, all that the user does is update the `productpage-candidate` deployment in Git in case 1, and `flowers-candidate` in case 2. Iter8 takes care of the rest (assuming the Candidate CRs above are deployed in the cluster). Every time Iter8 detects a change to the resource it is watching, it performs a `helm upgrade --install <an experiment>`, which (conceptually) deletes the old experiment (if any) and creates a new one. Helm upgrades already work this way. This is the pre-emptive experimentation behavior we have discussed so far.
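To make the chart side of this concrete, here is a hedged sketch of what a template inside the hypothetical `istio/deployment` chart might render. The Experiment fields loosely follow the v2alpha2 shape used in the quickstart samples; the value names are assumptions, injected by the Candidate controller as described above:

```yaml
# templates/experiment.yaml (illustrative) in the istio/deployment chart.
# .Values.target and details of the watched resource are assumed to be
# passed in automatically by the Candidate reconciler.
apiVersion: iter8.tools/v2alpha2
kind: Experiment
metadata:
  name: {{ .Release.Name }}-experiment
spec:
  target: {{ .Values.target }}
  strategy:
    testingPattern: {{ .Values.testingPattern | default "Canary" }}
```

Each Helm upgrade of this chart would then replace the previous Experiment with one pointing at the latest candidate version.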
@sriumcp That is very Flagger-like indeed, and a major architecture change to support the new CRD. I wonder if there's any middle ground where we can get similar behavior but based on the existing v2 architecture -- maybe there isn't. Will think about it.
@huang195 Just in case it didn't come through cleanly in my comments above, this new CRD doesn't replace the Experiment CRD... it builds on top (under the covers, an Experiment CR is still getting created).
It is indeed a big addition to the current Iter8 design, but there is no change to the Experiment CR itself. Besides, while the change is big in terms of functionality, I think the reconcile loop implementation for this CR can be very simple: watch for the resource, construct a Helm values object, and do a `helm upgrade --install` with the given Helm chart. That's it... if things go wrong during this simple process, mark it in the `Candidate` CR status and call it a day.
Is this enough?
In some sense, RedHat's Helm operator does even less (it doesn't watch for anything), and it is one of the most insanely popular operators out there. So perhaps... this is all that we need to start with.
The experiment reconciler has significantly more complex logic (needed to support current experiment functionality; you can look at the reconciler implementation to get a sense of this complexity).
This proposal achieves the following. It:
The `Candidate` idea is obviously inspired by Flagger. But also keep in mind the RedHat helm-operator connection... that's where I got the idea for a Helm chart with values inside the CR.
The assumption involved in point 2 above is not true for Code Engine and many toolchain environments, so raw experiments, or just the Helm-templated experiments (without the `Candidate` CR), have a much more direct role to play in such environments.
Incidentally, I changed the issue label from [Bug] to [Enhancement] :slightly_smiling_face:
We explicitly discussed readiness checks during the design of etc3 and made a conscious decision not to support it in the experiment CR (both to limit complexity, and also to focus on multi-domain experiments), so the implementation is working exactly as designed.
Obviously, it is a new feature / enhancement we want to support in Iter8 (through tasks -- ultimately hidden under the higher-level CRD / helm / whatever abstraction we choose).
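For illustration only: if readiness checking is implemented as a task, it might appear in the generated (not user-facing) experiment roughly as below. The task name `common/readiness` and its fields are hypothetical here, not an existing Iter8 task:

```yaml
# Fragment of a generated Experiment spec (illustrative).
spec:
  strategy:
    actions:
      start: # tasks run before the experiment loop begins
      - task: common/readiness   # hypothetical task: wait until the candidate is ready
        with:
          timeoutSeconds: 180
```

The higher-level CRD / Helm abstraction would insert this fragment automatically, so users never author it by hand.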
@sriumcp looks good to me at a high level. So it looks like we're going to have a dependency on Helm now? But since the user is not exposed to it, and we're just using Helm internally to generate the Experiment CR from the Candidate CR, it shouldn't matter to the user either way. Once you have a working example, I look forward to seeing it.
This also looks good to me from a high level. I also don't like a dependency on Helm, though. Perhaps we should think of the higher-level object as identifying a template for an experiment. Such a template might be specified in different ways: inline, with another CR, with a Helm chart, or with a Kustomize folder; the mechanism could be variable.
In simplifying the CR, we might also consider whether the version details should be moved to another CR and just referenced. A lot of the complexity is in the versionInfo field. We could introduce a "version" or "subject" CR that records information about a single version or a set of versions. And this could be used to drive the watching/reading/modifying of weights.
We might also create CRs for the tasks. Then, even if a task is a shell script, it appears in an experiment as a simple thing.
I know there is a hesitation about creating a lot of CRDs and the controllers to manage them. However, there is a tradeoff with simplicity we should think about.
Re: Helm dependency, FluxV2 has a dedicated GitOps controller that is based purely on Helm. It also has a dedicated controller that is based purely on Kustomize.
RedHat has a dedicated operator for installing Helm charts.
RedHat also provides the Operator SDK for creating new operators using Helm charts.
Helm is by far the most popular templating and resource configuration tool for K8s, which is partly the reason for choosing it. I believe making a single higher-level CR do everything (Helm / Kustomize / my favorite sed/awk-based templating) would defeat its primary purpose, which is simplification.
What the above design does is to template the experiment in the form of a Helm chart. It does not restrict the application deployment itself, which can be using Helm, Kustomize, kubectl yaml/json manifests, etc.
What the above design also does is watch for deployments / custom resources, eliminating the need for CI/CD to create an experiment each time there is a new version. Templating is part of the solution; ensuring that there are a few template variables to substitute dynamically during each release, and automatically figuring out what those values are (instead of user-constructed CI/CD having to do it), is a major part of the simplification.
To be explicit about the process: when the user creates an application, the user also creates a Helm chart for experimenting with that application, and installs a `Candidate` CR for the application in the cluster.
The CI/CD pipeline from then on doesn't have to focus on Iter8 at all -- in particular, experiment creation/setup is no longer part of the pipeline, merely the release of the candidate version, which it needs to do anyway.
I believe the goal of simple application release automation can be achieved through the above design with carefully crafted Helm experiment charts. The values exposed by a chart need to be well thought out (and necessarily opinionated) so that we don't provide 1000 options for the user in a single chart, but instead provide an intelligent way to set a few (say, 5, or at most 10, simple values (strings/ints)) in the context of a single chart. Each chart will correspond to some experimentation pattern that users care about.
Again, chart creation happens once, when the application is defined. Values are derived automatically from the context of the `Candidate` resource. The user's CI/CD pipeline doesn't need to be aware of Iter8 for the most part.
Of course, creating a Helm chart is cheap, especially when there is a single experiment object to template. We can document this process for a few charts, so that users can craft / contribute their own charts as well.
I wonder if we can simplify things even more, e.g., using Sri's example from earlier:

```yaml
apiVersion: iter8.tools/v2alpha2
kind: Candidate
metadata:
  name: productpage-app
spec:
  # target identifies the service under experimentation using its fully qualified name
  target: bookinfo-iter8/productpage
  watch: # watch and do something whenever the resource below is updated in-cluster
  - apiVersion: apps/v1
    kind: Deployment
    namespace: bookinfo-iter8
    name: productpage-candidate
  pre-empt: true
  chart: istio/deployment # helm repo/chart that templates Iter8 experiments for deployments over Istio
  values: # optional; target and resource being watched are always passed in as values.
    testingPattern: canary
```
Can this simply become:

```yaml
apiVersion: iter8.tools/v2alpha2
kind: Candidate
metadata:
  name: productpage-app
spec:
  # target identifies the service under experimentation using its fully qualified name
  target: bookinfo-iter8/productpage
  pre-empt: true
  chart: istio/deployment # helm repo/chart that templates Iter8 experiments for deployments over Istio
  values: # optional; target and resource being watched are always passed in as values.
    testingPattern: canary
```
Here, `chart: istio/deployment` will implicitly assume that the target `bookinfo-iter8/productpage` is a deployment, and that its candidate version follows a certain naming convention. Furthermore, instead of exposing the use of Helm directly in the CR, I wonder if we can make it less Helm-dependent, e.g.:
```yaml
apiVersion: iter8.tools/v2alpha2
kind: Candidate
metadata:
  name: productpage-app
spec:
  # target identifies the service under experimentation using its fully qualified name
  target: bookinfo-iter8/productpage
  pre-empt: true
  kind: istio/deployment
  values: # optional; target and resource being watched are always passed in as values.
    testingPattern: canary
```
This draws on the higher-level CRDs idea. Instead of creating a whole bunch of new CRDs, we can use the `kind: istio/deployment` field to specify what type of higher-level CRD the user wants to instantiate. How this is implemented under the covers is up to us: it can be Helm, Kustomize, sed/awk, or even a combination of these. Most importantly, this allows us to decouple from a particular technology and provide users a simplified interface.
I like the `kind` idea above... the only thing I would do is make it a bit more explicit: `kind: helm-chart/istio/deployment` vs. `kind: kustomize/kfserving/service` vs. `kind: sed/knative/service`. This naming convention is up to us to evolve, of course.
Well, the point is we might want to hide implementation details from the user if possible, so whatever we implement behind the scenes can change as we wish. Unless we plan to implement both `kustomize/istio/deployment` and `helm-chart/istio/deployment`, do we really want to expose this detail to users?
I do believe users will want to implement their own Helm charts and plug them into this CR, using our Helm charts as a starting point.
I really don't think Kustomize is that great a fit for this use case. Fundamentally, Kustomize is a template-free resource configuration tool, while we're really trying to do templating here.
Perhaps sed/awk/your favorite shell tools/Go templates/Jsonnet are other possibilities. And maybe they should be bundled under `custom`.
The user still does not have to really look into the Helm chart -- if the level of templating the chart offers, and the way it sets up the values, are sufficient for the user's purposes.
Updated comment above to introduce diversity of domains.
TBH, any of the above would be an improvement over the existing CRD. We can always keep improving it once it's implemented.
Subsumed by #948
This is to document a Slack discussion, regarding the following issue:
I'm creating an experiment CR resource without having deployed the candidate version in the cluster yet, and I was expecting that the experiment would be in some sort of paused state (IIRC, that's version 1's behavior). However, what I observed was that the experiment started to run even though the candidate version was not deployed and the candidate pod was not running. Is this the current expected behavior?
Summary of the discussion so far is:
@sriumcp is suggesting that the task framework is the right place to check and pause an experiment when this scenario happens (i.e., candidate not yet deployed).
My view is that this is a valid suggestion, as the task framework is generic enough that such things could be done there. However, I wonder if such complexity should be exposed to users in the form of a task, as I can't think of any reason why someone would start an experiment without all the components being ready, e.g., candidate and baseline deployed. If that's true, Iter8 should check for this condition internally before starting the experiment.