Migrate `readiness` task from v0.7

kalantar commented 2 years ago

Example of proposed use:

- task: k8s-objects-ready
  with:
    delay: 3 # delay in seconds before first check; optional; default 5s
    retries: 10 # number of attempts to check conditions; optional; default is 12
    retryInterval: 3 # time between attempts in seconds; optional; default is 5s
    objects: # list of k8s object references
    - kind: Pod # required
      name: curl-pod # required
      namespace: default # optional; default should determined from current context
      waitFor: condition=Ready # optional; see https://kubernetes.io/docs/reference/generated/kubectl/kubectl-commands#wait

Implementation can be a port of what was used in v0.7.

For in cluster execution, user will need to have authorization to get the objects. It may be necessary to create Role/RoleBinding. Is there a way to assist users to determine what these should be?

sriumcp commented 2 years ago

k8s-objects-ready is nice!

I suggest we have only a single object (instead of a list of objects) for the task to check. I also suggest removing the objects key, and simply inlining kind, name, namespace, and waitFor at the same level as delay and retries. This would make the experiment chart templates much easier to implement.

We can have multiple readiness checks in the experiment, simply by repeating the task.

kalantar commented 2 years ago

How might we use this task with existing charts. For example, I imagine that I would want to combine one or more k8s-objects-ready tasks with the existing tasks in our current load-test-http chart to create a better load test. If I have a single task with a list of objects, I can imagine updating the load-test-http chart to optionally include it (depending on experiment-config.yaml. With multiple tasks, I do not see an easy way to combine. This increases the need to understand how to author charts.

kalantar commented 2 years ago

I suggest we have only a single object ... We can have multiple readiness checks

This is a nice simplification

sriumcp commented 2 years ago

Some usage examples I am thinking about ... (also documented in a separate k8s docs PR).

iter8 k launch -c load-test-http \
--set url=hello.svc.cluster.local \
--set ready.deploy.name=hello-there \
--set ready.svc.name=hello

The above would translate into the following experiment.yaml

- task: k8s-objects-ready
  with:
    delay: 3 # delay in seconds before first check; optional; default 5s
    retries: 10 # number of attempts to check conditions; optional; default is 12
    retryInterval: 3 # time between attempts in seconds; optional; default is 5s
    kind: Deploy # required
    name: hello-there # required
    namespace: default # optional; default should determined from current context
    conditionType: Ready # optional; see https://kubernetes.io/docs/reference/generated/kubectl/kubectl-commands#wait
- task: k8s-objects-ready
  with:
    delay: 3 # delay in seconds before first check; optional; default 5s
    retries: 10 # number of attempts to check conditions; optional; default is 12
    retryInterval: 3 # time between attempts in seconds; optional; default is 5s
    kind: Service # required
    name: hello # required
    namespace: default # optional; default should determined from current context

iter8 k launch -c load-test-http \
--set url=hello.svc.cluster.local \
--set ready.ksvc.name=hello-there-knative-svc

iter8 k launch -c load-test-http \
--set url=hello.svc.cluster.local \
--set ready.isvc.name=hello-there-kserve-svc

sriumcp commented 2 years ago

With multiple tasks, I do not see an easy way to combine. This increases the need to understand how to author charts.

For end-users, map values are easier to deal with than array values. Helm recommends that chart authors keep this in mind as they structure their charts.

In this case, ready would be a map object, and each key under it (svc, deploy, ksvc, and isvc in the examples above) would result in its own k8s-objects-ready task spec.

Sample illustration for load-test-http or grpc (obviously not exact format)

{{- if .Values.ready.deploy }}
{{- include "readiness-for-deployment" .Values.ready.deploy | toYaml }}
{{- end }}

The readiness-for-deployment will be defined in Iter8 library chart (to be reused by any experiment chart).

Readiness tasks also have implication for RBACs, which the Iter8 library chart will need to support.

kalantar commented 2 years ago

It seems like a library of default conditions to waitfor would also be necessary in charts. they depend the kind.

kalantar commented 2 years ago

Readiness tasks also have implication for RBACs, which the Iter8 library chart will need to support.

I wonder if this should be hidden in a chart or require user action. But with assistance to be aware of needed permissions.

sriumcp commented 2 years ago

I wonder if this should be hidden in a chart or require user action.

We will probably want to enable the required RBACs by default (i.e., hide them in the chart), but enable values to turn off installation of RBACs (example in the code engine environment, RBACs are not required; default permissions for the service account is enough to do everything we want to as part of the experimenting in the CE cluster).

kalantar commented 2 years ago

I'm wondering if we really need retries and retryInterval. Perhaps not even delay. Perhaps a better approach would be a single parameter,timeout, and leave the details to the implementation? Thoughts?

sriumcp commented 2 years ago

timeout sounds fantastic instead of retries and retryInterval.

In terms of implementation details, I recently stumbled across this pattern, which I am hoping to use more in the code. In particular, retry.DefaultRetry along with retry.OnError should be very handy for fetching secrets, jobs, etc.. Should also work for readiness.

kalantar commented 2 years ago

In moving this away from an implementation that uses os/exec it becomes necessary to consider authentication with Kubernetes. In some cases, we might have this implicitly (iter8 k commands). In others we do not. It appears that kubernetes specific tasks such ask8s-objects-ready need additional parameters identifying context.

sriumcp commented 2 years ago

The os/exec model also had a readiness task wrapper around it, which did not require these auth parameters. Auth was not an input to the previous readiness task (which means, whatever auth the os/exec utilized, was implicit, and not explicit). This is especially true in-cluster. I believe we can start with a similar uncomplicated approach in the library model as well.

The kubeclient which will be created in Iter8 will look for config information in the .kubeconfig file (or its equivalent in-cluster). The in-cluster configuration in particular supplies the built-in/implicit authentication for the client.

This is a natural starting point.

Of course, if we do need to reconsider auth for readiness for more advanced integrations like OCM/ACM/Multi-cluster, we can absolutely consider those requirements and introduce these parameters at that stage. This is also an excellent feature for the community to contribute (if they bring up the need for the same).

sriumcp commented 2 years ago

One big advantage in staging explicit auth for later (if at all it becomes required later) is that we reduce the complexity of unit testing the readiness task. In other words, with implicit auth, we can develop it faster, and test it more thoroughly, and approach the community a bit more quickly with iter8 k launch feature.

We can certainly revisit explicit auth if a specific use-case comes along where implicit auth renders the use-case impossible for Iter8 or shows serious limitations.

kalantar commented 2 years ago

I noticed that our original readiness implementation (v0.7) validates object names by checking length and that it satisfies a particular regular expression. This seems to check that it is a DNS RFC 1123 label. Most objects do restrict names like this. However, it appears that alternatives exist. See https://kubernetes.io/docs/concepts/overview/working-with-objects/names/#names. I have not been able to find a way to generically do a name test. Is it necessary to do this validation? If the name is wrong, it will not be found and the task will fail anyway.

sriumcp commented 2 years ago

I agree. The name check is an overkill.

kalantar commented 2 years ago

Given the readiness task, we need a way for users to specify which resources should be checked for readiness when an experiment is launched. For each resource, the chart will need to add a readiness task to the experiment.

The user should be able to simply specify multiple resources to check for readiness. Two ways this might be done are:

Identify resources by kind and name: --set ready.<kind>.name. One resource of each kind can be specified.
Identify several resource names for each kind: --set "ready.<kind>={<list of names>}"

Any values for namespace and timeout (--set.ready.timeout or --set ready.namespace would be common to all resources.

In all cases, defaults for group, version and resource can be determined from kind. These defaults might be maintained in the chart values.yaml file or could be implemented in a kind specific template in the iter8lib chart. It is easier for a user to extend the set of kinds or to modify a kind (for example, by changing the version) if they are in values.yaml so this is preferred.

sriumcp commented 2 years ago

Any values for namespace and timeout (--set.ready.timeout or --set ready.namespace would be common to all resources.

This looks nice and simple.

There is also a third option which combines the simplicity of both 1 and 2 above:

Identify a single resource for each kind: --set ready.<kind>=a name.

All of our current use-cases require only 3. Perhaps we can start there and graduate to 2, when the need arises?

kalantar commented 2 years ago

We should probably note that this approach cannot guarantee an order in which resources are checked for availability.

sriumcp commented 2 years ago

Task is done. Chart will be handled as part of a separate issue.

iter8-tools / iter8

Migrate `readiness` task from v0.7 #1190