Two test infrastructures exist in the Kubeflow community:
If you are interested in oss-test-infra, please find useful resources here.
If you are interested in optional-test-infra, please find useful resources here.
We use Prow, the Kubernetes community's continuous integration tool.
We use Prow to run:
Here's a high-level overview of how it works.
Quick Links
This section provides guidelines for writing Argo workflows to use as E2E tests.
This guide is complementary to the E2E testing guide for the TFJob operator, which describes how to author tests to be performed as individual steps in the workflow.
Some examples to look at
Follow these steps to add a new test to a repository.
Create a Python function in that repository that returns an Argo workflow, if one doesn't already exist.
We use Python functions defined in each repository to define the Argo workflows corresponding to E2E tests.
You can look at prow_config.yaml (see below) to see which Python functions are already defined in a repository.
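A py_func is just a Python function that returns the Argo Workflow as a Python dict. A minimal sketch is shown below; the function name, namespace, image, and steps are placeholders for illustration, not the actual test-infra API:

```python
# Hypothetical py_func returning an Argo Workflow spec as a dict.
# The structure mirrors the Argo Workflow CRD; names, image, and steps
# are placeholders, not a real Kubeflow test.
def my_test_workflow(name, namespace="kubeflow-test-infra", **kwargs):
    """Return an Argo Workflow spec as a Python dict."""
    return {
        "apiVersion": "argoproj.io/v1alpha1",
        "kind": "Workflow",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            "entrypoint": "e2e",
            "templates": [
                {
                    "name": "e2e",
                    # A single-step workflow for illustration.
                    "steps": [[{"name": "unittests", "template": "unittests"}]],
                },
                {
                    "name": "unittests",
                    "container": {
                        "image": "gcr.io/kubeflow-ci/test-worker:latest",
                        "command": ["python", "-m", "pytest"],
                    },
                },
            ],
        },
    }


workflow = my_test_workflow("unittests-abc123")
```

The test infrastructure calls the function named by py_func and submits the returned dict as the workflow.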
Modify the prow_config.yaml at the root of the repo to trigger your new test.
If prow_config.yaml doesn't exist (e.g. the repository is new), copy one from an existing repository (example).
prow_config.yaml contains an array of workflows where each workflow defines an E2E test to run; example:
```yaml
workflows:
  - name: workflow-test
    py_func: my_test_package.my_test_module.my_test_workflow
    kwargs:
      arg1: argument
```
You can use e2e_tool.py to print out the Argo workflow and potentially submit it.
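Under the hood, the test infrastructure has to resolve the dotted py_func string into a callable before invoking it. A sketch of that resolution is below; the helper name is ours, and a stdlib function is used as the target so the example actually runs (see run_e2e_workflow.py for the real logic):

```python
import importlib


def resolve_py_func(dotted_path):
    """Resolve a dotted path like
    my_test_package.my_test_module.my_test_workflow into the callable
    it names, by importing the module and looking up the attribute."""
    module_path, func_name = dotted_path.rsplit(".", 1)
    module = importlib.import_module(module_path)
    return getattr(module, func_name)


# Use a stdlib function as a stand-in target so the example is runnable.
dumps = resolve_py_func("json.dumps")
print(dumps({"arg1": "argument"}))  # prints {"arg1": "argument"}
```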
Examples
Using ksonnet is deprecated. New pipelines should use python.
Create a ksonnet app in that repository and define an Argo workflow, if one doesn't already exist.
We use ksonnet apps defined in each repository to define the Argo workflows corresponding to E2E tests.
If a ksonnet app already exists, you can just define a new component in that app:
Change the import for the params to use the newly defined component.
Modify params.libsonnet to add a stanza defining params for the new component.
You can look at prow_config.yaml (see below) to see which ksonnet apps are already defined in a repository.
Modify the prow_config.yaml at the root of the repo to trigger your new test.
If prow_config.yaml doesn't exist (e.g. the repository is new), copy one from an existing repository (example).
prow_config.yaml contains an array of workflows where each workflow defines an E2E test to run; example:
```yaml
workflows:
  - app_dir: kubeflow/testing/workflows
    component: workflows
    name: unittests
    job_types:
      - presubmit
    include_dirs:
      - foo/*
      - bar/*
    params:
      platform: gke
      gkeApiVersion: v1beta1
```
app_dir: The path to the ksonnet directory within the repository. This should be of the form ${GITHUB_ORG}/${GITHUB_REPO_NAME}/${PATH_WITHIN_REPO_TO_KS_APP}.
component: The name of the ksonnet component to use for the Argo workflow.
name: The base name to use for the submitted Argo workflow.
The test infrastructure appends a suffix of 22 characters (see here)
The result is passed to your ksonnet component via the name parameter
Your ksonnet component should truncate the name if necessary to satisfy K8s naming constraints.
e.g. Argo workflow names should be less than 63 characters because they are used as pod labels
job_types: An array specifying the types of prow jobs for which this workflow should be triggered.
include_dirs: If specified, the pre and postsubmit jobs will only trigger this test if the PR changed at least one file matching at least one of the listed directories.
Python's fnmatch function is used to compare the listed patterns against the full path of modified files (see here).
This functionality should be used to ensure that expensive tests are only run when test-impacting changes are made, particularly if it's an expensive or flaky presubmit.
Periodic runs ignore include_dirs; a periodic run will trigger all workflows that include job_type periodic.
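The include_dirs matching can be sketched with Python's fnmatch. The helper below is illustrative, not the test infrastructure's actual code; note that fnmatch's `*` also matches `/`, so a pattern like `foo/*` matches arbitrarily nested paths under `foo/`:

```python
from fnmatch import fnmatch


def workflow_triggered(include_dirs, changed_files):
    """Return True if any changed file matches any include_dirs pattern.

    fnmatch's '*' matches any characters including '/', so 'foo/*'
    matches both 'foo/file.py' and 'foo/subdir/file.py'."""
    return any(
        fnmatch(path, pattern)
        for path in changed_files
        for pattern in include_dirs
    )


print(workflow_triggered(["foo/*", "bar/*"], ["foo/subdir/file.py"]))  # True
print(workflow_triggered(["foo/*", "bar/*"], ["docs/README.md"]))      # False
```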
A given ksonnet component can have multiple workflow entries to allow different triggering conditions on pre/postsubmit
For example, on presubmit we might run a test on a single platform (GKE), but on postsubmit that same test might run on GKE and minikube; this can be accomplished with different entries pointing at the same ksonnet component but with different job_types and params.
params: A dictionary of parameters to set on the ksonnet component, e.g. by running `ks param set ${COMPONENT} ${PARAM_NAME} ${PARAM_VALUE}`.
pytest is really useful for writing tests:
Use pytest to easily script various checks.
pytest provides fixtures for setting additional attributes in the junit files (docs).
In particular, record_xml_attribute allows us to set attributes that control how the results are grouped in Testgrid.
name - This is the name shown in Testgrid.
Testgrid supports grouping by splitting the tests into a hierarchy based on the name.
Recommendation: leverage this feature to name tests to support grouping, e.g. use the pattern
{WORKFLOW_NAME}/{PY_FUNC_NAME}
WORKFLOW_NAME: the workflow name as set in prow_config.yaml.
PY_FUNC_NAME: the name of the Python test function.
util.py provides the helper method set_pytest_junit to set the required attributes.
run_e2e_workflow.py will pass the argument test_target_name to your py function to create the Argo workflow.
classname - Testgrid uses classname as the test target and allows results to be grouped by name.
Recommendation: set the classname to the workflow name as defined in prow_config.yaml.
This allows easy grouping of tests by the entries defined in prow_config.yaml.
Each entry in prow_config.yaml usually corresponds to a different configuration, e.g. "GCP with IAP" vs. "GCP with basic auth".
So the workflow name is a natural grouping.
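Putting the naming recommendations together, a test might set the junit attributes as sketched below. The junit_name helper is ours for illustration (util.py's set_pytest_junit wraps the same idea); record_xml_attribute is the pytest fixture mentioned above and is injected by pytest when the test runs:

```python
def junit_name(workflow_name, py_func_name):
    """Build a Testgrid-friendly hierarchical name: WORKFLOW_NAME/PY_FUNC_NAME."""
    return "{0}/{1}".format(workflow_name, py_func_name)


# In a real test module, pytest injects the record_xml_attribute fixture.
def test_example(record_xml_attribute):
    # Group this result under the workflow entry from prow_config.yaml.
    record_xml_attribute("name", junit_name("unittests", "test_example"))
    record_xml_attribute("classname", "unittests")
    assert True  # actual checks go here
```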
For each test run, Prow defines several variables that pass useful information to your job.
The list of variables is defined in the prow docs.
These variables are often used to assign unique names to each test run to ensure isolation (e.g. by appending the BUILD_NUMBER)
The prow variables are passed to your workflows via the ksonnet parameter prow_env.
You can copy the macros defined in util.libsonnet to parse the ksonnet parameter into a jsonnet map that can be used in your workflow.
Important: always define defaults for the prow variables in the dict, e.g.:
```jsonnet
local prowDict = {
  BUILD_ID: "notset",
  BUILD_NUMBER: "notset",
  REPO_OWNER: "notset",
  REPO_NAME: "notset",
  JOB_NAME: "notset",
  JOB_TYPE: "notset",
  PULL_NUMBER: "notset",
} + util.listOfDictToMap(prowEnv);
```
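The same defaulting pattern, expressed in Python: prow_env arrives as a comma-separated list of name=value pairs, and parsed values override the defaults. The wire format here is assumed from how listOfDictToMap is used; this helper is illustrative, not the test infrastructure's code:

```python
def parse_prow_env(prow_env):
    """Parse a 'NAME=value,NAME=value' string into a dict,
    overriding a dict of 'notset' defaults for the prow variables."""
    defaults = {
        "BUILD_ID": "notset",
        "BUILD_NUMBER": "notset",
        "REPO_OWNER": "notset",
        "REPO_NAME": "notset",
        "JOB_NAME": "notset",
        "JOB_TYPE": "notset",
        "PULL_NUMBER": "notset",
    }
    # filter(None, ...) skips the empty string when prow_env is "".
    for pair in filter(None, prow_env.split(",")):
        key, _, value = pair.partition("=")
        defaults[key] = value
    return defaults


env = parse_prow_env("BUILD_ID=0104-064201,REPO_OWNER=kubeflow")
```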
Guard against long names by truncating the name and using the BUILD_ID to ensure the name remains unique, e.g.:

```jsonnet
local name = std.substr(params.name, 0, std.min(58, std.length(params.name))) + "-" + prowDict["BUILD_ID"];
```
Argo workflow names need to be less than 63 characters because they are used as pod labels
BUILD_ID is unique for each run per repo; we suggest reserving 5 characters for the BUILD_ID.
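The equivalent truncation in Python terms, with the helper name ours for illustration: the base name is cut to a fixed budget before the BUILD_ID suffix is appended, mirroring the jsonnet above:

```python
def unique_name(base, build_id, max_base=58):
    """Truncate base to max_base characters and append the BUILD_ID,
    keeping the result unique per run while bounding its length."""
    return base[:max_base] + "-" + build_id


# A 100-character base gets cut to 58 characters before the suffix.
name = unique_name("a" * 100, "12345")
```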
Argo workflows should have standard labels corresponding to prow variables; for example
```jsonnet
labels: prowDict + {
  workflow_template: "code_search",
},
```
This makes it easy to query for Argo workflows based on prow job info.
In addition, the convention is to use the following labels:
workflow_template: The name of the ksonnet component from which the workflow is created.
The templates for the individual steps in the argo workflow should also have standard labels
```jsonnet
labels: prowDict + {
  step_name: stepName,
  workflow_template: "code_search",
  workflow: workflowName,
},
```
Following the above conventions makes it very easy to get logs for specific steps:

```shell
kubectl logs -l step_name=checkout,REPO_OWNER=kubeflow,REPO_NAME=examples,BUILD_ID=0104-064201 -c main
```
Tests often need a K8s/Kubeflow deployment on which to create resources and run various tests.
Depending on the change being tested:
The test might need exclusive access to a Kubeflow/Kubernetes cluster
The test might need a Kubeflow/K8s deployment but doesn't need exclusive access
If the test needs exclusive access to the Kubernetes cluster then there should be a step in the workflow that creates a KubeConfig file to talk to the cluster.
If the test just needs a known version of Kubeflow (e.g. master or v0.4), then it should use one of the test clusters in project kubeflow-ci for this.
To connect to the cluster:
The Argo workflow should have a step that configures the KubeConfig file to talk to the cluster, e.g. by running `gcloud container clusters get-credentials`.
The Kubeconfig file should be stored in the NFS test directory so it can be used in subsequent steps
Set the environment variable KUBECONFIG on your steps to use the KubeConfig file.
An NFS volume is used to create a shared filesystem between steps in the workflow.
Your Argo workflows should use the PVC nfs-external to mount the NFS filesystem into each step.
Use the following directory structure:

```
${MOUNT_POINT}/${WORKFLOW_NAME}
    /src
        /${REPO_ORG}/${REPO_NAME}
    /outputs
    /outputs/artifacts
```
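A sketch of creating that layout from a workflow step; the helper name is ours, and the mount point below is a temporary directory rather than the actual NFS mount:

```python
import os
import tempfile


def create_test_dirs(mount_point, workflow_name, repo_org, repo_name):
    """Create the conventional per-workflow directory layout:
    ${MOUNT_POINT}/${WORKFLOW_NAME}/src/${REPO_ORG}/${REPO_NAME}
    and .../outputs/artifacts. Returns the workflow root."""
    root = os.path.join(mount_point, workflow_name)
    for d in (
        os.path.join(root, "src", repo_org, repo_name),
        os.path.join(root, "outputs", "artifacts"),
    ):
        os.makedirs(d, exist_ok=True)
    return root


# A temp dir stands in for the NFS mount point in this example.
root = create_test_dirs(tempfile.mkdtemp(), "unittests", "kubeflow", "testing")
```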
The Docker image used by the Argo steps should be a ksonnet parameter stepImage.
The Docker image should use an immutable image tag, e.g. gcr.io/kubeflow-ci/test-worker:v20181017-bfeaaf5-dirty-4adcd0.
The ksonnet parameter stepImage should be set in the prow_config.yaml file defining the E2E tests.
A common runtime is defined here and published to gcr.io/kubeflow-ci/test-worker
The first step in the Argo workflow should check out the source repos to the NFS directory.
Use checkout.sh to check out the repos.
checkout.sh's environment variable EXTRA_REPOS allows checking out additional repositories beyond the repository that triggered the pre/post submit test.
Most E2E tests will want to check out kubeflow/testing in order to use various test utilities.
There are lots of different ways to build Docker images (e.g. GCB, Docker in Docker). The current recommendation is:
Define a Makefile to provide a convenient way to invoke Docker builds
Using Google Container Builder (GCB) to run builds in Kubeflow's CI system generally works better than alternatives (e.g. Docker in Docker, Kaniko)
Use jsonnet if needed to define GCB workflows
The Makefile should expose variables for the following:
The Argo workflow should define the image paths and tags so that subsequent steps can use the newly built images.