opendatahub-io / opendatahub-operator

Open Data Hub operator to manage ODH component integrations
https://opendatahub.io
Apache License 2.0

This operator is the primary operator for Open Data Hub. It is responsible for enabling data science applications such as Jupyter Notebooks, ModelMesh Serving, and Data Science Pipelines. The operator uses the DataScienceCluster CRD to deploy and configure these applications.

Usage

Prerequisites

If you use the single model serving configuration or the KServe component, install the following operators before creating the DSCI and DSC instances.

Additionally, installing the Authorino operator and the Service Mesh operator enhances the user experience by providing single sign-on.

Installation

Developer Guide

Pre-requisites

Download manifests

The get_all_manifests.sh script facilitates the process of fetching manifests from remote git repositories. It is configured to work with a predefined map of components and their corresponding manifest locations.

Structure of COMPONENT_MANIFESTS

Each component is associated with its manifest location in the COMPONENT_MANIFESTS map. The key is the component's name, and the value is its location, formatted as <repo-org>:<repo-name>:<branch-name>:<source-folder>:<target-folder>.

Workflow

  1. The script clones the remote repository <repo-org>/<repo-name> from the specified <branch-name>.
  2. It then copies the content from the relative path <source-folder> to the local opt/manifests/<target-folder> folder.
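
The location format and the two steps above can be sketched in shell. This is a minimal illustration, not the code of get_all_manifests.sh: the variable names are chosen for this example, and the GitHub URL prefix is an assumption, since the README only says the repositories are remote git repositories.

```shell
# Example location value in the documented five-field format.
spec="maistra:odh-dashboard:test-manifests:manifests:odh-dashboard"

# Split the colon-separated fields into named variables.
IFS=':' read -r repo_org repo_name branch_name source_folder target_folder <<EOF
${spec}
EOF

# Assumption: the repository is hosted on GitHub; the real script may
# construct the clone URL differently.
repo_url="https://github.com/${repo_org}/${repo_name}"

# Manifests end up under the local opt/manifests/<target-folder> folder.
target_path="opt/manifests/${target_folder}"

# Describe the two workflow steps without touching the network.
echo "step 1: clone ${repo_url} at branch ${branch_name}"
echo "step 2: copy ${source_folder}/ to ${target_path}/"
```

The sketch only prints what would happen, so it can be run safely to check how a location string is interpreted.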

Local Storage

The script utilizes a local, empty folder named opt/manifests to host all required manifests, sourced directly from each component’s source repository.

Adding New Components

To include a new component in the list of manifest repositories, simply extend the COMPONENT_MANIFESTS map with a new entry, as shown below:

declare -A COMPONENT_MANIFESTS=(
  # existing components ...
  ["new-component"]="<repo-org>:<repo-name>:<branch-name>:<source-folder>:<target-folder>"
)

Customizing Manifests Source

You have the flexibility to change the source of the manifests. Invoke the get_all_manifests.sh script with specific flags, as illustrated below:

./get_all_manifests.sh --odh-dashboard="maistra:odh-dashboard:test-manifests:manifests:odh-dashboard"

If the flag name matches a component key defined in COMPONENT_MANIFESTS, the flag value overrides that component's location; otherwise, the command fails.
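
The matching rule can be illustrated with a small shell function. This is a hedged sketch of the described behavior, not the script's actual implementation: `resolve_override` and `known_components` are hypothetical names, and the key list here is a stand-in for the keys of COMPONENT_MANIFESTS.

```shell
# Hypothetical list of known component keys; the real keys are those of
# COMPONENT_MANIFESTS in get_all_manifests.sh.
known_components="odh-dashboard new-component"

# resolve_override (illustrative name) prints "<key> <location>" when the
# flag names a known component, and returns non-zero otherwise.
resolve_override() {
  flag="$1"
  case "$flag" in
    --*=*) ;;  # expected shape: --<component-key>=<location>
    *) echo "malformed flag: $flag" >&2; return 1 ;;
  esac
  key="${flag#--}"       # strip the leading "--"
  key="${key%%=*}"       # keep everything before the first "="
  location="${flag#*=}"  # keep everything after the first "="
  for known in $known_components; do
    if [ "$known" = "$key" ]; then
      printf '%s %s\n' "$key" "$location"
      return 0
    fi
  done
  echo "unknown component: $key" >&2
  return 1
}
```

For example, `resolve_override --odh-dashboard="maistra:odh-dashboard:test-manifests:manifests:odh-dashboard"` succeeds because `odh-dashboard` is in the list, while a flag naming a component outside the list returns a non-zero status.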

For local development:
make get-manifests

This command first cleans up your local opt/manifests folder. If you have local manifest changes that you want to reuse later, back them up before running it.

For building the operator image:
make image-build

By default, the image is built without any local changes (a clean build). This is what the production build system does.

To build an image with the local opt/manifests folder, set IMAGE_BUILD_FLAGS="--build-arg USE_LOCAL=true" when invoking make, for example:

make image-build -e IMAGE_BUILD_FLAGS="--build-arg USE_LOCAL=true"

Build Image

Deployment

Deploying operator locally

Deploying operator using OLM

There are two ways to test your modifications:

  1. Each component in the DataScienceCluster CR has a devFlags.manifests field, which can be used to pull down the manifests from the remote git repos of the respective components. This method overwrites the manifests and creates customized resources for the respective components.

  2. [Under implementation] Build the operator image with local manifests.

Update API docs

Whenever a new API or a new CRD field is added, run the following command:

  make api-docs 

This ensures that the API documentation is updated accordingly.

Enabled logging

Global logger configuration can be changed with the environment variable ZAP_LOG_LEVEL or the command line switch --log-mode <mode>, for example from the CSV. The command line switch has higher priority. Valid values for <mode>: "" (default), prod, production, devel, development.

The verbosity level is INFO. To fine-tune the zap backend, the standard Operator SDK zap switches can be used.
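
The precedence between the switch and the environment variable can be pictured with the following shell sketch. It is only an illustration of the rule stated above: `effective_log_mode` is a hypothetical helper, treating an empty switch value as "not set" is an assumption, and rejecting unknown values is a choice made for this sketch that the operator itself may not share.

```shell
# effective_log_mode (hypothetical helper) takes the value passed to
# --log-mode as its first argument and prints the mode that wins.
effective_log_mode() {
  cli_mode="$1"
  # A non-empty command line value takes priority over ZAP_LOG_LEVEL.
  mode="${cli_mode:-${ZAP_LOG_LEVEL:-}}"
  case "$mode" in
    ""|prod|production|devel|development)
      printf '%s\n' "$mode"
      ;;
    *)
      # Assumption: unknown values are rejected in this sketch.
      echo "invalid log mode: $mode" >&2
      return 1
      ;;
  esac
}
```

With ZAP_LOG_LEVEL set to prod, `effective_log_mode devel` selects devel, while `effective_log_mode ""` falls back to prod.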

The log level can be changed at runtime through the DSCI devFlags by setting .spec.devFlags.logLevel. It accepts the same values as the --zap-log-level command line switch. For example:

apiVersion: dscinitialization.opendatahub.io/v1
kind: DSCInitialization
metadata:
  name: default-dsci
spec:
  devFlags:
    logLevel: debug
  ...

| logmode | stacktrace level | verbosity | Output | Comments |
|---|---|---|---|---|
| devel | WARN | INFO | Console | lowest level, using epoch time |
| development | WARN | INFO | Console | same as devel |
| "" | ERROR | INFO | JSON | default option |
| prod | ERROR | INFO | JSON | highest level, using human readable timestamp |
| production | ERROR | INFO | JSON | same as prod |

Example DSCInitialization

Below is the default DSCI CR config:

kind: DSCInitialization
apiVersion: dscinitialization.opendatahub.io/v1
metadata:
  name: default-dsci
spec:
  applicationsNamespace: opendatahub
  monitoring:
    managementState: Managed
    namespace: opendatahub
  serviceMesh:
    controlPlane:
      metricsCollection: Istio
      name: data-science-smcp
      namespace: istio-system
    managementState: Managed
  trustedCABundle:
    customCABundle: ''
    managementState: Managed

Apply this example, modified as needed for your environment.

Example DataScienceCluster

Once the operator is installed successfully in the cluster, a user can create a DataScienceCluster CR to enable ODH components. ODH supports only one instance of the CR at a time; update that instance to enable a custom list of components.

  1. Enable all components

apiVersion: datasciencecluster.opendatahub.io/v1
kind: DataScienceCluster
metadata:
  name: default-dsc
spec:
  components:
    codeflare:
      managementState: Managed
    dashboard:
      managementState: Managed
    datasciencepipelines:
      managementState: Managed
    kserve:
      managementState: Managed
      serving:
        ingressGateway:
          certificate:
            type: OpenshiftDefaultIngress
        managementState: Managed
        name: knative-serving
    kueue:
      managementState: Managed
    modelmeshserving:
      managementState: Managed
    modelregistry:
      managementState: Managed
      registriesNamespace: "odh-model-registries"
    ray:
      managementState: Managed
    trainingoperator:
      managementState: Managed
    trustyai:
      managementState: Managed
    workbenches:
      managementState: Managed

  2. Enable only Dashboard and Workbenches

apiVersion: datasciencecluster.opendatahub.io/v1
kind: DataScienceCluster
metadata:
  name: example
spec:
  components:
    dashboard:
      managementState: Managed
    workbenches:
      managementState: Managed

Note: The default value for managementState in a component is false.

Run functional Tests

The functional tests are written with ginkgo and gomega. To run the tests, the user needs to set up envtest, which provides a mocked Kubernetes cluster. A detailed explanation on how to configure envtest is provided here.

To run the tests for an individual controller, change directory into the controller's folder and run:

ginkgo -v

This provides detailed logs of the test spec.

Note: When running tests for each controller, make sure to add the BinaryAssetsDirectory attribute to the envtest.Environment in the suite_test.go file. The value should point to the path where the envtest binaries are installed.

To run the tests for all controllers, use the make command:

make unit-test

Note: The make command should be executed from the project root.

Run e2e Tests

A user can run the e2e tests in the same namespace as the operator. To deploy opendatahub-operator, refer to this section. The following environment variables must be set when running locally:

export KUBECONFIG=/path/to/kubeconfig

When testing the RHODS operator in dev mode, ensure that no ODH CSV exists.

Once the above variables are set, run the following:

make e2e-test

Additional flags can be passed to the e2e tests by setting the E2E_TEST_FLAGS variable. The following table lists the available flags:

| Flag | Description | Default value |
|---|---|---|
| --skip-deletion | Skips the dsc-deletion test, which deletes DataScienceCluster resources. Set this flag to true to skip DataScienceCluster deletion. | false |

Example command to run the full test suite while skipping the DataScienceCluster deletion test:

make e2e-test -e OPERATOR_NAMESPACE=<namespace> -e E2E_TEST_FLAGS="--skip-deletion=true"

API Overview

Please refer to the API documentation

Component Integration

Please refer to components docs

Troubleshooting

Please refer to troubleshooting documentation

Upgrade testing

Please refer to upgrade testing documentation