opst / knitfab

MLOps system with automated lineage tracking, tag-based workflow engine and task runner with container runtime.
https://knitfab.opst.co.jp

Knitfab

MLOps system & tool. Release AI/ML engineers from trivial routines.

Directory layout

Related Repository

Getting Started

Read docs/01.getting-started.

For more detail, see docs/02.user-guide.

How to Install and Operate

Read docs/03.admin-guide.

Build Knitfab

Our build script is ./build/build.sh . This script builds Knitfab into an installer bundle.

To build, just run:

./build/build.sh

Usage

# build images and charts
./build/build.sh [--debug] [--release] [--test]

# generate debug configuration for IDE
./build/build.sh --ide vscode

Build images and charts

./build/build.sh [--debug] [--release] [--test]

OPTIONS:

    --debug       do "debug build"
    --release     do "release build"
    --test        do build for automated test

The --release and --test options are mutually exclusive, as are --release and --debug. If neither --release nor --test is specified, the script performs a local build (for the dev-cluster).
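
The option rules above can be sketched as a small shell function. This is only an illustration of the stated constraints, not the logic in ./build/build.sh; the function name build_kind and the labels it prints are hypothetical.

```shell
# Hypothetical helper: illustrates the documented option rules only.
# It is NOT taken from ./build/build.sh.
# Prints the resulting build kind, or fails on a forbidden combination.
build_kind() {
    debug=0 release=0 for_test=0
    for arg in "$@"; do
        case "$arg" in
            --debug)   debug=1 ;;
            --release) release=1 ;;
            --test)    for_test=1 ;;
            *) echo "unknown option: $arg" >&2; return 2 ;;
        esac
    done

    # --release cannot be combined with --test or with --debug.
    if [ "$release" -eq 1 ] && [ "$for_test" -eq 1 ]; then
        echo "--release and --test are mutually exclusive" >&2
        return 1
    fi
    if [ "$release" -eq 1 ] && [ "$debug" -eq 1 ]; then
        echo "--release and --debug are mutually exclusive" >&2
        return 1
    fi

    # Neither --release nor --test means a local build.
    if [ "$release" -eq 1 ]; then
        kind=release
    elif [ "$for_test" -eq 1 ]; then
        kind=test
    else
        kind=local
    fi
    if [ "$debug" -eq 1 ]; then
        kind="${kind}+debug"
    fi
    echo "$kind"
}
```

For example, build_kind --test --debug succeeds, while build_kind --release --debug returns a non-zero status.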

./build/build.sh requires commands below:

Debug build

Running ./build/build.sh --debug generates a "debug mode" installer.

A debug mode installer has additional items and features.

When you deploy Knitfab from a "debug mode" installer, the cluster exposes the following ports so that dlv or a compatible IDE can attach.

To get a configuration that lets your IDE attach to knitd/knitd-backend, run ./build/build.sh --ide <IDE_NAME>.

Currently, only --ide vscode is supported.

If your preferred IDE is not supported, write your own config and share it with us.

Version tag

By default, container images are tagged as below:

${COMPONENT NAME}:${VERSION}-${GIT HASH}-${ARCH or "local"}[-debug]

If the image is built as a local build and your working copy differs from HEAD, the image tag is suffixed with -diff-${TIMESTAMP}.
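
As a rough illustration of the tag format above, the following sketch composes a tag from its parts. The function name image_tag and the sample values are hypothetical, and the sketch leaves out the -diff-${TIMESTAMP} suffix because its exact position in the tag is not stated here.

```shell
# Hypothetical helper: mirrors the documented format
#   ${COMPONENT NAME}:${VERSION}-${GIT HASH}-${ARCH or "local"}[-debug]
# It is NOT taken from ./build/build.sh.
# Usage: image_tag COMPONENT VERSION GIT_HASH ARCH_OR_LOCAL [debug]
image_tag() {
    component=$1 version=$2 hash=$3 arch=$4 debug=${5:-}
    tag="${component}:${version}-${hash}-${arch}"
    # Append the optional -debug suffix for debug builds.
    if [ "$debug" = "debug" ]; then
        tag="${tag}-debug"
    fi
    echo "$tag"
}
```

With made-up values, image_tag knitd 1.0.0 abc1234 local debug would yield a tag ending in -local-debug.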

Release build

./build/build.sh --release

performs a release build.

./build/build.sh --release prints instructions for the release operations.

A release build does not proceed when your working copy has diffs. Before releasing, merge your changes into main (via a pull request).

Custom Release

To make and publish a custom build, pass build options to ./build/build.sh --release via environment variables.

[!Note]

If you only want to test locally and not publish, you can make a local build and install it into the dev-cluster or a similar environment.

After publishing, install your custom release like this:

CHART_VERSION=... REPOSITORY=... BRANCH=... ./installer/install.sh --prepare ...
CHART_VERSION=... REPOSITORY=... BRANCH=... ./installer/install.sh --install ...

For more detail, read ./build/build.sh and ./installer/installer.sh.

The dev-cluster: A k8s cluster for developers

This repository contains provisioning scripts to deploy a local Kubernetes cluster, based on virtualbox+vagrant+ansible.

You can use the cluster to try out or debug Knitfab.

Prerequisites

Additionally, the dev-cluster uses ansible, which is installed by Poetry.

To start dev-cluster

To start, move to the dev-cluster directory, and run:

./dev-cluster/up.sh

This command does...

Provisioning takes at least 10 minutes and can take more than 30. Please be patient. If you want to throw everything away, run vagrant destroy -f; the VMs and the k8s cluster will be destroyed.

[!NOTE]

Provisioning may hang after printing the message below:

VirtualBox Guest Additions: To build modules for other installed kernels, run
VirtualBox Guest Additions:   /sbin/rcvboxadd quicksetup <version>
VirtualBox Guest Additions: or
VirtualBox Guest Additions:   /sbin/rcvboxadd quicksetup all
VirtualBox Guest Additions: Building the modules for kernel 6.5.0-15-generic.
update-initramfs: Generating /boot/initrd.img-6.5.0-15-generic
VirtualBox Guest Additions: Running kernel modules will not be replaced until
the system is restarted or 'rcvboxadd reload' triggered
VirtualBox Guest Additions: reloading kernel modules and services
VirtualBox Guest Additions: kernel modules and services 7.0.18 r162988 reloaded
VirtualBox Guest Additions: NOTE: you may still consider to re-login if some
user session specific services (Shared Clipboard, Drag and Drop, Seamless or
Guest Screen Resize) were not restarted automatically

In that case, as a workaround, attach to the VM with ssh and run:

sudo rcvboxadd reload

You can attach to a VM with ./dev-cluster/ssh.sh <VM Name>.

To suspend/destroy your dev-cluster

To suspend,

./dev-cluster/suspend.sh

To destroy,

./dev-cluster/destroy.sh [-f]

In either case, you can restart your cluster with ./dev-cluster/up.sh.

How is the dev-cluster provisioned?

The dev-cluster is a k8s cluster with the following nodes (VirtualBox VM):

During provisioning, it generates records of the cluster's configurations in the .sync directory:

Install Knitfab into the dev-cluster

  1. Copy ./dev-cluster/docker-certs/meta/ca.crt to your /etc/docker/certs.d/${VM-KNIT-GATEWAY-IP}:${IMAGE-REGISTRY-PORT}
    • By default, ${VM-KNIT-GATEWAY-IP}:${IMAGE-REGISTRY-PORT} is 10.10.0.3:30005.
    • You need this operation only when you have deleted ./dev-cluster/docker-certs/*.
  2. Then run ./dev-cluster/install-knit.sh --prepare (once)
  3. Edit the install setting directory (once)
  4. Import the CA certificate into your docker (once)
    • Copy ./dev-cluster/knitfab-install-settings/docker/certs.d/${VM-IP}:${PORT}/ca.crt to /etc/docker/certs.d/${VM-IP}:${PORT}/ca.crt
  5. Then run ./dev-cluster/install-knit.sh
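
The certificate destination used in the steps above follows the /etc/docker/certs.d/<host>:<port>/ca.crt layout. A minimal sketch of building that path is shown here; the function name docker_cert_path is hypothetical, and the commented copy commands assume the default address 10.10.0.3:30005 and that writing under /etc/docker requires sudo on your machine.

```shell
# Hypothetical helper: prints where dockerd expects the CA certificate
# for a registry reachable at HOST:PORT.
# Usage: docker_cert_path HOST PORT
docker_cert_path() {
    host=$1 port=$2
    echo "/etc/docker/certs.d/${host}:${port}/ca.crt"
}

# Possible use for step 1 (assumes the default address and sudo access):
#   dest=$(docker_cert_path 10.10.0.3 30005)
#   sudo mkdir -p "$(dirname "$dest")"
#   sudo cp ./dev-cluster/docker-certs/meta/ca.crt "$dest"
```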

Step 2 generates an install setting directory at ./dev-cluster/knitfab-install-settings.

In step 3, edit ./dev-cluster/knitfab-install-settings/values/knit-storage-nfs.yaml as below:

#
# ......
nfs:
  # ...
  external: true      # set true here
  # ...
  server: "10.10.0.3" # set IP address of your "knit-gateway" VM.
  # ...
  node: ""            # leave empty
  # ...

For other configuration options, consult docs/03.admin-guide.

Note

If you are using colima (or docker-machine, minikube) as dockerd, you should put ca.crt on the (virtual) machine where the dockerd process runs.

For example, in the case of colima, ca.crt should be placed in /etc/docker/certs.d/... IN COLIMA. You may need to run colima ssh and copy the file.

./dev-cluster/knitctl.sh

./dev-cluster/knitctl.sh is a wrapper for KUBECONFIG=... kubectl.

So, you can just do:

./dev-cluster/knitctl.sh ...

instead of:

kubectl --kubeconfig ./dev-cluster/kubeconfig/kubeconfig ...
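
A wrapper of this kind can be approximated with a shell function. This sketch assumes the kubeconfig path shown above and is not the contents of ./dev-cluster/knitctl.sh; the function name knitctl is hypothetical.

```shell
# Hypothetical stand-in for ./dev-cluster/knitctl.sh (the real script may differ).
# Forwards all arguments to kubectl with the dev-cluster kubeconfig.
# Run it from the repository root, because the path is relative.
knitctl() {
    kubectl --kubeconfig ./dev-cluster/kubeconfig/kubeconfig "$@"
}
```

After defining it in your shell, knitctl get pods behaves like the longer kubectl invocation.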

Test environment

We provide a script, ./testctl.sh, to create the test environment.

It uses colima to create a k8s cluster for the test.

This test environment is ISOLATED FROM the dev-cluster, because the dev-cluster is too large to create and remove frequently. The test environment should be lightweight enough to be created and removed frequently.

It is recommended to expand the memory of the colima instance to 8GB or more. To do this, update your ~/.colima/_template/default.

To bring up a new environment, start colima, and run:

./testctl.sh install

To test, run:

./testctl.sh test

[!Note]

Kubernetes in the colima VM and VirtualBox can have networking conflicts.

You may sometimes need to suspend the dev-cluster.

Environment variables

./testctl.sh saves env-vars in the .testenv file.

When you want to run tests in your IDE, make sure that the test environment is up in colima and that .testenv is imported.
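
One way to import the file into a shell session before launching tests or an IDE is sketched below. It assumes that .testenv consists of shell-compatible KEY=VALUE lines, which is not confirmed here; the function name load_testenv is hypothetical.

```shell
# Hypothetical helper: exports every variable defined in an env file.
# ASSUMPTION: the file contains shell-compatible KEY=VALUE lines.
# Usage: load_testenv [PATH]   (defaults to ./.testenv)
load_testenv() {
    envfile=${1:-./.testenv}
    if [ ! -f "$envfile" ]; then
        echo "env file not found: $envfile" >&2
        return 1
    fi
    set -a          # export every variable assigned from here on
    . "$envfile"    # read the assignments into the current shell
    set +a          # stop auto-exporting
}
```

Processes started from the same shell after load_testenv inherit those variables.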