opendatahub-io / ai-edge

ODH integration with AI at the Edge use cases
Apache License 2.0

RHOAIENG-4146 - Add support to import a model from the ODH Model Registry via Model Build Pipeline #246

Closed biswassri closed 1 month ago

biswassri commented 4 months ago

Adds support for using the model registry and fetching model properties from it when enabled, while continuing to support the existing pipeline without an MR deployed. JIRA: RHOAIENG-4146

Description

Added a Task that fetches the properties stored in the model registry when the parameter model-registry-enabled is true. So far it only fetches the model source details (git repo, revision, etc.) and not the Containerfile details.

The added fetch-data task always runs. When model-registry-enabled is set to true, the Task connects to the MR specified by model-registry-hostname and fetches all of the model-related properties from it. The fetch-git and fetch-s3 tasks then run based on the fetch-model result of this task. When model-registry-enabled is set to false, the task simply passes on the properties given in the PipelineRun file.
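
As a rough illustration, the pipeline could be started with the model registry enabled as shown below. This is only a minimal sketch: the pipeline name and the hostname value are illustrative assumptions, and only model-registry-enabled and model-registry-hostname are parameters described in this PR.

# Minimal sketch using the Tekton CLI; the pipeline name and values below
# are placeholders, not values taken from this PR.
tkn pipeline start aiedge-e2e \
  --param model-registry-enabled=true \
  --param model-registry-hostname=model-registry.example.com:8080 \
  --use-param-defaults \
  --showlog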

Below is a sample curl command used to save the model properties in the MR.

curl -X 'POST' \
  "$MR_HOSTNAME/api/model_registry/v1alpha3/model_versions" \
  -H 'accept: application/json' \
  -H 'Content-Type: application/json' \
  -d '{
  "description": "Sample for git workflow",
  "name": "2",
  "registeredModelID": "15",
  "customProperties": {
    "fetch-model": {
      "string_value": "git",
      "metadataType": "MetadataStringValue"
    },
    "git-model-repo": {
      "string_value": "https://github.com/opendatahub-io/ai-edge.git",
      "metadataType": "MetadataStringValue"
    },
    "modelRelativePath": {
      "string_value": "pipelines/models",
      "metadataType": "MetadataStringValue"
    },
    "model-dir": {
      "string_value": "tf2model",
      "metadataType": "MetadataStringValue"
    },
    "git-model-revision": {
      "string_value": "main",
      "metadataType": "MetadataStringValue"
    }
  }
}'
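
For completeness, the stored properties can later be read back from the MR, which is roughly what the fetch-data Task does. The sketch below is an assumption based on the POST path above; <model-version-id> is a placeholder for the ID returned by the registry, not a value taken from this PR.

# Hedged sketch: read a model version's properties back from the registry.
curl -X 'GET' \
  "$MR_HOSTNAME/api/model_registry/v1alpha3/model_versions/<model-version-id>" \
  -H 'accept: application/json'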

How Has This Been Tested?

You'd expect the green PipelineRun below:

(Screenshot: successful PipelineRun, 2024-04-17)

Merge criteria:

openshift-ci[bot] commented 4 months ago

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign grdryn for approval. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

- **[OWNERS](https://github.com/opendatahub-io/ai-edge/blob/main/OWNERS)**

Approvers can indicate their approval by writing `/approve` in a comment. Approvers can cancel approval by writing `/approve cancel` in a comment.

jackdelahunt commented 4 months ago

So I was just looking at this and wanted to get a better understanding of what is happening.

Fetching from the model registry is now the default and the results from the model registry (info about model in s3 or git) are passed to the task that will then get the model? Is that correct?

Is it expected that we only use model registry now?

grdryn commented 3 months ago

The PR check failure appears to be the result of the following error in the PipelineRun:

[pod/aiedge-e2e-bike-rentals-auto-ml-k8s75-build-container-image-pod/step-build-and-push] [1/2] STEP 5/6: COPY $MODEL_DIR /opt/app-root/src/model/
[pod/aiedge-e2e-bike-rentals-auto-ml-k8s75-build-container-image-pod/step-build-and-push] Error: building at STEP "COPY $MODEL_DIR /opt/app-root/src/model/": checking on sources under "/workspace/source/model_dir-bike-rentals-auto-ml/bike-rentals-auto-ml": copier: stat: "/model_dir-bike-rentals-auto-ml": no such file or directory

https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/test-platform-results/pr-logs/pull/opendatahub-io_ai-edge/246/pull-ci-opendatahub-io-ai-edge-main-test-ai-edge/1783098369141379072/artifacts/test-ai-edge/test-ai-edge/artifacts/aiedge-e2e-bike-rentals-auto-ml-k8s75/pipelineLogs.txt
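
One way to confirm this diagnosis would be to list the source workspace in the build pod while it still exists. This is only a hedged debugging sketch: the pod and step names are copied from the log above, and the path is inferred from the COPY error, so neither is guaranteed to be exact.

# Check whether the directory the COPY step expects actually exists
# under the source workspace (pod/step names from the log above).
oc exec aiedge-e2e-bike-rentals-auto-ml-k8s75-build-container-image-pod \
  -c step-build-and-push \
  -- ls -la /workspace/source/model_dir-bike-rentals-auto-ml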

LaVLaS commented 2 months ago

Adding a reply for historical reference.

> Also, is this somehow aligned with the CLI work? Currently we have 2 approaches to Model registry:
>
>   • CLI - where we store all pipeline params in the model registry

There was a discussion about adding a CLI to support this entire workflow, but the scope of that work requires collaboration with the model registry in order to differentiate it from the functionality already included in the Tekton CLI.

>   • This - where we get the model location from the model registry during the run of the pipeline

From an offline conversation with @biswassri about https://github.com/opendatahub-io/ai-edge/pull/246#discussion_r1617421656, the intent was to use the model registry as an abstraction layer for retrieving the model storage information. Long term, I think this approach is the better solution: we want tasks that can fetch from git or s3 directly, with the option of putting some type of registry abstraction task in front that retrieves the storage information.

LaVLaS commented 2 months ago

/hold

I'm putting this PR on hold while @biswassri adds changes based on recent comments.

biswassri commented 2 months ago

@LaVLaS @MarianMacik @grdryn I have addressed all the comments (hopefully) and pushed the changes. Please let me know if I've missed any and whether we are good to merge. FYI, I've also changed the way we read from the MR based on Landon's suggestion.

LaVLaS commented 2 months ago

/unhold

biswassri commented 1 month ago

As discussed offline with @LaVLaS, this PR would interfere with and complicate the refactor that is planned as part of RHOAISTRAT 234. Hence, I'm closing out this PR. However, I will not delete this branch from my private repo and will keep the changes in case we decide to revisit this in the future.