tmforum-oda / oda-ca-docs

ODA Component Accelerator Documents

AI Model Management Canvas Operator #158

Open emmanuel-a-otchere opened 2 months ago


Description

The AI Model Manager Operator is a specialized ODA Canvas Operator tailored for managing AI models and their associated resources. Its purpose is to automate administrative tasks related to AI model lifecycle management, resource allocation, and deployment requirements.

Key Objectives

  1. Resource Specification:

    • The Operator supports specifying AI hardware requirements — CPU, GPU, memory, etc. — for AI models, ensuring optimal utilization of computational resources within the Canvas.
    • It helps define resource quotas, limits, and requests, enabling model scaling and allocation.
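The quota/limit/request idea above can be sketched as a helper that builds a Kubernetes-style resources block. The function name and the GPU resource key are illustrative assumptions; the actual schema the Operator will use is still in draft.

```python
def build_resource_spec(cpu_request, cpu_limit, memory_request, memory_limit, gpus=0):
    """Build a Kubernetes-style resources block for an AI model deployment.

    Field names follow standard Kubernetes conventions; whether the
    Operator exposes them exactly like this is an assumption.
    """
    resources = {
        "requests": {"cpu": cpu_request, "memory": memory_request},
        "limits": {"cpu": cpu_limit, "memory": memory_limit},
    }
    if gpus:
        # GPUs are whole-device resources; requests and limits must match.
        resources["requests"]["nvidia.com/gpu"] = str(gpus)
        resources["limits"]["nvidia.com/gpu"] = str(gpus)
    return resources

spec = build_resource_spec("500m", "2", "1Gi", "4Gi", gpus=1)
```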
  2. Model Deployment Configuration:

    • The Operator facilitates the deployment of AI models based on specified essentials, e.g.:
      • Model Name: Identifies the AI model.
      • Model Version: Specifies the version of the model.
      • Model Catalog: Specifies which catalog in the Model Manager the model belongs to.
      • Deployment Options: Components can customize deployment behavior, such as batch size, parallelism, and inference concurrency.
      • Inference Endpoint Configuration: Enables defining endpoints for ODA Component AI model inference.
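Taken together, the deployment essentials above might surface as a custom resource. A minimal sketch, assuming a hypothetical `AIModel` kind and `oda.tmforum.org/v1alpha1` API group — none of these names are confirmed by the draft:

```python
def build_aimodel_manifest(name, version, catalog,
                           batch_size, parallelism, concurrency,
                           endpoint_path):
    """Assemble a hypothetical AIModel custom resource covering the
    deployment essentials: name, version, catalog, deployment options,
    and an inference endpoint."""
    return {
        "apiVersion": "oda.tmforum.org/v1alpha1",  # assumed group/version
        "kind": "AIModel",                          # assumed kind
        "metadata": {"name": name},
        "spec": {
            "modelName": name,
            "modelVersion": version,
            "modelCatalog": catalog,
            "deploymentOptions": {
                "batchSize": batch_size,
                "parallelism": parallelism,
                "inferenceConcurrency": concurrency,
            },
            "inferenceEndpoint": {"path": endpoint_path, "port": 8080},
        },
    }

manifest = build_aimodel_manifest(
    "churn-predictor", "1.2.0", "customer-analytics",
    batch_size=16, parallelism=2, concurrency=8,
    endpoint_path="/v1/models/churn-predictor:predict",
)
```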
  3. Inference Options Management:

    • ODA Components can configure inference behavior through the Operator:
      • Split Inferences: Distributes inference requests across available resources, clusters or "canvases", optimizing throughput.
      • Resource Affinity: Helps verify that specific AI models are deployed on designated environments.
      • Quality of Service (QoS): Balances performance and resource utilization based on Component-defined priorities.
  4. Cross-Canvas Resource Management:

    • The Operator supports seamless integration with other components across a multi-canvas ecosystem.
    • Users can allocate AI resources across different "canvases" (logical domains or environments), enabling flexibility and resource sharing.
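One way to picture cross-canvas allocation is dividing a shared GPU quota among canvases by weight. The weighting scheme and canvas names below are assumptions for illustration, not part of the draft specification.

```python
def allocate_gpus(total_gpus, weights):
    """Split an integer GPU quota across canvases proportionally to weights,
    handing leftover GPUs to the highest-weighted canvases first."""
    total_weight = sum(weights.values())
    allocation = {c: int(total_gpus * w / total_weight) for c, w in weights.items()}
    # Integer division may leave a remainder; distribute it by weight.
    leftover = total_gpus - sum(allocation.values())
    for canvas in sorted(weights, key=weights.get, reverse=True)[:leftover]:
        allocation[canvas] += 1
    return allocation

alloc = allocate_gpus(8, {"prod-canvas": 3, "staging-canvas": 1})
# prod-canvas gets 6 GPUs, staging-canvas gets 2
```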

Prerequisite

(Draft)