janhq / jan

Jan is an open source alternative to ChatGPT that runs 100% offline on your computer. Multiple engine support (llama.cpp, TensorRT-LLM)
https://jan.ai/
GNU Affero General Public License v3.0

planning: Migrate Hub to Cortex Model Hub API #3910

Open dan-homebrew opened 1 week ago

dan-homebrew commented 1 week ago

Goal

louis-jan commented 5 days ago

Legacy

Every provider extension registers a list of models, which are then persisted to the /models data folder. On load, the Jan app scans that folder and lists the models on the model hub.

Recent update: the Jan app caches these models after registration for a better UX, then rescans the folder in the background to make sure the cached list is still accurate.

The initial idea (Deprecated PoC https://github.com/janhq/jan/pull/3926)

All the cortex repositories are fetched during the CI app build to generate model.json files, so accessing ModelHub does not require connectivity. This uses the same mechanism as the pre-populated model.json files.

There are a couple of challenges while doing that:

  1. Jan Hub does not support model family grouping (quantization, engine, ...), so all models are flattened and we have to list every available branch of every repository. From a UX standpoint this creates many duplicated models, such as the main branch and the gguf branch.
  2. Model highlighting and order. There is no explicit information for promoting certain models to the top, so the list will likely be sorted alphabetically.
  3. Poor decoration metadata, such as no description, author, or size, because model.yaml is very minimal. Metadata extracted from GGUF would have the same problem.
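To make the first two challenges concrete, here is a minimal sketch of what flattening does to the hub listing. The `repo:branch` entry format and the `flatten_models` helper are assumptions made for illustration, not the PoC's actual code.

```python
def flatten_models(repos: dict[str, list[str]]) -> list[str]:
    """Return one hub entry per (repository, branch) pair, sorted alphabetically.

    `repos` maps a repository name to its branch names. The "repo:branch"
    entry format is a hypothetical choice for this sketch.
    """
    entries = [
        f"{repo}:{branch}"
        for repo, branches in repos.items()
        for branch in branches
    ]
    # Without ranking metadata, alphabetical order is the only stable ordering.
    return sorted(entries)


# Each branch becomes its own entry, so one repository shows up several times.
flat = flatten_models({"llama3.1": ["main", "gguf"], "mistral": ["main"]})
print(flat)
```

With no family grouping, `llama3.1` appears once per branch, and nothing in the data says which entry should be highlighted.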

Mixing of legacy and new model hub

nguyenhoangthuan99 commented 2 days ago

Model Hub API Implementation

Overview

We are implementing a Model Hub API to enable organizations and users to add external model hubs (HuggingFace, NGC Cloud, etc.) to their local hub.

Current State

Key Features

Authentication Support

Use Cases

  1. Organization-level Integration
    • Users can add entire HuggingFace organizations (e.g., cortexso)
  2. Repository Management
    • Support for adding repositories with all associated branches

Database Schema

The new Models table will include the following fields:

Benefits

Implementation Details

API Endpoints

Authentication Management

This endpoint can also be reused for remote engine providers such as openai, claude, ... Should the token be saved in .cortexrc, or should we create a separate .authentication file that holds the secret tokens for every provider?

POST /v1/auth/token
{
    "provider": "huggingface",
    "token": "your_token_here"
}

Organization Management

# Add organization
POST /v1/models/organizations
{
    "name": "cortexso",
    "provider": "huggingface"
}

# List organizations
GET /v1/models/organizations

Repository Management

# Add repository
POST /v1/models/repositories
{
    "organization": "cortexso",
    "name": "llama3.1",
    "include_all_branches": true
}

# List repositories for an organization
GET /v1/models/organizations/{org_name}/repositories

Model Management

# List models with filtering
GET /v1/models?organization=cortexso&repo=llama3.1&branch=main&status=available

# Get model by ID
GET /v1/models/{model_id}

# Pull/download model
POST /v1/models/pull

Model Grouping APIs

Hierarchical Views

Do we need an API for the tree view, or will the front end handle it?

# Get organization tree view
GET /api/v1/views/organization-tree
Response:
{
    "organizations": [
        {
            "name": "cortexso",
            "repositories": [
                {
                    "name": "llama3.1",
                    "branches": [
                        {
                            "name": "tensorrt-llm-linux-ada",
                            "models": [model_id, ... ]
                        }
                    ]
                }
            ]
        }
    ]
}
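If the front end (or a thin server layer) builds this view itself, the grouping is small. The sketch below assumes each model record is a flat dictionary with `organization`, `repo`, `branch`, and `id` keys; those field names are guesses based on the filter parameters above, not a confirmed schema.

```python
def build_organization_tree(models: list[dict]) -> dict:
    """Group flat model records into the organization/repository/branch tree."""
    tree: dict[str, dict[str, dict[str, list[str]]]] = {}
    for model in models:
        repos = tree.setdefault(model["organization"], {})
        branches = repos.setdefault(model["repo"], {})
        branches.setdefault(model["branch"], []).append(model["id"])

    # Convert the nested dicts into the list-based shape of the response above.
    return {
        "organizations": [
            {
                "name": org,
                "repositories": [
                    {
                        "name": repo,
                        "branches": [
                            {"name": branch, "models": ids}
                            for branch, ids in branches.items()
                        ],
                    }
                    for repo, branches in repos.items()
                ],
            }
            for org, repos in tree.items()
        ]
    }
```

Because it only needs the flat list that `GET /v1/models` would return, this grouping could live on either side of the API.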

Model Type Detection

  1. Model Type Detection Logic:

    • Implement logic to identify:
      • Single .gguf models
      • Multi-part .gguf models
      • Other model types
    • Add appropriate model records to database based on type
  2. Download Process: Use existing Download service in cortex.cpp
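The detection step in item 1 could look roughly like the sketch below, which classifies a branch by its file names. It assumes multi-part files follow the `-00001-of-00003.gguf` suffix produced by llama.cpp's split tooling; the `ModelType` enum and `detect_model_type` function are hypothetical names, and the real implementation would live in cortex.cpp rather than Python.

```python
import re
from enum import Enum


class ModelType(Enum):
    SINGLE_GGUF = "single_gguf"
    MULTI_PART_GGUF = "multi_part_gguf"
    OTHER = "other"


# Assumed naming convention for split GGUF files: <name>-00001-of-00003.gguf
_SPLIT_SUFFIX = re.compile(r"-\d{5}-of-\d{5}\.gguf$", re.IGNORECASE)


def detect_model_type(filenames: list[str]) -> ModelType:
    """Classify a repository branch from the names of the files it contains."""
    ggufs = [name for name in filenames if name.lower().endswith(".gguf")]
    if not ggufs:
        return ModelType.OTHER
    if any(_SPLIT_SUFFIX.search(name) for name in ggufs):
        return ModelType.MULTI_PART_GGUF
    if len(ggufs) == 1:
        return ModelType.SINGLE_GGUF
    # Several unrelated .gguf files are ambiguous; this sketch does not guess.
    return ModelType.OTHER
```

How a branch holding several unrelated .gguf files should be recorded is left open here and would need a decision before records are written to the database.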

Error Handling