janhq / models

Models support in Jan and Cortex

epic: Model Converter Pipeline #22

Closed dan-homebrew closed 1 month ago

dan-homebrew commented 2 months ago

Goal

User Story

Decisions

Tasklist

Model Compilation Pipeline

Future Roadmap

nguyenhoangthuan99 commented 2 months ago

Objectives

  1. Implement model quantization CI
  2. Update model.yml for three models
  3. Organize branch structure as per discussion janhq/cortex.cpp#1154

Quantization Strategy

Example Command

Here is an example command to pull and run a model with a specific quantization tag:

cortex pull llama3.1:8b-gguf-q4-km
cortex run llama3.1:8b-gguf-q4-km

This concise command gives users the information they need: the tag encodes the model name, parameter size, file format, and quantization level (llama3.1 : 8b - gguf - q4-km).

Tasks

  1. [x] Develop a CI runner that builds all quantizations for each model (a sketch follows this list):

    • Download from original source
    • Convert to GGUF format
    • Perform quantization
    • Update Hugging Face repository
  2. [x] Create a script to update the model.yml for each model:

    • Update default parameters
    • Update system prompts
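
For illustration, here is a minimal sketch of the convert-and-quantize steps, assuming llama.cpp's convert_hf_to_gguf.py script and llama-quantize binary are available on the runner; the repo and file names are placeholders, not the actual CI code:

import subprocess
from huggingface_hub import snapshot_download

# 1. Download the original model weights (source repo name is illustrative)
src_dir = snapshot_download(repo_id="meta-llama/Meta-Llama-3.1-8B-Instruct")

# 2. Convert the checkpoint to GGUF with llama.cpp's conversion script
subprocess.run(
    ["python", "convert_hf_to_gguf.py", src_dir, "--outfile", "model-f16.gguf"],
    check=True,
)

# 3. Quantize the GGUF file to the requested level (Q4_K_M here)
subprocess.run(
    ["./llama-quantize", "model-f16.gguf", "model-q4-km.gguf", "Q4_K_M"],
    check=True,
)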

This approach will streamline model management and improve the user experience when working with cortex.cpp.

nguyenhoangthuan99 commented 2 months ago

CI Pipelines for Model Conversion and Quantization

This PR introduces two CI pipelines to streamline the model processing workflow:

1. CI Convert and Quantization Pipeline

This pipeline automates the process of converting and quantizing models.

Inputs:

Process:

  1. Download the source model repository if not already present
  2. Convert the source model to GGUF format
  3. Quantize the GGUF model to the specified level(s)
  4. Upload the quantized model to the target repository under the appropriate branch
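
For step 4, a minimal sketch using huggingface_hub; the repo name, branch name, and file path are assumptions for illustration, not the pipeline's actual code:

from huggingface_hub import HfApi

api = HfApi()
# Ensure the per-quantization branch exists (the repo itself must be pre-created)
api.create_branch("cortexso/llama3.1", branch="8b-gguf-q4-km", exist_ok=True)
# Upload the quantized GGUF to that branch
api.upload_file(
    path_or_fileobj="model-q4-km.gguf",
    path_in_repo="model.gguf",  # hypothetical file name in the repo
    repo_id="cortexso/llama3.1",
    revision="8b-gguf-q4-km",
)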

Result:

After successful processing, new tags will be added to the model repository. For example, see the llama3 repository:

[Image: model tags in the llama3 repository]

2. CI Update model.yml Pipeline

This pipeline updates the model.yml file with new information.

Inputs:

Process:

  1. Set up the necessary environment
  2. Execute a script to update the model.yml file with the new information
  3. Upload the updated model.yml file to Hugging Face
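
A minimal sketch of what the update script might do, assuming PyYAML and huggingface_hub; the repo, branch, and field names are illustrative, not the actual script:

import yaml
from huggingface_hub import hf_hub_download, HfApi

# Fetch the current model.yml from the target branch
path = hf_hub_download(
    repo_id="cortexso/llama3.1", filename="model.yml", revision="8b-gguf-q4-km"
)
with open(path) as f:
    config = yaml.safe_load(f)

# Apply the requested overrides (values here are examples)
config["top_p"] = 0.9
config["top_k"] = 40
config["stop"] = ["<|end_of_text|>", "<|eot_id|>"]

with open("model.yml", "w") as f:
    yaml.safe_dump(config, f, sort_keys=False)

# Push the updated file back to Hugging Face
HfApi().upload_file(
    path_or_fileobj="model.yml",
    path_in_repo="model.yml",
    repo_id="cortexso/llama3.1",
    revision="8b-gguf-q4-km",
)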

These pipelines automate crucial steps in model processing and metadata management, streamlining the workflow for model updates and deployments.

0xSage commented 2 months ago

@nguyenhoangthuan99 how do we use this pipeline? i.e. how do we add new models?

nguyenhoangthuan99 commented 2 months ago

The cortexso model repo must be created before running this pipeline (e.g. the llama3 repo must exist before running the example below), because the HF login token used in CI doesn't have permission to create repos.

Supported quantization levels: q2-k, q3-ks, q3-km, q3-kl, q4-ks, q4-km, q5-ks, q5-km, q6-k, q8-0
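
These level names appear to map one-to-one onto llama.cpp quantization types; a lookup table like the following (my inference, not taken from the CI code) could drive the quantize step:

# Hypothetical mapping from cortexso tag names to llama.cpp quantization types
QUANT_MAP = {
    "q2-k":  "Q2_K",
    "q3-ks": "Q3_K_S",
    "q3-km": "Q3_K_M",
    "q3-kl": "Q3_K_L",
    "q4-ks": "Q4_K_S",
    "q4-km": "Q4_K_M",
    "q5-ks": "Q5_K_S",
    "q5-km": "Q5_K_M",
    "q6-k":  "Q6_K",
    "q8-0":  "Q8_0",
}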

To use this pipeline:

dan-homebrew commented 1 month ago

@nguyenhoangthuan99 I am refactoring the "Built-in Model Library" to a separate epic: https://github.com/janhq/models/issues/21

hiento09 commented 1 month ago

Infra:

nguyenhoangthuan99 commented 1 month ago

I added the updated model converter pipeline to the janhq/models repo, along with a pipeline that automatically updates the model.yml file on Hugging Face (cc @gabrielle-ong). We can now run the CI pipelines in this repo.

Guide for updating the model.yml file

  1. Click Update model.yml with specific quant [image]
  2. Click Run workflow [image]

Please provide the updated values in the format "top_p=0.9" "top_k=40" "stop=['<|end_of_text|>', '<|eot_id|>']".

Note that the prompt_template field should not be updated this way, because special characters in that string are sometimes not handled properly.
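
For illustration only, here is one way such key=value inputs could be parsed into typed values (my sketch, not the actual CI script); it also hints at why prompt_template is fragile here, since quoting and special characters break naive parsing:

import ast

def parse_override(arg: str):
    # Split "top_p=0.9" into a key and a typed value
    key, raw = arg.split("=", 1)
    try:
        value = ast.literal_eval(raw)  # "0.9" -> float, "40" -> int, "[...]" -> list
    except (ValueError, SyntaxError):
        value = raw  # fall back to the raw string
    return key, value

print(parse_override("top_p=0.9"))                               # ('top_p', 0.9)
print(parse_override("stop=['<|end_of_text|>', '<|eot_id|>']"))  # ('stop', [...])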

gabrielle-ong commented 1 month ago

Marking as complete; the pipeline ran successfully for mistral-nemo and llama3.2. The model converter pipeline can now be run from the janhq/models repo.