janhq / models

Models support in Jan and Cortex

epic: Model Converter Pipeline #22

Closed dan-homebrew closed 1 month ago

dan-homebrew commented 2 months ago

Goal

User Story

Decisions

Tasklist

Model Compilation Pipeline

Future Roadmap

nguyenhoangthuan99 commented 2 months ago

Objectives

  1. Implement model quantization CI
  2. Update model.yml for three models
  3. Organize branch structure as per discussion janhq/cortex.cpp#1154

Quantization Strategy

Example Command

Here is an example command to pull and run a model with a specific quantization tag:

cortex pull llama3.1:8b-gguf-q4-km
cortex run llama3.1:8b-gguf-q4-km

This concise command gives users the information they need: the tag encodes the model name, parameter size, file format, and quantization level (llama3.1 : 8b - gguf - q4-km).

Tasks

  1. [x] Develop a CI runner that builds all quantizations for each model (a sketch follows this list):

    • Download from original source
    • Convert to GGUF format
    • Perform quantization
    • Update Hugging Face repository
  2. [x] Create a script to update the model.yml for each model:

    • Update default parameters
    • Update system prompts
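
For illustration, here is a minimal sketch of the convert-and-quantize steps, assuming llama.cpp's convert_hf_to_gguf.py script and llama-quantize binary are available on the runner; the repo and file names are placeholders, not the actual CI code:

import subprocess
from huggingface_hub import snapshot_download

# 1. Download the original model weights (source repo name is illustrative)
src_dir = snapshot_download(repo_id="meta-llama/Meta-Llama-3.1-8B-Instruct")

# 2. Convert the checkpoint to GGUF with llama.cpp's conversion script
subprocess.run(
    ["python", "convert_hf_to_gguf.py", src_dir, "--outfile", "model-f16.gguf"],
    check=True,
)

# 3. Quantize the GGUF file to the requested level (Q4_K_M here)
subprocess.run(
    ["./llama-quantize", "model-f16.gguf", "model-q4-km.gguf", "Q4_K_M"],
    check=True,
)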

This approach will streamline model management and improve the user experience when working with cortex.cpp.

nguyenhoangthuan99 commented 2 months ago

CI Pipelines for Model Conversion and Quantization

This PR introduces two CI pipelines to streamline the model processing workflow:

1. CI Convert and Quantization Pipeline

This pipeline automates the process of converting and quantizing models.

Inputs:

Process:

  1. Download the source model repository if not already present
  2. Convert the source model to GGUF format
  3. Quantize the GGUF model to the specified level(s)
  4. Upload the quantized model to the target repository under the appropriate branch
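
For step 4, a minimal sketch using huggingface_hub; the repo name, branch name, and file path are assumptions for illustration, not the pipeline's actual code:

from huggingface_hub import HfApi

api = HfApi()
# Ensure the per-quantization branch exists (the repo itself must be pre-created)
api.create_branch("cortexso/llama3.1", branch="8b-gguf-q4-km", exist_ok=True)
# Upload the quantized GGUF to that branch
api.upload_file(
    path_or_fileobj="model-q4-km.gguf",
    path_in_repo="model.gguf",  # hypothetical file name in the repo
    repo_id="cortexso/llama3.1",
    revision="8b-gguf-q4-km",
)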

Result:

After successful processing, new tags will be added to the model repository. For example, see the llama3 repository:

[Image: model tags in the llama3 repository]

2. CI Update model.yml Pipeline

This pipeline updates the model.yml file with new information.

Inputs:

Process:

  1. Set up the necessary environment
  2. Execute a script to update the model.yml file with the new information
  3. Upload the updated model.yml file to Hugging Face
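
A minimal sketch of what the update script might do, assuming PyYAML and huggingface_hub; the repo, branch, and field names are illustrative, not the actual script:

import yaml
from huggingface_hub import hf_hub_download, HfApi

# Fetch the current model.yml from the target branch
path = hf_hub_download(
    repo_id="cortexso/llama3.1", filename="model.yml", revision="8b-gguf-q4-km"
)
with open(path) as f:
    config = yaml.safe_load(f)

# Apply the requested overrides (values here are examples)
config["top_p"] = 0.9
config["top_k"] = 40
config["stop"] = ["<|end_of_text|>", "<|eot_id|>"]

with open("model.yml", "w") as f:
    yaml.safe_dump(config, f, sort_keys=False)

# Push the updated file back to Hugging Face
HfApi().upload_file(
    path_or_fileobj="model.yml",
    path_in_repo="model.yml",
    repo_id="cortexso/llama3.1",
    revision="8b-gguf-q4-km",
)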

These pipelines automate crucial steps in model processing and metadata management, streamlining the workflow for model updates and deployments.

0xSage commented 2 months ago

@nguyenhoangthuan99 how do we use this pipeline? i.e. how do we add new models?

nguyenhoangthuan99 commented 2 months ago

The cortexso model repo must be created before running this pipeline (e.g. the llama3 repo must exist before running the example below), because the HF login token used in CI doesn't have permission to create repos.

Supported quantization levels: q2-k, q3-ks, q3-km, q3-kl, q4-ks, q4-km, q5-ks, q5-km, q6-k, q8-0
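
These level names appear to map one-to-one onto llama.cpp quantization types; a lookup table like the following (my inference, not taken from the CI code) could drive the quantize step:

# Hypothetical mapping from cortexso tag names to llama.cpp quantization types
QUANT_MAP = {
    "q2-k":  "Q2_K",
    "q3-ks": "Q3_K_S",
    "q3-km": "Q3_K_M",
    "q3-kl": "Q3_K_L",
    "q4-ks": "Q4_K_S",
    "q4-km": "Q4_K_M",
    "q5-ks": "Q5_K_S",
    "q5-km": "Q5_K_M",
    "q6-k":  "Q6_K",
    "q8-0":  "Q8_0",
}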

To use this pipeline:

dan-homebrew commented 1 month ago

@nguyenhoangthuan99 I am refactoring the "Built-in Model Library" to a separate epic: https://github.com/janhq/models/issues/21

hiento09 commented 1 month ago

Infra:

nguyenhoangthuan99 commented 1 month ago

I added the updated model converter pipeline to the janhq/models repo, along with a pipeline that automatically updates the model.yml file on Hugging Face (cc @gabrielle-ong). We can now run the CI pipelines in this repo.

Guide for updating the model.yml file

  1. Click Update model.yml with specific quant [image]
  2. Click Run workflow [image]

Please provide the updated values in the format "top_p=0.9" "top_k=40" "stop=['<|end_of_text|>', '<|eot_id|>']".

Note that the prompt_template field should not be updated this way, because special characters in that string are sometimes not handled properly.
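
For illustration only, here is one way such key=value inputs could be parsed into typed values (my sketch, not the actual CI script); it also hints at why prompt_template is fragile here, since quoting and special characters break naive parsing:

import ast

def parse_override(arg: str):
    # Split "top_p=0.9" into a key and a typed value
    key, raw = arg.split("=", 1)
    try:
        value = ast.literal_eval(raw)  # "0.9" -> float, "40" -> int, "[...]" -> list
    except (ValueError, SyntaxError):
        value = raw  # fall back to the raw string
    return key, value

print(parse_override("top_p=0.9"))                               # ('top_p', 0.9)
print(parse_override("stop=['<|end_of_text|>', '<|eot_id|>']"))  # ('stop', [...])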

gabrielle-ong commented 1 month ago

Marking as complete; the pipeline ran successfully for mistral-nemo and llama3.2. The model converter pipeline can now be run from the janhq/models repo.