Closed · dan-homebrew closed this 4 weeks ago
I agree with Option 2. The current approach can't hold a default model for other engines. It also caused duplication trouble between the `main` branch and a specific branch (`3b`, for example).

I think we can go with Option 2 before releasing because it's more future-proof.

I'll update the 35 cortexso repos with `metadata.yml` on the `main` branch.

@nguyenhoangthuan99 Can I get help on the recommended branches for these 35 models? (listed from https://huggingface.co/cortexso)
- cortexso/llama3.2 (3b-gguf-q4-km)
- cortexso/mistral-nemo (12b-gguf-q4-ks)
- cortexso/llama3 (8b-gguf-q4-ks)
- cortexso/llama3.1 (8b-gguf-q4-ks)
- cortexso/nomic-embed-text-v1 (main) - rarely used - 3 downloads last month
- cortexso/tinyllama (1b-gguf) (future 1b-gguf-q4-ks)
- cortexso/mistral (7b-gguf) (future 7b-gguf-q4-ks)
- cortexso/phi3 (mini-gguf) (future mini-gguf-q4-ks)
- cortexso/gemma2 (2b-gguf) (future 2b-gguf-q4-ks)
- cortexso/openhermes-2.5 (7b-gguf) (future 7b-gguf-q4-ks)
- cortexso/mixtral (7x8b-gguf) (future 7x8b-gguf-q4-ks)
- cortexso/yi-1.5 (34b-gguf) (future 34b-gguf-q4-ks)
- cortexso/aya (12.9b-gguf) (future 12.9b-gguf-q4-ks)
- cortexso/codestral (22b-gguf) (future 22b-gguf-q4-ks)
- cortexso/command-r (35b-gguf) (future 35b-gguf-q4-ks)
- cortexso/gemma (7b-gguf) (future 7b-gguf-q4-ks)
- cortexso/qwen2 (7b-gguf) (future 7b-gguf-q4-ks)
Remote models (we don't have a GGUF file for these):

- cortexso/NVIDIA-NIM
- cortexso/gpt-4o-mini
- cortexso/open-router-auto
- cortexso/groq-mixtral-8x7b-32768
- cortexso/groq-gemma-7b-it
- cortexso/groq-llama3-8b-8192
- cortexso/groq-llama3-70b-8192
- cortexso/claude-3-5-sonnet-20240620
- cortexso/gpt-3.5-turbo
- cortexso/gpt-4o
- cortexso/martian-model-router
- cortexso/cohere-command-r-plus
- cortexso/cohere-command-r
- cortexso/mistral-large-latest
- cortexso/mistral-small-latest
- cortexso/claude-3-haiku-20240307
- cortexso/claude-3-sonnet-20240229
- cortexso/claude-3-opus-20240229
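For illustration, the `metadata.yml` on each repo's `main` branch could be as minimal as a single field naming the recommended branch from the list above. This is a hypothetical sketch; the key name is an assumption, not a finalized schema:

```yaml
# Hypothetical metadata.yml for cortexso/llama3.2 (main branch).
# `default` names the branch that `cortex pull llama3.2` would resolve to.
default: 3b-gguf-q4-km
```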
QN: is it `metadata.yaml` or `metadata.yml`? We are currently using `model.yml`, so I think it should be `.yml` to be consistent.
@gabrielle-ong, I agree with `metadata.yml`. If possible, please start with cortexso/llama3.2. Thank you!
Thanks @namchuai and @nguyenhoangthuan99! Added to llama3.2 and working down the list.
- Created all the `metadata.yml` files in the list
- Categorized the list above
- Noted Alex on future changes to the default branch for some models - let me know when the CI is run to add the new branches for those models
Thanks @James and @nguyenhoangthuan99!
## Goal

- `cortex pull <model>` and `cortex run <model>`

## High-level Structure

- `cortex run <model>:<branch>` pulls a specific version
- `cortex run <model>` pulls a default version

## Decisions
### Decision 1: Default Model Download

We need to figure out which model version `cortex pull <model>` and `cortex run <model>` will pull.

**Option 1: `main` branch**

- Use the `main` branch to hold our recommended version
- e.g. merge `3b-gguf` into the `main` branch

However, I do not think this is correct long term:

- `main` is non-descriptive as a branch name
- `main` could hold `3b-gguf`, but the user is unaware of that
- Keeping `3b-gguf` and the `main` branch in sync requires more work to manage longer-term
- Updating the `main` branch will take some time - i.e. merges, git issues, etc.

It is also incorrect to compare our approach to Ollama. Ollama uses a tag-based system similar to Docker, where `latest` is a pointer to `3b`.
It is difficult to replicate this in a "straightforward UX" in Git (i.e. tags are not very visible from the repo's main page).

**Option 2: `main` branch has `metadata.yaml`**

- Use a `metadata.yaml` approach to defining default model downloads
- `metadata.yaml` is also used to generate the CLI UX for `cortex pull` or `cortex run`

### How it works

- The `main` branch will hold a few files (see the sketch below)
- `metadata.yaml` will be very simple

In the future, `metadata.yaml` can be more complicated, and allow for fine-grained control of the CLI UX, e.g. sections for 3b, 7b, or by engine.

Furthermore, we can use `metadata.yaml` as a data structure to hold information about the different model versions.
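As a sketch of that future shape, `metadata.yaml` could list every published branch alongside the default, so the CLI can render pickers or filter by engine. All field names below (`default`, `versions`, `branch`, `size`, `engine`) are illustrative assumptions, not a decided schema, and the `1b` entry is invented for the example:

```yaml
# Hypothetical richer metadata.yaml for cortexso/llama3.2.
# `default` is the branch pulled by `cortex pull llama3.2`;
# `versions` enumerates the branches shown in the CLI UX.
default: 3b-gguf-q4-km
versions:
  - branch: 1b-gguf-q4-km
    size: 1b
    engine: llama-cpp
  - branch: 3b-gguf-q4-km
    size: 3b
    engine: llama-cpp
```

The idea is that the `main` branch itself stays free of model weights: it carries only the metadata file, while every GGUF lives in its own descriptively named branch.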