Closed · dan-homebrew closed this 4 weeks ago
I agree with Option 2. The current approach can't hold a default model for other engines. It also caused duplication trouble between the `main` branch and a specific branch (`3b`, for example).

I think we can go with Option 2 before releasing because it's more future-proof.

I'll update the 35 cortexso repos with `metadata.yml` on the `main` branch.

@nguyenhoangthuan99 Can I get help on the recommended branches for these 35 models? (listed from https://huggingface.co/cortexso)
- cortexso/llama3.2 (3b-gguf-q4-km)
- cortexso/mistral-nemo (12b-gguf-q4-ks)
- cortexso/llama3 (8b-gguf-q4-ks)
- cortexso/llama3.1 (8b-gguf-q4-ks)
- cortexso/nomic-embed-text-v1 (main) - rarely used - 3 downloads last month
- cortexso/tinyllama (1b-gguf) (future 1b-gguf-q4-ks)
- cortexso/mistral (7b-gguf) (future 7b-gguf-q4-ks)
- cortexso/phi3 (mini-gguf) (future mini-gguf-q4-ks)
- cortexso/gemma2 (2b-gguf) (future 2b-gguf-q4-ks)
- cortexso/openhermes-2.5 (7b-gguf) (future 7b-gguf-q4-ks)
- cortexso/mixtral (7x8b-gguf) (future 7x8b-gguf-q4-ks)
- cortexso/yi-1.5 (34b-gguf) (future 34b-gguf-q4-ks)
- cortexso/aya (12.9b-gguf) (future 12.9b-gguf-q4-ks)
- cortexso/codestral (22b-gguf) (future 22b-gguf-q4-ks)
- cortexso/command-r (35b-gguf) (future 35b-gguf-q4-ks)
- cortexso/gemma (7b-gguf) (future 7b-gguf-q4-ks)
- cortexso/qwen2 (7b-gguf) (future 7b-gguf-q4-ks)
Remote models (we don't have a GGUF file for these):

- cortexso/NVIDIA-NIM
- cortexso/gpt-4o-mini
- cortexso/open-router-auto
- cortexso/groq-mixtral-8x7b-32768
- cortexso/groq-gemma-7b-it
- cortexso/groq-llama3-8b-8192
- cortexso/groq-llama3-70b-8192
- cortexso/claude-3-5-sonnet-20240620
- cortexso/gpt-3.5-turbo
- cortexso/gpt-4o
- cortexso/martian-model-router
- cortexso/cohere-command-r-plus
- cortexso/cohere-command-r
- cortexso/mistral-large-latest
- cortexso/mistral-small-latest
- cortexso/claude-3-haiku-20240307
- cortexso/claude-3-sonnet-20240229
- cortexso/claude-3-opus-20240229
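For illustration, the `metadata.yml` on each repo's `main` branch could be as minimal as a single field naming the recommended branch from the list above. This is a hypothetical sketch; the key name is an assumption, not a finalized schema:

```yaml
# Hypothetical metadata.yml for cortexso/llama3.2 (main branch).
# `default` names the branch that `cortex pull llama3.2` would resolve to.
default: 3b-gguf-q4-km
```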
QN: is it `metadata.yaml` or `metadata.yml`? We are currently using `model.yml`, so I think it should be `.yml` to be consistent.
@gabrielle-ong, I agree with `metadata.yml`. If possible, please start with cortexso/llama3.2. Thank you!
Thanks @namchuai and @nguyenhoangthuan99! Added to llama3.2 and working down the list.
- Created all the `metadata.yml` files in the list
- Categorized the list above
- Noted Alex on future changes to the default branch for some models - let me know when the CI is run to add the new branches for those models
Thanks @James and @nguyenhoangthuan99!
## Goal

- `cortex pull <model>` and `cortex run <model>`

## High-level Structure

- `cortex run <model>:<branch>` pulls a specific version
- `cortex run <model>` pulls a default version

## Decisions
### Decision 1: Default Model Download

We need to figure out which model version `cortex pull <model>` and `cortex run <model>` will pull.

**Option 1: `main` branch**

- Use the `main` branch to hold our recommended version
- e.g. merge `3b-gguf` into the `main` branch

However, I do not think this is correct long term:

- `main` is non-descriptive as a branch name
- `main` could hold `3b-gguf`, but the user is unaware of that
- Keeping `3b-gguf` and the `main` branch in sync requires more work to manage longer-term
- Updating the `main` branch will take some time - i.e. merges, git issues, etc.

It is also incorrect to compare our approach to Ollama. Ollama uses a tag-based system similar to Docker, where `latest` is a pointer to `3b`.
It is difficult to replicate this in a "straightforward UX" in Git (i.e. tags are not very visible from the repo's main page).

**Option 2: `main` branch has `metadata.yaml`**

- Use a `metadata.yaml` approach to defining default model downloads
- `metadata.yaml` is also used to generate the CLI UX for `cortex pull` or `cortex run`

### How it works

- The `main` branch will hold a few files (see the sketch below)
- `metadata.yaml` will be very simple

In the future, `metadata.yaml` can be more complicated, and allow for fine-grained control of the CLI UX, e.g. sections for 3b, 7b, or by engine.

Furthermore, we can use `metadata.yaml` as a data structure to hold information about the different model versions.
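As a sketch of that future shape, `metadata.yaml` could list every published branch alongside the default, so the CLI can render pickers or filter by engine. All field names below (`default`, `versions`, `branch`, `size`, `engine`) are illustrative assumptions, not a decided schema, and the `1b` entry is invented for the example:

```yaml
# Hypothetical richer metadata.yaml for cortexso/llama3.2.
# `default` is the branch pulled by `cortex pull llama3.2`;
# `versions` enumerates the branches shown in the CLI UX.
default: 3b-gguf-q4-km
versions:
  - branch: 1b-gguf-q4-km
    size: 1b
    engine: llama-cpp
  - branch: 3b-gguf-q4-km
    size: 3b
    engine: llama-cpp
```

The idea is that the `main` branch itself stays free of model weights: it carries only the metadata file, while every GGUF lives in its own descriptively named branch.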