kaust-generative-ai / local-deployment-llama-cpp

Project to help you get started working with LLMs locally with LLaMA C++.

LLaMA C++ integration with HF #9

Open davidrpugh opened 1 week ago

davidrpugh commented 1 week ago

There is a nice example in the HF docs showing how you can use both llama-cli and llama-server to work with GGUF files directly from HF.

https://huggingface.co/docs/hub/en/gguf-llamacpp

These are good examples to include in our tutorials.
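For instance, a minimal sketch in the spirit of the HF docs example (assuming a llama.cpp build with curl support enabled; the repo and file names are taken from the example further down this thread):

# one-off generation with llama-cli, pulling the GGUF file directly from the Hub
llama-cli \
    --hf-repo ggml-org/gemma-1.1-7b-it-Q4_K_M-GGUF \
    --hf-file gemma-1.1-7b-it.Q4_K_M.gguf \
    --prompt "Once upon a time" \
    -n 128

# serve the same model over HTTP with llama-server
llama-server \
    --hf-repo ggml-org/gemma-1.1-7b-it-Q4_K_M-GGUF \
    --hf-file gemma-1.1-7b-it.Q4_K_M.gguf \
    --port 8080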

davidrpugh commented 1 day ago

Related examples can be added to the llama-cli tutorial notebooks. The quickstart notebook should show how to download a model from HF using the following option.

-   `-mu MODEL_URL, --model-url MODEL_URL`: specify a remote HTTP URL from which to download the model file.

Here is an example of usage.

MODEL_URL=https://huggingface.co/ggml-org/gemma-1.1-7b-it-Q4_K_M-GGUF/resolve/main/gemma-1.1-7b-it.Q4_K_M.gguf
llama-cli --model-url "$MODEL_URL" --prompt "Once upon a time"

This requires a build with curl support enabled. See issue #11.
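The same download-by-URL pattern should also work with llama-server, which shares the common argument parser with llama-cli. A sketch (again assuming a curl-enabled build; the endpoint shown is llama-server's OpenAI-compatible chat API):

MODEL_URL=https://huggingface.co/ggml-org/gemma-1.1-7b-it-Q4_K_M-GGUF/resolve/main/gemma-1.1-7b-it.Q4_K_M.gguf
llama-server --model-url "$MODEL_URL" --port 8080 &

# once the server has started, send a request to the OpenAI-compatible endpoint
curl http://localhost:8080/v1/chat/completions \
    -H "Content-Type: application/json" \
    -d '{"messages": [{"role": "user", "content": "Once upon a time"}]}'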

davidrpugh commented 1 day ago

Other relevant HF-related options for `llama-cli`:

-hfr,  --hf-repo REPO                   Hugging Face model repository (default: unused)
                                        (env: LLAMA_ARG_HF_REPO)
-hff,  --hf-file FILE                   Hugging Face model file (default: unused)
                                        (env: LLAMA_ARG_HF_FILE)
-hft,  --hf-token TOKEN                 Hugging Face access token (default: value from HF_TOKEN environment
                                        variable)
                                        (env: HF_TOKEN)
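For instance, a sketch using the short-form flags (the repo and file names are reused from the earlier example; an HF token is only needed for gated or private repositories, and the value shown is a placeholder):

# export a (placeholder) access token, or pass it explicitly with -hft/--hf-token
export HF_TOKEN="hf_xxx"

llama-cli \
    -hfr ggml-org/gemma-1.1-7b-it-Q4_K_M-GGUF \
    -hff gemma-1.1-7b-it.Q4_K_M.gguf \
    --prompt "Once upon a time" \
    -n 128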