Closed masterchop closed 3 months ago
Hi there! You can indeed use two Ollama models by following the instructions here: https://github.com/lm-sys/RouteLLM/blob/main/examples/routing_to_local_models.md, just replace the strong model with an Ollama model as well.
However, note that we currently still require an OpenAI key for the `mf` and `sw_ranking` routers to generate embeddings. If you would like to avoid this, you can use the `bert` or `causal_llm` classifier instead.
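Something like the sketch below should work, based on that example; treat it as a rough starting point (the router choice and model tags are placeholders, adjust them to your setup):

```python
from routellm.controller import Controller

# Rough sketch based on routing_to_local_models.md: both models are served by
# a local Ollama instance, and the BERT router avoids the OpenAI embedding
# requirement mentioned above. The model tags here are placeholders.
client = Controller(
    routers=["bert"],
    strong_model="ollama_chat/llama3:70b",
    weak_model="ollama_chat/llama3:8b",
)
```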
Great, I'll give it a try. When I was testing, I did as you asked but I got the error requiring an OpenAI model, I guess that was for the embeddings. I'll try `bert` and `causal_llm`, thank you.
Currently, it might still ask you for a key, so you can fill in a random value if you're not using `mf` or `sw_ranking`.
I'll look into fixing this.
hi @iojw, it still asks for an OpenAI API key even when passing a random value.
Yes, I'm working on fixing this soon. You can use a dummy value for now. Thanks for your patience!
I managed to get it working, but it's trying to use Llama 3 from Hugging Face instead of the Ollama model:
```python
client = Controller(
    routers=["causal_llm"],
    strong_model="ollama_chat/llama3:70b",
    weak_model="ollama_chat/llama3:8b",
    api_key='caca'
)
```
```
OSError: You are trying to access a gated repo. Make sure to have access to it at https://huggingface.co/meta-llama/Meta-Llama-3-8B. 401 Client Error. (Request ID: Root=1-669a9900-0183ccae0baf7582456953fc;036e9d27-1c06-42d8-ab49-776ceb3c393e)
Cannot access gated repo for url https://huggingface.co/meta-llama/Meta-Llama-3-8B/resolve/main/config.json. Access to model meta-llama/Meta-Llama-3-8B is restricted. You must be authenticated to access it.
```
Hi there! Meta has gated Llama 3 behind a user agreement, so you need to accept the agreement at the link (https://huggingface.co/meta-llama/Meta-Llama-3-8B) using your HF account. Then run `huggingface-cli login` locally to log in to your HF account, and this should work.
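If you prefer staying in Python instead of using the CLI, the huggingface_hub package also exposes a login helper (the token below is a placeholder, use your own HF access token):

```python
# Alternative to `huggingface-cli login`: authenticate from Python.
from huggingface_hub import login

# Placeholder token shown here; replace it with your own HF access token,
# or call login() with no arguments for an interactive prompt.
login(token="hf_xxxxxxxxxxxxxxxx")
```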
Let me know if you have any questions!
Yes, I get that, but what about Ollama? If you look at the issue title, this was about Ollama. Ollama offers Llama 3 as well, and in your documentation, https://github.com/lm-sys/RouteLLM/blob/main/examples/routing_to_local_models.md, you mention Ollama in the solution.
The reason is that our causal LLM router uses Llama 3 under the hood, and it uses the version of Llama 3 from HF. You could try using the BERT router (`bert`) instead, and it will no longer require this.
@iojw Hi, I don't want to use any paid models, only open-source models. In that case, is the BERT router the only option, because the others ask for an OpenAI key?
Did it work for you with BERT? Could you please share sample code? It didn't work for me.
yeah sure.
```python
client = Controller(
    routers=["bert"],
    strong_model="ollama_chat/llama3.1",
    weak_model="ollama_chat/gemma:2b",
    config={"bert": {"checkpoint_path": "routellm/bert_gpt4_augmented"}},
    api_base=None,
    api_key=None,
    progress_bar=False,
)
```
This worked for me. Did `causal_llm` work for you? And do you know how to decide the threshold value?
Thanks, I'll give it a try.
Closing this for now, let me know if you have other questions!
Why does every single framework force everyone to use OpenAI? Please allow using two Ollama models, for example llama3:8b as the weak model and llama3:70b as the strong model. We also need support for more models, what if I want an SQL model in there for SQL queries or something else?