lm-sys / RouteLLM

A framework for serving and evaluating LLM routers - save LLM costs without compromising quality!
Apache License 2.0

Can we use 2 ollama models? #24

Closed masterchop closed 2 weeks ago

masterchop commented 1 month ago

Why does every single framework force everyone to use OpenAI? Please allow using two Ollama models, for example llama3:8b as the weak model and llama3:70b as the strong model. We also need support for more models; what if I want an SQL model in there for SQL queries, or something else?

iojw commented 1 month ago

Hi there! You can indeed use two Ollama models by following the instructions at https://github.com/lm-sys/RouteLLM/blob/main/examples/routing_to_local_models.md. Just replace the strong model with an Ollama model as well.

However, note that currently, we still require an OpenAI key for the mf and sw_ranking routers to generate embeddings. If you would like to avoid this, you can use the bert or causal_llm classifier instead.
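For reference, a minimal sketch of what that setup might look like, assuming LiteLLM-style ollama_chat/ model names as in the linked example and the model-name convention from the README, where the router and cost threshold are encoded as router-<router>-<threshold> (the 0.11593 threshold below is the README's example value, not a recommendation):

import os

from routellm.controller import Controller

# Still required for the mf (and sw_ranking) routers, which call the
# OpenAI embeddings API; see the note above.
os.environ["OPENAI_API_KEY"] = "sk-..."  # replace with a real key when using mf

client = Controller(
  routers=["mf"],
  strong_model="ollama_chat/llama3:70b",
  weak_model="ollama_chat/llama3:8b",
)

# The model string selects the router and the cost threshold.
response = client.chat.completions.create(
  model="router-mf-0.11593",
  messages=[{"role": "user", "content": "Hello!"}],
)
print(response.choices[0].message.content)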

masterchop commented 1 month ago

Great, I'll give it a try. When I was testing, I did as you asked but I got the error requiring an OpenAI key; I guess it was for the embeddings. I'll try bert and causal_llm, thank you.

iojw commented 1 month ago

Currently, it might still ask you for a key, so you can fill in a random value if you're not using mf or sw_ranking.

I'll look into fixing this.
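A minimal sketch of that workaround, assuming the key is only actually used by the mf and sw_ranking routers (the placeholder value below is arbitrary):

import os

# Arbitrary placeholder; per the comments above, only the mf and
# sw_ranking routers actually use the OpenAI key (for embeddings).
os.environ["OPENAI_API_KEY"] = "sk-dummy"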

vishwas-palc commented 1 month ago

Hi @iojw, it still asks for an OpenAI API key even with a random value.

iojw commented 1 month ago

Yes, I'm working on fixing this soon. You can use a dummy value for now. Thanks for your patience!

masterchop commented 1 month ago

I managed to get it working, but it's trying to use Llama 3 from Hugging Face instead of the Ollama model:

from routellm.controller import Controller

client = Controller(
  routers=["causal_llm"],
  strong_model="ollama_chat/llama3:70b",
  weak_model="ollama_chat/llama3:8b",
  api_key='caca'
)

OSError: You are trying to access a gated repo. Make sure to have access to it at https://huggingface.co/meta-llama/Meta-Llama-3-8B. 401 Client Error. (Request ID: Root=1-669a9900-0183ccae0baf7582456953fc;036e9d27-1c06-42d8-ab49-776ceb3c393e)

Cannot access gated repo for url https://huggingface.co/meta-llama/Meta-Llama-3-8B/resolve/main/config.json. Access to model meta-llama/Meta-Llama-3-8B is restricted. You must be authenticated to access it.

iojw commented 1 month ago

Hi there! Meta has gated Llama 3 behind a user agreement, so you need to accept the agreement at the link (https://huggingface.co/meta-llama/Meta-Llama-3-8B) using your HF account. Then run huggingface-cli login locally to authenticate with that account, and this should work.
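If you prefer to do this from Python rather than the CLI, a small sketch using huggingface_hub (the token below is a placeholder for your own HF access token):

# Sketch: programmatic alternative to running huggingface-cli login.
# Only works after accepting the Meta-Llama-3-8B agreement on the HF website.
from huggingface_hub import login

login(token="hf_xxx")  # placeholder; use your own Hugging Face access token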

Let me know if you have any questions!

masterchop commented 1 month ago

Yes, I get that, but what about Ollama? If you look at the issue title, this was about Ollama, and Ollama offers llama3 too. Your documentation at https://github.com/lm-sys/RouteLLM/blob/main/examples/routing_to_local_models.md also mentions Ollama in the solution.

iojw commented 1 month ago

The reason is that our causal LLM router uses Llama 3 under the hood, and it loads the Llama 3 weights from HF. You could try the BERT router (bert) instead; it does not require this.

Harinisri29 commented 3 weeks ago

@iojw Hi, I don't want to use any paid models. I want to use only open-source models. In that case, is the BERT router the only option? Because the other routers ask for an OpenAI key.

masterchop commented 3 weeks ago

> @iojw Hi, I don't want to use any paid models. I want to use only open-source models. In that case, is the BERT router the only option? Because the other routers ask for an OpenAI key.

Did it work for you with BERT? Could you please share sample code? It didn't work for me.

Harinisri29 commented 3 weeks ago

> @iojw Hi, I don't want to use any paid models. I want to use only open-source models. In that case, is the BERT router the only option? Because the other routers ask for an OpenAI key.
>
> Did it work for you with BERT? Could you please share sample code? It didn't work for me.

Yeah, sure.

client = Controller(
  routers=["bert"],
  strong_model="ollama_chat/llama3.1",
  weak_model="ollama_chat/gemma:2b",
  config={"bert": {"checkpoint_path": "routellm/bert_gpt4_augmented"}},
  api_base=None,
  api_key=None,
  progress_bar=False,
)

This worked for me. Did causal_llm work for you? And do you know how to decide the threshold value?

masterchop commented 3 weeks ago

Thanks, I'll give it a try.

iojw commented 2 weeks ago

Closing this for now, let me know if you have other questions!