lm-sys / RouteLLM

A framework for serving and evaluating LLM routers - save LLM costs without compromising quality!
Apache License 2.0

Feature: Add an endpoint that only returns which model is the best #8

Closed DanielChico closed 4 months ago

DanielChico commented 4 months ago

First of all, I want to congratulate you on this project. I think it is excellent. I would like to integrate this functionality with my existing workflow. I have a gateway that integrates multiple providers, and I need to know which model would be the best to call, instead of making a request through this API.

Therefore, I believe it would be beneficial to have an endpoint that simply indicates which model is the best. I can develop this feature myself; I just want to know whether you also think it would be a good addition, so I can open a PR to contribute.

iojw commented 4 months ago

Hi there! For this, is there a need to have an endpoint, or would it be better if exposed as a Python API you can use internally? Could you also share more about what you're using RouteLLM for? Would love to better understand the use case so we can figure out how to make this library better :)

DanielChico commented 4 months ago

Hello! You are right, it would be better to expose it as a Python API. I suggested the endpoint because I thought it would be the least disruptive option, considering how the project is currently used with the config file, etc. Can you suggest a way to do it as a Python API so there is no need to deploy the server?

I will explain my use case. I have an API Gateway that takes requests in the OpenAI format and redirects them to the providers, even those providers that don't have an OpenAI-like API. Instead of exposing the models directly, we prefer to expose some shadowed names that represent models with different qualities. Let's name them 1, 2, and 3, with 1 being the best model and 3 the worst. This is done so we can continuously study which models best fit each quality characteristic, and we can change them without the user noticing.

The service also has usage limits and shows each user's request and token usage. What I want to do is add a fourth model named "auto", for example, that decides based on the prompt whether it is better to use model 1 or 2. I thought I could do this using your API, but I ran into some problems. First, the models are hardcoded in the API, so I cannot call the models I want. Second, the API key is loaded when the server starts, so I cannot pass an API key with each request to tie the request to the user who made it.

After some thought, I realized that the only feature I need for my use case is knowing which model is best for a given prompt. I don't need the full API, so I came here to suggest adding this feature.
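
To make this concrete, here is a minimal sketch of the gateway side, assuming a hypothetical route_prompt helper that stands in for whatever routing call RouteLLM ends up exposing (the shadowed names and the helper are illustrative only, not part of RouteLLM):

# Gateway dispatch sketch (illustrative; route_prompt is a stand-in for
# a future RouteLLM routing call and uses a dummy heuristic here).
SHADOWED_MODELS = {
    "1": "provider-a/best-model",
    "2": "provider-b/mid-model",
    "3": "provider-c/cheap-model",
}

def route_prompt(prompt: str) -> str:
    # Placeholder: would ask the router whether the prompt needs the
    # strong or the weak model. Returns "strong" or "weak".
    return "strong" if len(prompt) > 200 else "weak"

def resolve_model(requested: str, prompt: str) -> str:
    # "auto" picks between shadowed models 1 and 2 based on the prompt;
    # everything else maps straight through.
    if requested == "auto":
        tier = "1" if route_prompt(prompt) == "strong" else "2"
        return SHADOWED_MODELS[tier]
    return SHADOWED_MODELS[requested]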

iojw commented 4 months ago

Here's a Python interface we're considering:

from routellm import RouterManager

mf_router = RouterManager(router="mf", config="config.example.yaml", strong_model="gpt-4o", weak_model="llama3")
model_name = mf_router.route(prompt="Hello!", threshold=0.116)
print(f"Routing to {model_name}") # Either "gpt-4o" or "llama3"

What do you think? Also, would this work better than a HTTP endpoint for your use case?
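
For context, the threshold is what controls the cost/quality trade-off: as I understand it, prompts the router scores above the threshold go to the strong model, so lowering it sends more traffic to the strong model and raising it sends more to the weak one. A quick, illustrative way to check how a threshold splits your own traffic with the interface sketched above (the prompts and the 0.116 value are just examples):

sample_prompts = [
    "What is 2 + 2?",
    "Write a detailed design doc for a distributed cache.",
    "Translate 'good morning' into French.",
]

strong_calls = sum(
    mf_router.route(prompt=p, threshold=0.116) == "gpt-4o"
    for p in sample_prompts
)
print(f"{strong_calls}/{len(sample_prompts)} prompts routed to the strong model")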

DanielChico commented 4 months ago

I think this would be perfect for our use case. If you'd like, I would be happy to assist with the implementation. Let me know whether you would like my help or prefer to do it yourself.

iojw commented 4 months ago

Thank you for the feedback!

There's a WIP PR open at https://github.com/lm-sys/RouteLLM/pull/13 that should satisfy your use case. I've made some changes to match our existing OpenAI-compatible interface. Will look to merge it soon!
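
For anyone finding this later, here is a rough sketch of what a route-only call against the OpenAI-compatible interface could look like (the import path, class, and method names here are assumptions on my part; the merged docs are the source of truth):

# Sketch only: check the RouteLLM docs for the actual class and method names.
from routellm.controller import Controller

controller = Controller(
    routers=["mf"],
    strong_model="gpt-4o",
    weak_model="llama3",
)

# Ask which model the router would pick for a prompt, without sending
# a chat completion request.
model_name = controller.route(prompt="Hello!", router="mf", threshold=0.116)
print(f"Routing to {model_name}")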

iojw commented 4 months ago

This is now merged! Please check out the new docs for more details.

Let me know if you have any questions or suggestions!