Hi there! For this, is there a need to have an endpoint, or would it be better exposed as a Python API that you can use internally? Could you also share more about what you're using RouteLLM for? Would love to better understand the use case so we can figure out how to make this library better :)
Hello! You're right, it would be better to expose it as a Python API. I suggested an endpoint because I thought it would be the least disruptive option, given how the library is currently used (the config file, etc.). Could you suggest a way to do it as a Python API, so there's no need to deploy the server?
I will explain my use case. I have an API gateway that accepts requests in the OpenAI format and forwards them to the providers, including providers that don't offer an OpenAI-like API. Instead of exposing the models directly, we prefer to expose aliased ("shadow") names that represent models of different quality tiers. Let's call them 1, 2, and 3, with 1 being the best model and 3 the worst. This lets us continuously study which models best fit each quality tier and swap them out without users noticing.
The service also enforces usage limits and reports each user's requests and token usage. What I want to do is add a fourth model, named "auto" for example, that decides based on the prompt whether it is better to use model 1 or model 2. I thought I could do this using your API, but we ran into two problems. First, the models are hardcoded in the API, so I cannot call the models I want. Second, the API key is loaded when the server starts, so I cannot pass an API key with each request to tie the request to the user who made it.
After some thought, I realized that the only feature I need for my use case is knowing which model is best for a given prompt. I don't need the full API, so I came here to suggest adding that feature.
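To make this concrete, the dispatch I have in mind looks roughly like this (a sketch with made-up names, nothing here is RouteLLM code):

```python
# Illustrative gateway dispatch (hypothetical names, not RouteLLM code).
# "1", "2", "3" are the public aliases; "auto" defers to a routing decision.

ALIASES = {
    "1": "provider/best-model",   # best quality tier
    "2": "provider/mid-model",
    "3": "provider/small-model",  # cheapest tier
}

def choose_tier(prompt: str) -> str:
    """Stub for the "which model is best for this prompt?" decision
    that this issue asks RouteLLM to expose."""
    raise NotImplementedError

def resolve_model(alias: str, prompt: str) -> str:
    """Map a public alias to a concrete provider model."""
    if alias == "auto":
        return ALIASES[choose_tier(prompt)]  # route between tiers 1 and 2
    return ALIASES[alias]
```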
Here's a Python interface we're considering:
```python
from routellm import RouterManager

mf_router = RouterManager(
    router="mf",
    config="config.example.yaml",
    strong_model="gpt-4o",
    weak_model="llama3",
)

model_name = mf_router.route(prompt="Hello!", threshold=0.116)
print(f"Routing to {model_name}")  # Either "gpt-4o" or "llama3"
```
What do you think? Also, would this work better than an HTTP endpoint for your use case?
I think this would be perfect for our use case. I would be happy to assist with the implementation if you'd like; just let me know whether you want my help or prefer to do it yourselves.
Thank you for the feedback!
There's a WIP PR open at https://github.com/lm-sys/RouteLLM/pull/13 that should satisfy your use case. I've made some changes to match our existing OpenAI-compatible interface. Will look to merge it soon!
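For reference, an OpenAI-compatible interface would look something like this. This is only a sketch: the `Controller` class name and the `"router-<name>-<threshold>"` model string are assumptions for illustration, not necessarily what the PR implements.

```python
# Hypothetical sketch of an OpenAI-compatible routing call.
# The Controller class and the "router-mf-0.116" model string are
# illustrative assumptions; check the PR for the actual API.
from routellm.controller import Controller

client = Controller(
    routers=["mf"],
    strong_model="gpt-4o",
    weak_model="llama3",
)

response = client.chat.completions.create(
    # Encodes the router ("mf") and threshold (0.116) in the model field,
    # mirroring how an OpenAI client selects a model.
    model="router-mf-0.116",
    messages=[{"role": "user", "content": "Hello!"}],
)
print(response.choices[0].message.content)
```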
First of all, I want to congratulate you on this project; I think it is excellent. I would like to integrate this functionality into my existing workflow. I have a gateway that integrates multiple providers, and I need to know which model would be the best one to call, rather than making the request through this API.
Therefore, I believe it would be beneficial to have an endpoint that simply indicates which model is best. I can develop this feature myself; I just want to know whether you also think it would be a good feature, so I can open a PR to contribute.
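To illustrate the shape I had in mind, here is a minimal sketch of such an endpoint (hypothetical names throughout; `pick_best_model` is a stub standing in for RouteLLM's routing logic):

```python
# Hypothetical "which model is best?" endpoint (not part of RouteLLM).
# It returns only the routing decision; the caller's gateway makes the
# actual provider call with the user's own credentials.
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()

class RouteRequest(BaseModel):
    prompt: str
    threshold: float = 0.5  # placeholder default

class RouteResponse(BaseModel):
    model: str  # e.g. the strong or the weak model's name

def pick_best_model(prompt: str, threshold: float) -> str:
    """Stub standing in for the router decision RouteLLM would provide."""
    raise NotImplementedError

@app.post("/route", response_model=RouteResponse)
def route(req: RouteRequest) -> RouteResponse:
    return RouteResponse(model=pick_best_model(req.prompt, req.threshold))
```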