lm-sys / RouteLLM

A framework for serving and evaluating LLM routers - save LLM costs without compromising quality!
Apache License 2.0

Can I use Azure OpenAI? #29

Open royrajjyoti12 opened 1 month ago

royrajjyoti12 commented 1 month ago

I am trying to use Azure OpenAI, but I get this error:

raise OpenAIError( openai.OpenAIError: The api_key client option must be set either by passing api_key to the client or by setting the OPENAI_API_KEY environment variable

Also, can we use multiple models instead of only two (one strong and one weak model)?

iojw commented 1 month ago

Can you share the full code that you're using?

We currently still depend on OpenAI for embeddings if you're using mf or sw_ranking, so that may be why. As a workaround, you can use bert instead and set a dummy value for the OpenAI key; a fix is incoming.
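
A minimal sketch of that workaround (untested here): the bert router doesn't hit the OpenAI embeddings endpoint, so a placeholder key is enough. It assumes the provider keys (Azure/Groq) are set as in the example below, and the threshold in the router model name is illustrative and should be calibrated for your workload.

import os
from routellm.controller import Controller

# Placeholder key only needs to exist so the OpenAI client can be constructed;
# the bert router does not call the OpenAI embeddings endpoint.
os.environ["OPENAI_API_KEY"] = "dummy"

client = Controller(
    routers=["bert"],  # avoid "mf" / "sw_ranking", which need OpenAI embeddings
    strong_model="azure/gpt-4o",
    weak_model="groq/llama3-8b-8192",
)

response = client.chat.completions.create(
    model="router-bert-0.5",  # illustrative threshold, calibrate for your use case
    messages=[{"role": "user", "content": "Hello!"}],
)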

We don't currently support >2 models! More research is required here.

xXBlackMasterXx commented 1 month ago

That totally explains why it's asking for an OpenAI API key even though I've set the correct API keys for both the strong and weak models.

Is there a way to use an embedding model from Azure OpenAI instead? I have the same issue trying to route between an Azure OpenAI model (gpt-4o) and a Groq model (llama3-8b-8192).

I can share the code I'm using for this:

import os

# Weak model secrets
os.environ["GROQ_API_KEY"] = "<groq-api-key>"

# Strong model secrets
os.environ["AZURE_API_KEY"] = "<azure-openai-api-key>"
os.environ["AZURE_API_BASE"] = "<azure-openai-endpoint>"
os.environ["AZURE_API_VERSION"] = "<azure-openai-api-version>"

# Import the controller
from routellm.controller import Controller

# Create the controller
# I've used the prefix azure/ according to the LiteLLM docs
# https://litellm.vercel.app/docs/providers/azure
client = Controller(
    routers = ["mf"],
    strong_model = "azure/gpt-4o",
    weak_model = "groq/llama3-8b-8192"
)

# Make a request
response = client.chat.completions.create(
    model = "router-mf-0.11593",
    messages = [
        {"role":"user", "content":"Hello!"}
    ]
)

# AI Message
message = response.choices[0].message.content
# Model used
model_used = response.model

print(f"Model used: {model_used}")
print(f"Response: {message}")

It throws this error:

{
    "name": "AuthenticationError",
    "message": "Error code: 401 - {'error': {'message': 'Incorrect API key provided: ********************. You can find your API key at https://platform.openai.com/account/api-keys.', 'type': 'invalid_request_error', 'param': None, 'code': 'invalid_api_key'}}",
    "stack": "---------------------------------------------------------------------------
AuthenticationError                       Traceback (most recent call last)
Cell In[9], line 1
----> 1 response = client.chat.completions.create(
      2     model = \"router-mf-0.11593\",
      3     messages = [
      4         {\"role\":\"user\", \"content\":\"Hola\"}
      5     ]
      6 )
      8 message = response.choices[0].message.content
      9 used_model = response.model

File ~/.local/lib/python3.10/site-packages/routellm/controller.py:150, in Controller.completion(self, router, threshold, **kwargs)
    147     router, threshold = self._parse_model_name(kwargs[\"model\"])
    149 self._validate_router_threshold(router, threshold)
--> 150 kwargs[\"model\"] = self._get_routed_model_for_completion(
    151     kwargs[\"messages\"], router, threshold
    152 )
    153 return completion(api_base=self.api_base, api_key=self.api_key, **kwargs)

File ~/.local/lib/python3.10/site-packages/routellm/controller.py:111, in Controller._get_routed_model_for_completion(self, messages, router, threshold)
    105 def _get_routed_model_for_completion(
    106     self, messages: list, router: str, threshold: float
    107 ):
    108     # Look at the last turn for routing.
    109     # Our current routers were only trained on first turn data, so more research is required here.
    110     prompt = messages[-1][\"content\"]
--> 111     routed_model = self.routers[router].route(prompt, threshold, self.model_pair)
    113     self.model_counts[router][routed_model] += 1
    115     return routed_model

File ~/.local/lib/python3.10/site-packages/routellm/routers/routers.py:42, in Router.route(self, prompt, threshold, routed_pair)
     41 def route(self, prompt, threshold, routed_pair):
---> 42     if self.calculate_strong_win_rate(prompt) >= threshold:
     43         return routed_pair.strong
     44     else:

File ~/.local/lib/python3.10/site-packages/routellm/routers/routers.py:239, in MatrixFactorizationRouter.calculate_strong_win_rate(self, prompt)
    238 def calculate_strong_win_rate(self, prompt):
--> 239     winrate = self.model.pred_win_rate(
    240         self.strong_model_id, self.weak_model_id, prompt
    241     )
    242     return winrate

File ~/.local/lib/python3.10/site-packages/torch/utils/_contextlib.py:116, in context_decorator.<locals>.decorate_context(*args, **kwargs)
    113 @functools.wraps(func)
    114 def decorate_context(*args, **kwargs):
    115     with ctx_factory():
--> 116         return func(*args, **kwargs)

File ~/.local/lib/python3.10/site-packages/routellm/routers/matrix_factorization/model.py:124, in MFModel.pred_win_rate(self, model_a, model_b, prompt)
    122 @torch.no_grad()
    123 def pred_win_rate(self, model_a, model_b, prompt):
--> 124     logits = self.forward([model_a, model_b], prompt)
    125     winrate = torch.sigmoid(logits[0] - logits[1]).item()
    126     return winrate

File ~/.local/lib/python3.10/site-packages/routellm/routers/matrix_factorization/model.py:113, in MFModel.forward(self, model_id, prompt)
    109 model_embed = self.P(model_id)
    110 model_embed = torch.nn.functional.normalize(model_embed, p=2, dim=1)
    112 prompt_embed = (
--> 113     OPENAI_CLIENT.embeddings.create(input=[prompt], model=self.embedding_model)
    114     .data[0]
    115     .embedding
    116 )
    117 prompt_embed = torch.tensor(prompt_embed, device=self.get_device())
    118 prompt_embed = self.text_proj(prompt_embed)

File ~/.local/lib/python3.10/site-packages/openai/resources/embeddings.py:114, in Embeddings.create(self, input, model, dimensions, encoding_format, user, extra_headers, extra_query, extra_body, timeout)
    108         embedding.embedding = np.frombuffer(  # type: ignore[no-untyped-call]
    109             base64.b64decode(data), dtype=\"float32\"
    110         ).tolist()
    112     return obj
--> 114 return self._post(
    115     \"/embeddings\",
    116     body=maybe_transform(params, embedding_create_params.EmbeddingCreateParams),
    117     options=make_request_options(
    118         extra_headers=extra_headers,
    119         extra_query=extra_query,
    120         extra_body=extra_body,
    121         timeout=timeout,
    122         post_parser=parser,
    123     ),
    124     cast_to=CreateEmbeddingResponse,
    125 )

File ~/.local/lib/python3.10/site-packages/openai/_base_client.py:1259, in SyncAPIClient.post(self, path, cast_to, body, options, files, stream, stream_cls)
   1245 def post(
   1246     self,
   1247     path: str,
   (...)
   1254     stream_cls: type[_StreamT] | None = None,
   1255 ) -> ResponseT | _StreamT:
   1256     opts = FinalRequestOptions.construct(
   1257         method=\"post\", url=path, json_data=body, files=to_httpx_files(files), **options
   1258     )
-> 1259     return cast(ResponseT, self.request(cast_to, opts, stream=stream, stream_cls=stream_cls))

File ~/.local/lib/python3.10/site-packages/openai/_base_client.py:936, in SyncAPIClient.request(self, cast_to, options, remaining_retries, stream, stream_cls)
    927 def request(
    928     self,
    929     cast_to: Type[ResponseT],
   (...)
    934     stream_cls: type[_StreamT] | None = None,
    935 ) -> ResponseT | _StreamT:
--> 936     return self._request(
    937         cast_to=cast_to,
    938         options=options,
    939         stream=stream,
    940         stream_cls=stream_cls,
    941         remaining_retries=remaining_retries,
    942     )

File ~/.local/lib/python3.10/site-packages/openai/_base_client.py:1040, in SyncAPIClient._request(self, cast_to, options, remaining_retries, stream, stream_cls)
   1037         err.response.read()
   1039     log.debug(\"Re-raising status error\")
-> 1040     raise self._make_status_error_from_response(err.response) from None
   1042 return self._process_response(
   1043     cast_to=cast_to,
   1044     options=options,
   (...)
   1048     retries_taken=options.get_max_retries(self.max_retries) - retries,
   1049 )

AuthenticationError: Error code: 401 - {'error': {'message': 'Incorrect API key provided: ********************. You can find your API key at https://platform.openai.com/account/api-keys.', 'type': 'invalid_request_error', 'param': None, 'code': 'invalid_api_key'}}"
}

praveennvr commented 1 month ago

I have the same error when using an Azure OpenAI URL and key. For the weak model, I am using an internal API and key. Is there any workaround to use custom base URLs and keys for the strong and weak models?
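
For what it's worth, the traceback above shows the controller forwarding self.api_base and self.api_key into the LiteLLM completion call, so a single shared override may be possible if those are exposed as Controller arguments; per-model base URLs and keys would still have to go through the provider-specific environment variables that LiteLLM reads (as in the Azure/Groq example above). A sketch under those assumptions (the model names and URL below are placeholders):

from routellm.controller import Controller

# Sketch only: assumes Controller accepts api_base / api_key and forwards them
# to litellm.completion (see the traceback above). Note they would apply to
# both models, not per model.
client = Controller(
    routers=["mf"],
    strong_model="openai/my-strong-model",  # hypothetical model name
    weak_model="openai/my-weak-model",      # hypothetical model name
    api_base="https://my-internal-gateway.example.com/v1",  # hypothetical URL
    api_key="<internal-api-key>",
)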

xXBlackMasterXx commented 1 month ago

I was digging into the source code and found that routellm/routers/routers.py uses a default OpenAI client.

For my use case, I've only changed it to AzureOpenAI:

[screenshot: the default OpenAI client replaced with an AzureOpenAI client]

and modified the embedding model name to text-embedding-ada-002 (this depends on the name of your Azure deployment):

[screenshot: the embedding model name changed to the Azure deployment name]
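
The kind of edit I mean looks roughly like this (a sketch; the exact module and variable name may differ, but the traceback above shows the embedding call going through a global OPENAI_CLIENT):

import os
from openai import AzureOpenAI

# Replace the default OpenAI client used for prompt embeddings with an
# AzureOpenAI client; the variable name follows OPENAI_CLIENT from the
# traceback above, and the env vars match the earlier example.
OPENAI_CLIENT = AzureOpenAI(
    azure_endpoint=os.environ["AZURE_API_BASE"],
    api_key=os.environ["AZURE_API_KEY"],
    api_version=os.environ["AZURE_API_VERSION"],
)

# The embedding "model" must then be the Azure deployment name, e.g.:
# OPENAI_CLIENT.embeddings.create(input=[prompt], model="text-embedding-ada-002")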

I still need to do some testing to see whether this works (the router part, that is; the embedding model itself works fine), but I think it would be good to be able to choose an embedding model of our preference in the Controller.

iojw commented 1 month ago

Yes, this makes perfect sense. We are looking into other embedding models at the moment and will release an update soon!