Open · rohanthacker opened 1 month ago
Thanks for the issue! I think supporting the Azure AI Model Inference API would be great! I think it makes sense to have it as a separate model client that also implements the ChatCompletionClient protocol.
We'd love it if you're interested in helping build this!
@jackgerrits I'll be happy to implement these changes. Can this task be assigned to me? I have already started work on it in my fork of this repository, and I'll raise a draft pull request in a day or so for us to discuss.
@rohanthacker - this is supported in dotnet now with https://github.com/microsoft/autogen/pull/3790
Following this issue, since this will enable us to use Phi on some internal use cases.
@rohanthacker any update on this one?
@edirgarcia It depends on which API you are using. If you are using the Core API, you don't need to wait for this feature: you can use the azure.ai.inference client directly in your agent implementation. If you are using the AgentChat API, you may need to wait for the wrapper, but you can also implement your own agent.
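For example, a minimal sketch of calling azure.ai.inference directly (endpoint and key are placeholders; in a Core agent this call would live inside your message handler):

```python
# Sketch: calling azure.ai.inference directly, e.g. from inside a
# Core API agent's message handler. Endpoint and key are placeholders.
from azure.ai.inference import ChatCompletionsClient
from azure.ai.inference.models import SystemMessage, UserMessage
from azure.core.credentials import AzureKeyCredential

client = ChatCompletionsClient(
    endpoint="https://<your-deployment>.<region>.models.ai.azure.com",
    credential=AzureKeyCredential("<your-api-key>"),
)

response = client.complete(
    messages=[
        SystemMessage(content="You are a helpful assistant."),
        UserMessage(content="Summarize the Azure AI Model Inference API."),
    ],
)
print(response.choices[0].message.content)
```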
Hi @ekzhu,
I’ve completed the initial implementation of ChatCompletionClient using azure.ai.inference.
This is a draft, as I wanted to discuss the significant code duplication between this client and the existing OpenAIChatCompletionClient. If the team agrees with this approach, I’ll code/copy the rest of the implementation as needed.
Currently, the two clients are nearly identical, with the only differences being the type variations required by each library. Given that the azure.ai.inference library is OpenAI compatible, at the moment I don't see the need for a separate concrete class. What are your thoughts?
Additionally, I was able to get OpenAIChatCompletionClient to work with models deployed on Azure AI Studio by setting the base_url and api_key (see the sketch below). However, I encountered a few minor compatibility issues with specific models:
Both models work fine when connected directly to the OpenAI API.
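Roughly, the configuration looks like this (a sketch with placeholder endpoint, key, and model name; the exact import path and the model_info fields have shifted across the 0.4 preview releases):

```python
# Sketch: pointing OpenAIChatCompletionClient at an Azure AI Studio
# deployment via base_url/api_key. Placeholders throughout; model_info
# (model_capabilities in some preview releases) is required so the
# client knows what a non-OpenAI model supports.
import asyncio

from autogen_core.models import UserMessage
from autogen_ext.models.openai import OpenAIChatCompletionClient

client = OpenAIChatCompletionClient(
    model="phi-3.5-mini-instruct",          # your deployment's model name
    base_url="https://<your-endpoint>/v1",  # placeholder serving endpoint
    api_key="<your-api-key>",
    model_info={
        "vision": False,
        "function_calling": False,
        "json_output": False,
        "family": "unknown",
    },
)

async def main() -> None:
    result = await client.create(
        [UserMessage(content="Hello!", source="user")]
    )
    print(result.content)

asyncio.run(main())
```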
Happy to keep working on this—just looking for team input on code duplication and next steps.
@edirgarcia Phi-3.5 is working with the OpenAIChatCompletionClient. Please refer to this example I came across today by azureml-examples.
Thank you, I will test this out next week.
Thanks @rohanthacker!
For the Core API, the user can choose any client they want to use. So this is not a blocker.
I think there is still benefit in wrapping the azure.ai.inference client behind our ChatCompletionClient protocol. For AgentChat users it is useful, as the built-in agents accept a ChatCompletionClient. We can resolve the code duplication in the future.
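To illustrate the wrapping idea, here is a rough, duck-typed sketch of just the create path. The class name is hypothetical, it assumes plain-text message content, and a real client must implement the full ChatCompletionClient protocol (create_stream, usage tracking, token counting, capabilities):

```python
# Sketch of the wrapping idea: adapt azure.ai.inference to the shape of
# AutoGen's ChatCompletionClient. Only the create path is shown.
from azure.ai.inference.aio import ChatCompletionsClient
from azure.ai.inference.models import AssistantMessage, SystemMessage, UserMessage
from azure.core.credentials import AzureKeyCredential

from autogen_core.models import CreateResult, LLMMessage, RequestUsage


class AzureAIInferenceChatCompletionClient:  # hypothetical name
    def __init__(self, endpoint: str, api_key: str) -> None:
        self._client = ChatCompletionsClient(
            endpoint=endpoint, credential=AzureKeyCredential(api_key)
        )

    async def create(self, messages: list[LLMMessage]) -> CreateResult:
        # Convert AutoGen message types to azure.ai.inference message
        # types; assumes str content (no images or function calls).
        converted = []
        for m in messages:
            kind = type(m).__name__
            if kind == "SystemMessage":
                converted.append(SystemMessage(content=m.content))
            elif kind == "AssistantMessage":
                converted.append(AssistantMessage(content=m.content))
            else:
                converted.append(UserMessage(content=m.content))
        response = await self._client.complete(messages=converted)
        choice = response.choices[0]
        return CreateResult(
            finish_reason="stop",
            content=choice.message.content or "",
            usage=RequestUsage(
                prompt_tokens=response.usage.prompt_tokens,
                completion_tokens=response.usage.completion_tokens,
            ),
            cached=False,
        )
```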
What feature would you like to be added?
I would like the ability to use a model that is deployed on Azure AI Studio and uses the Azure AI Model Inference API.
If needed, I would like to assist in the creation of this feature. However, I have a few questions and need some guidance on the best way to implement it.
Questions:
Can this be done with the existing AzureOpenAIChatCompletionClient? I have already tried this, however the API produces an invalid URL and responds with a 404 error, as the endpoint created by Azure AI Studio and the one expected by the client are not the same.
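For context, the endpoint shapes differ, which is likely why the client builds an invalid URL (illustrative shapes only; the exact form varies by deployment type and region):

```python
# Endpoint shape AzureOpenAIChatCompletionClient expects (Azure OpenAI):
#   https://<resource>.openai.azure.com/openai/deployments/<deployment>/...
#
# Endpoint shape Azure AI Studio serverless deployments expose:
#   https://<deployment>.<region>.models.ai.azure.com/...
#
# Requests built for the first shape 404 against the second.
```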
Looking forward to discussing this more.
Why is this needed?
Azure AI Studio provides a large catalog of models along with various deployment options that make it easy for developers to access a wide variety of models. Given the nature of this project, the ability to integrate this diverse set of models out of the box will drive broader adoption and let developers bring their own models without coding a new client for each.