microsoft / autogen

A programming framework for agentic AI 🤖
https://microsoft.github.io/autogen/

[Feature Request]: Official Phi-3 family support in AutoGen - Microsoft-all-the-things #2768

Open ChristianWeyer opened 4 months ago

ChristianWeyer commented 4 months ago

Is your feature request related to a problem? Please describe.

Now that we have the Phi-3 SLM flagship family (including vision) from Microsoft, it would make a lot of sense to officially and fully integrate it into AutoGen. This would be a strong statement in the market.

Describe the solution you'd like

Full integration of Phi-3 Mini, Small, Medium and especially Vision.

Additional context

No response

LittleLittleCloud commented 4 months ago

Can you describe the solution you'd like? Specifically, how would Phi-3 support differ from supporting other open-source LLMs like Llama?

ChristianWeyer commented 4 months ago

The support for Mini, Small, and Medium may just be like supporting other non-OpenAI models. Support for Vision should be explicit, like we currently have, e.g., for LLaVA.
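
For context, here is roughly what explicit vision support looks like in AutoGen today via the contrib LLaVAAgent; a Phi-3-Vision integration could follow a similar shape. This is only a sketch: the model name and base_url below are placeholders for a locally hosted LLaVA endpoint, not values from this thread.

```python
# Sketch only: explicit LLaVA support via AutoGen's contrib agent.
# The model name and base_url are placeholders for a local LLaVA server.
import autogen
from autogen.agentchat.contrib.llava_agent import LLaVAAgent

llava_config_list = [
    {
        "model": "llava",                     # placeholder model name
        "api_key": "None",
        "base_url": "http://localhost:8000",  # placeholder: locally hosted LLaVA endpoint
    }
]

image_agent = LLaVAAgent(
    name="image-explainer",
    max_consecutive_auto_reply=10,
    llm_config={"config_list": llava_config_list, "temperature": 0.5, "max_new_tokens": 500},
)
```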

LittleLittleCloud commented 4 months ago

LLaVA should just work with the Ollama client, and so should the Phi-3 vision model? @BeibinLi might know better here...

BeibinLi commented 4 months ago

Yes, phi models are already supported through Ollama and LM Studio.

The Phi-3-vision was just released a few days ago, but it should also be supported soon.
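
For what it's worth, a minimal sketch of how this can look today, assuming Phi-3 is served locally by Ollama and accessed through Ollama's OpenAI-compatible endpoint (the model name and port are Ollama defaults; adjust to your setup):

```python
# Sketch: Phi-3 served locally by Ollama, accessed via its OpenAI-compatible API.
# Run `ollama pull phi3` first; the port below is Ollama's default.
import autogen

config_list = [
    {
        "model": "phi3",                          # Ollama model name
        "base_url": "http://localhost:11434/v1",  # Ollama's OpenAI-compatible endpoint
        "api_key": "ollama",                      # any non-empty string works
    }
]

assistant = autogen.AssistantAgent(
    name="assistant",
    llm_config={"config_list": config_list},
)
user = autogen.UserProxyAgent(name="user", code_execution_config=False)
user.initiate_chat(assistant, message="Summarize what Phi-3 is in one sentence.")
```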

ChristianWeyer commented 4 months ago

LLaVA should just work with the Ollama client, and so should the Phi-3 vision model? @BeibinLi might know better here...

Do we have an Ollama client in AutoGen, or are you referring to the OpenAI compatibility of Ollama's API?

Ollama does not (yet) provide OpenAI-style tool calling. For this, we need to use something like LiteLLM, BTW.
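
For illustration, a rough sketch of that LiteLLM workaround, assuming a LiteLLM proxy running in front of Ollama on its default port; the model alias and key are placeholders, not an official AutoGen integration:

```python
# Rough sketch (not an official AutoGen integration).
# Shell setup, run first:
#   pip install 'litellm[proxy]' && ollama pull phi3
#   litellm --model ollama/phi3      # starts an OpenAI-compatible proxy (default port 4000)
import autogen

config_list = [
    {
        "model": "ollama/phi3",               # should match the model the proxy serves
        "base_url": "http://localhost:4000",  # LiteLLM proxy default address
        "api_key": "not-needed",              # the proxy does not require a key by default
    }
]

# Tool/function calls go through the proxy, which translates OpenAI-style tool calling for Ollama.
assistant = autogen.AssistantAgent(
    name="assistant",
    llm_config={"config_list": config_list},
)
```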

ChristianWeyer commented 4 months ago

The quality of the quants available via Ollama for LLaVA is just not really good, I have to say.

Looking forward to seeing Phi-3-Vision in a very high-quality quant doing the things we see in the online demos.

HansUXdev commented 4 months ago

I was thinking the same thing. It would be nice if AutoGen Studio had an update where it defaults to a local LLM like Phi.

BeibinLi commented 4 months ago

@ChristianWeyer @HansUXdev Can you provide more information? Do you want to run Phi-3 locally or remotely in Azure?

There are two different approaches:

  1. Running locally with HuggingFace models. See here for a tutorial (a rough sketch also follows below).
  2. Using a remote API through Ollama or LM Studio, as we pointed out earlier.

However, we are aware the Azure endpoint is slightly different (see Azure AI Studio), and AutoGen doesn't fully support it yet.

Let me know how you want to access Phi-3, and I can brainstorm with the Phi team to offer a better solution.
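
To make option 1 above concrete, here is a rough local sketch of running Phi-3 Mini with HuggingFace transformers, independent of AutoGen (it could be wrapped in a custom model client); the model ID is the public HF repo, and the generation settings are illustrative:

```python
# Sketch: running Phi-3 Mini locally with HuggingFace transformers.
# Requires a recent `transformers` release and `accelerate` for device_map="auto".
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline

model_id = "microsoft/Phi-3-mini-4k-instruct"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype="auto", device_map="auto", trust_remote_code=True
)

pipe = pipeline("text-generation", model=model, tokenizer=tokenizer)
messages = [{"role": "user", "content": "Explain what an SLM is in one sentence."}]
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
print(pipe(prompt, max_new_tokens=64, return_full_text=False)[0]["generated_text"])
```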

HansUXdev commented 4 months ago

Either locally with LM Studio running, or, if the Phi team wants to be generous, they could just host it for free like Groq.

Personally, I'm thinking from the UX/dev experience of using AutoGen Studio. For example, right now we have to configure the model, etc. But if you check out the VS Code extension "Continue", it watches your LM Studio settings, so when you switch from Phi to whatever, it just works. It would be really cool to see that here.

mpalaourg commented 4 months ago

However, we are aware the Azure endpoint is slightly different (see Azure AI Studio), and AutoGen doesn't fully support it yet.

Maybe off-topic, but relevant to the above quote. Do you natively support models from the Azure endpoint (serverless option)? Or do we have to implement something to use those models?

I am trying to use models hosted there (not Phi-3 specifically), but let's say it's not as smooth sailing as I expected.

BeibinLi commented 4 months ago

@mpalaourg Can you provide more details regarding the Azure serverless option? I did not see it.

The closest information I can find is "Phi-3 in models as a service: Phi-3 models are available with pay-as-you-go billing via inference APIs. We will provide additional pricing details at a later date." here

mpalaourg commented 4 months ago

@mpalaourg Can you provide more details regarding the Azure serverless option? I did not see it.

The closest information I can find is "Phi-3 in models as a service: Phi-3 models are available with pay-as-you-go billing via inference APIs. We will provide additional pricing details at a later date." here

Yeah my bad. I knew that I was off-topic here, but I saw you spoke about the Azure endpoint.

I am not trying to use Phi-3 at this point. To be specific, I have deployed a Llama-2, a Cohere, and a Mistral model with the serverless option. The only difference between them is the api_key and base_url, and because of this I thought it would be plug and play in the llm_config. I could open a new ticket with my findings and tag you there; I don't want to derail this conversation.

BeibinLi commented 4 months ago

@mpalaourg I am not familiar with the Azure serverless option. However, if you can interact with the endpoint using an API key and base_url in the same way as you do with OpenAI, then you can directly include them in the llm_config for AutoGen. It works similarly to LM Studio, Ollama, and the Azure ML endpoint.
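
Following that suggestion, a minimal sketch of what the llm_config could look like for such a serverless deployment, assuming the endpoint speaks the OpenAI chat format; the deployment name, URL, and key below are placeholders (some endpoints may also need a /v1 suffix):

```python
# Sketch: pointing AutoGen at an OpenAI-compatible serverless endpoint.
# All values below are placeholders; use the key and URL from your own deployment.
import autogen

config_list = [
    {
        "model": "my-serverless-deployment",                             # placeholder deployment/model name
        "base_url": "https://<your-deployment>.inference.ai.azure.com",  # placeholder endpoint URL
        "api_key": "<your-serverless-api-key>",
    }
]

assistant = autogen.AssistantAgent(
    name="assistant",
    llm_config={"config_list": config_list, "temperature": 0},
)
```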

MaxAkbar commented 3 months ago

I know it's been a while for this thread, but I am hoping that the case can be made to integrate Phi-3 models via onnxruntime-genai.

LM Studio uses a UI, and I'm not sure that's a good fit for a server-based option. I get that it has a CLI and an API endpoint. Ollama is a better option than LM Studio, but I'm sure this depends on the requirements.

That said, both have dependencies that may not be acceptable, as they can make the deployment bloated. Another scenario is deploying to mobile or other devices that require the language model to be local.

I hope this makes the need clear, and I hope it becomes supported.
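
To sketch what that could look like, here is a rough generation loop adapted from the public onnxruntime-genai Phi-3 examples; the model directory is a placeholder, and this is not an existing AutoGen integration (it would still need to be wrapped in a custom model client):

```python
# Sketch: local Phi-3 generation with onnxruntime-genai, adapted from its Phi-3 examples.
# "path/to/..." is a placeholder for a downloaded ONNX model directory.
import onnxruntime_genai as og

model = og.Model("path/to/Phi-3-mini-4k-instruct-onnx")
tokenizer = og.Tokenizer(model)
stream = tokenizer.create_stream()

prompt = "<|user|>\nWhat is AutoGen?<|end|>\n<|assistant|>\n"  # Phi-3 chat template
params = og.GeneratorParams(model)
params.set_search_options(max_length=256)
params.input_ids = tokenizer.encode(prompt)

# Token-by-token generation loop, streaming decoded text to stdout.
generator = og.Generator(model, params)
while not generator.is_done():
    generator.compute_logits()
    generator.generate_next_token()
    print(stream.decode(generator.get_next_tokens()[0]), end="", flush=True)
```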