microsoft / autogen

A programming framework for agentic AI. Discord: https://aka.ms/autogen-dc. Roadmap: https://aka.ms/autogen-roadmap
https://microsoft.github.io/autogen/

[Roadmap]: Roadmap about Enhanced non-OpenAI models #2946

Open qingyun-wu opened 3 weeks ago

qingyun-wu commented 3 weeks ago

One major milestone across releases v0.2.30-v0.2.32 will be enhanced support for non-OpenAI models.

Plan for the next release v0.2.32:

### Tasks about enhanced non-OpenAI Model Support
- [ ] A blogpost highlighting the enhanced non-OpenAI model support: https://github.com/microsoft/autogen/pull/2965
- [ ] [Cohere client](https://github.com/microsoft/autogen/pull/3004)
- [x] [Groq client](https://github.com/microsoft/autogen/pull/3003)
- [ ] [01 Yi model](https://github.com/microsoft/autogen/pull/3048)

@marklysze, @Hk669, @yiranwu0, feel free to add tasks to the task list.

💡 Feel free to suggest any other pressing features you want to see, issues you hope to have addressed, or PRs to be merged in the next release!

And let's make it happen together 🏆💪!

Finished tasks in releases v0.2.30 and v0.2.31

### Tasks about enhanced non-OpenAI Model Support
- [x] [Client Utils](https://github.com/microsoft/autogen/pull/2949)
- [x] [Mistral Client](https://github.com/microsoft/autogen/pull/2892)
- [x] [Together AI Client](https://github.com/microsoft/autogen/pull/2919)
- [x] [Anthropic Client](https://github.com/microsoft/autogen/pull/2931)
- [x] Notebook: #2916
Hk669 commented 3 weeks ago

thanks @qingyun-wu.

Josephrp commented 3 weeks ago

May I suggest the 01 Yi model family APIs? I just got the docs, and this was actually on my list: to bring it to AutoGen, I mean :-) It's already OpenAI drop-in compatible.
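
For example, since it's OpenAI drop-in compatible, a config like the following should already work with the existing OpenAI client (the base URL and model name are taken from the Yi docs as I read them, so please double-check):

import autogen

# Example only: point the stock OpenAI client at the Yi endpoint via base_url.
# Verify the endpoint and model name against the 01.AI documentation.
config_list = [
    {
        "model": "yi-large",
        "api_key": "<YOUR_YI_API_KEY>",
        "base_url": "https://api.lingyiwanwu.com/v1",
    }
]

assistant = autogen.AssistantAgent("assistant", llm_config={"config_list": config_list})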

Josephrp commented 3 weeks ago

Also, the Cohere API has high potential, since they have a performant function-calling model among other interesting offerings. Just a thought.

qingyun-wu commented 3 weeks ago

May I suggest the 01 Yi model family APIs? I just got the docs, and this was actually on my list: to bring it to AutoGen, I mean :-) It's already OpenAI drop-in compatible.

Good ideas! Let's see if we can find volunteers to add those!

PanQiWei commented 2 weeks ago

Hi, I see this roadmap is mainly about client implementations. Here is an optimization suggestion that I would very much appreciate if you could also implement in 0.2.30 🙏

Suggestion

Implement an object pool to cache clients, so that a client with the same key is not repeatedly instantiated.

Reason

This really affects agents' initialization speed when there are tons of agents and tools. Especially for tools/functions: every time a tool is registered to an LLM, a new OpenAIWrapper is created to update the config. That would be fine if only the request config (payload) were updated; however, in the current implementation this always creates a new client such as openai.OpenAI, because _register_default_client is called whenever an OpenAIWrapper is initialized, no matter what.

Below is a screenshot of my project's agent-initialization profiling results. As you can see, it took up to ~3s in total to initialize all agents when tools are registered and llm_config is provided (even though they are all the same for each agent). [profiling screenshot]

The root cause is load_verify_locations in the ssl package, which is used by httpx under the hood in the openai client. Thus, if a cache mechanism (such as an object pool) were implemented at the client level, it would speed up agent initialization a lot for projects using lots of agents and tools at the same time, making production deployment truly feasible.


PanQiWei commented 2 weeks ago

Here is my simple implementation for caching clients; hope it's helpful:

import json
import logging
import sys
from hashlib import md5
from typing import Any, Dict
from threading import Lock

from autogen import OpenAIWrapper
from autogen.oai.client import PlaceHolderClient
from flaml.automl.logger import logger_formatter

from omne._types import ThreadLevelSingleton

logger = logging.getLogger(__name__)
if not logger.handlers:
    # Add the console handler.
    _ch = logging.StreamHandler(stream=sys.stdout)
    _ch.setFormatter(logger_formatter)
    logger.addHandler(_ch)

def _config_to_key(config: Dict[str, Any]) -> str:
    return md5(json.dumps(config, sort_keys=True).encode()).hexdigest()

class ClientCache(ThreadLevelSingleton):
    def __init__(self):
        self._client_creation_lock = Lock()

        self._oai_clients = {}
        self._aoai_clients = {}
        self._google_clients = {}

    def _get_client(self, cache: dict, config: Dict[str, Any], client_class: Any):
        key = _config_to_key(config)
        if key not in cache:
            with self._client_creation_lock:
                if key not in cache:
                    cache[key] = client_class(**config)
        return cache[key]

    def create_or_get_oai_client(self, config: Dict[str, Any]):
        from autogen.oai.client import OpenAIClient
        from openai import OpenAI
        return OpenAIClient(client=self._get_client(self._oai_clients, config, OpenAI).copy())

    def create_or_get_aoai_client(self, config: Dict[str, Any]):
        from autogen.oai.client import OpenAIClient
        from openai import AzureOpenAI
        return OpenAIClient(client=self._get_client(self._aoai_clients, config, AzureOpenAI).copy())

    def create_or_get_google_client(self, config: Dict[str, Any]):
        try:
            from autogen.oai.gemini import GeminiClient
        except ImportError as e:
            raise ImportError("Please install `google-generativeai` to use Google OpenAI API.") from e
        return self._get_client(self._google_clients, config, GeminiClient)

def _register_default_client(self, config: Dict[str, Any], openai_config: Dict[str, Any]) -> None:
    client_cache = ClientCache()

    openai_config = {**openai_config, **{k: v for k, v in config.items() if k in self.openai_kwargs}}
    api_type = config.get("api_type")
    model_client_cls_name = config.get("model_client_cls")
    if model_client_cls_name is not None:
        # a config for a custom client is set
        # adding placeholder until the register_model_client is called with the appropriate class
        self._clients.append(PlaceHolderClient(config))
        logger.info(
            f"Detected custom model client in config: {model_client_cls_name}, model client can not be used until register_model_client is called."
        )
    else:
        if api_type is not None and api_type.startswith("azure"):
            self._configure_azure_openai(config, openai_config)
            self._clients.append(client_cache.create_or_get_aoai_client(openai_config))
        elif api_type is not None and api_type.startswith("google"):
            self._clients.append(client_cache.create_or_get_google_client(openai_config))
        else:
            self._clients.append(client_cache.create_or_get_oai_client(openai_config))

def patch_openai_wrapper():
    OpenAIWrapper._register_default_client = _register_default_client

__all__ = ["patch_openai_wrapper"]
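
A minimal usage sketch of the patch above (assuming the module is saved as client_cache_patch.py; that name, and the ThreadLevelSingleton dependency, are specific to my own project):

from client_cache_patch import patch_openai_wrapper

import autogen

# Apply the monkey patch *before* constructing agents, so every OpenAIWrapper
# created afterwards reuses cached openai.OpenAI / AzureOpenAI instances.
patch_openai_wrapper()

llm_config = {"config_list": [{"model": "gpt-4o", "api_key": "<YOUR_OPENAI_API_KEY>"}]}

# All of these agents now share one underlying OpenAI client instead of each
# one re-running the expensive SSL setup in load_verify_locations.
agents = [
    autogen.AssistantAgent(f"assistant_{i}", llm_config=llm_config)
    for i in range(10)
]
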
qingyun-wu commented 2 weeks ago

Here is my simple implementation for caching clients; hope it's helpful: […]

Thanks @PanQiWei! This looks great! I wonder if you would like to contribute? Or help to review/test it if we find contributors? We can chat for more details on Discord: https://discord.com/invite/Yb5gwGVkE5. Thank you!

Phodaie commented 2 weeks ago

Regarding the Cohere Command R and Command R+ models, I have implemented a basic CohereAgent(ConversableAgent). Just as with GPTAssistantAgent, I think Cohere model support should take the form of a ConversableAgent extension (not a ModelClient). These models support parallel and sequential function calling, so a single prompt may result in the model calling multiple (dependent) functions/tools in sequence before returning its response.
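
For illustration, here is a rough sketch of the general shape (not my actual implementation; tool calling is omitted, and the cohere.Client.chat usage reflects the Cohere Python SDK as I understand it):

from typing import Any, Dict, List, Optional, Tuple, Union

import cohere
from autogen import Agent, ConversableAgent

class CohereAgent(ConversableAgent):
    """Rough sketch: a ConversableAgent that replies via Cohere's chat API."""

    def __init__(self, name: str, cohere_api_key: str, model: str = "command-r-plus", **kwargs):
        # llm_config=False disables the default OpenAI-based reply functions.
        super().__init__(name=name, llm_config=False, **kwargs)
        self._co = cohere.Client(api_key=cohere_api_key)
        self._model = model
        # Route all incoming messages to the Cohere chat endpoint.
        self.register_reply([Agent, None], CohereAgent._generate_cohere_reply, position=0)

    def _generate_cohere_reply(
        self,
        messages: Optional[List[Dict[str, Any]]] = None,
        sender: Optional[Agent] = None,
        config: Optional[Any] = None,
    ) -> Tuple[bool, Union[str, None]]:
        messages = messages or self._oai_messages[sender]
        # Map the OpenAI-style transcript onto Cohere's chat_history format.
        chat_history = [
            {"role": "USER" if m["role"] == "user" else "CHATBOT", "message": m["content"]}
            for m in messages[:-1]
            if isinstance(m.get("content"), str)
        ]
        response = self._co.chat(
            message=messages[-1]["content"],
            chat_history=chat_history,
            model=self._model,
        )
        return True, response.text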

geoffroy-noel-ddh commented 2 weeks ago

Better support for local models would be appreciated; see the issues reported in #2953. At a minimum, indicate in the documentation which examples have been successfully tested with local models. That would save a lot of time for developers new to AutoGen trying to understand why the examples work so differently once they use something other than OpenAI. If an example doesn't work with local models, being upfront about the limitations would greatly help; if it does, saying which model it has been successfully tested with would also save users a lot of time and effort.

I think this is a common problem with many frameworks (e.g., LangChain). There are plenty of tutorials, examples, prompts, etc. designed primarily, and often tested exclusively, with OpenAI services, but assessing whether they work sufficiently with local models (or how to make them work, or whether anyone has ever managed to make them work) requires a lot of experimentation, online searching, etc., which can quickly go beyond the resources of smaller development teams.
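
For concreteness, the basic setup in question is just pointing the existing OpenAI client at an OpenAI-compatible local server, something like the sketch below (the URL assumes Ollama's OpenAI-compatible endpoint; adjust for LM Studio or other servers). The open question is which of the documented examples actually work once this is in place:

import autogen

# Example only: an OpenAI-compatible local server (here, Ollama's /v1 endpoint).
local_config_list = [
    {
        "model": "llama3",                        # whatever model the local server exposes
        "base_url": "http://localhost:11434/v1",  # Ollama's OpenAI-compatible endpoint
        "api_key": "ollama",                      # placeholder; local servers usually ignore it
    }
]

assistant = autogen.AssistantAgent("assistant", llm_config={"config_list": local_config_list})
user = autogen.UserProxyAgent("user", human_input_mode="NEVER", code_execution_config=False)
# user.initiate_chat(assistant, message="Write a haiku about local models.")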

qingyun-wu commented 2 weeks ago

Can we also address this issue in this release: https://github.com/microsoft/autogen/issues/1262? @yiranwu0, @Hk669, @marklysze, thanks!

scruffynerf commented 2 weeks ago

Add #2929 and #2930

Hk669 commented 2 weeks ago

Add #2929 and #2930

thanks, I think the Anthropic client will close these issues.

scruffynerf commented 2 weeks ago

Instructor clients? Instructor needs a custom client even when using the OpenAI API, because it wraps the calls and enforces the response model, re-requesting multiple times until it succeeds (or fails N times). So while it might be possible to support it via a non-client method (you'd have to hook in multiple places and reproduce Instructor-ish behavior à la Guidance, but much better than that), allowing Instructor to be used as a client is much easier.

I have code for this to submit.
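
To sketch the shape (this is not the code I'll submit, just an outline of AutoGen's custom model client protocol wrapped around Instructor; passing a fixed response_model through the config is a simplification of this sketch):

from types import SimpleNamespace
from typing import Any, Dict, List

import instructor
from openai import OpenAI

class InstructorModelClient:
    """Sketch of a custom model client whose create() goes through Instructor."""

    def __init__(self, config: Dict[str, Any], **kwargs):
        self._model = config["model"]
        # The Pydantic response model to enforce; here it is simply supplied in the config.
        self._response_model = config["response_model"]
        self._client = instructor.patch(OpenAI())

    def create(self, params: Dict[str, Any]):
        # Instructor re-requests until the output validates against response_model
        # (or max_retries is exhausted), so the structured-output retry loop lives here.
        result = self._client.chat.completions.create(
            model=self._model,
            messages=params["messages"],
            response_model=self._response_model,
            max_retries=3,
        )
        message = SimpleNamespace(content=result.model_dump_json(), function_call=None, tool_calls=None)
        return SimpleNamespace(choices=[SimpleNamespace(message=message)], model=self._model, cost=0.0)

    def message_retrieval(self, response) -> List[str]:
        return [choice.message.content for choice in response.choices]

    def cost(self, response) -> float:
        return 0.0

    @staticmethod
    def get_usage(response) -> Dict[str, Any]:
        return {}

It would be registered the usual way for custom clients: put "model_client_cls": "InstructorModelClient" in the config entry and call agent.register_model_client(model_client_cls=InstructorModelClient).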

scruffynerf commented 2 weeks ago

I also did an Ollama Raw client, though it's not really worth the effort (I did it to get Mistral v0.3 tools working, but that works fine with my 'toolsfortoolless' code without Raw). I'll probably put it someplace regardless, just so it's out there.

garnermccloud commented 2 weeks ago

It might be worth exploring the use of LiteLLM in AutoGen to see if we can offload the non-OpenAI model support to a dedicated library: https://github.com/BerriAI/litellm

Has anyone looked into this yet? Is there functionality specific to AutoGen that isn't supported in LiteLLM?
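
For concreteness, the library route would presumably mean a thin client whose create() call delegates to something like litellm.completion(), which already accepts OpenAI-style messages and returns OpenAI-format responses for many providers. A minimal sketch of just that call (model name is only an example):

import litellm

# litellm.completion() takes OpenAI-style messages and returns an OpenAI-format
# response object, regardless of which provider actually serves the request.
response = litellm.completion(
    model="anthropic/claude-3-opus-20240229",  # example; any provider/model LiteLLM supports
    messages=[{"role": "user", "content": "Say hello in one word."}],
)
print(response.choices[0].message.content)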

scruffynerf commented 2 weeks ago

It might be worth exploring the use of LiteLLM in AutoGen to see if we can offload the non-OpenAI model support to a dedicated library...

That doesn't really address the problem of wanting other clients supported directly. Yes, LiteLLM as a wrapper can work for some, and recommending it is fine, but it's not an answer for everyone. We already have some 'tweaks' in various clients to allow adjusting things as needed; using a 'universal wrapper' means you can't tune that way. If you could, we could just adjust the OpenAI wrapper we already use in the majority of cases.

I go back and forth between using Ollama and LM Studio, and I've played with other local servers. Each has pros and cons, and the same is true of wrappers like LiteLLM; there are tradeoffs. LiteLLM as a library gives you a translation from a common format to the various provider-specific formats, but you also lose the ability to make those tweaks.

The OpenAI API is the most common, but not universal, and adding additional APIs with in-repo supported clients is a good thing, because there will always be other flavors of API out there.

To be clear, I wouldn't be opposed to seeing a client that uses LiteLLM; I just don't want it to be 'the answer'.

marklysze commented 2 weeks ago

I was thinking that this roadmap can cover the cloud-based inference providers.

Separately, I think it would be good to have a local-LLM-focused blog and associated PRs on a roadmap. That could cover client classes for the likes of LiteLLM / Ollama / etc., as well as approaches/classes like @scruffynerf's "toolsfortoolless". Local LLMs are an area I started out in, and, like @geoffroy-noel-ddh noted, I found it frustrating trying to figure out the right LLM for the right setup. If that's something people want to work on, let's create it.

brycecf commented 1 week ago

It might be worth exploring the use of LiteLLM in AutoGen to see if we can offload the non-OpenAI model support to a dedicated library: https://github.com/BerriAI/litellm

Has anyone looked into this yet? Is there functionality specific to AutoGen that isn't supported in LiteLLM?

LiteLLM did not actually resolve the underlying issue, which is that AutoGen is implemented assuming an OpenAI/GPT-style conversation flow is valid. LiteLLM just creates an API proxy.

At least, that was the case prior to the Anthropic PRs. I haven't tested since then to see whether it consistently works now (with LiteLLM).