Open qingyun-wu opened 3 weeks ago
thanks @qingyun-wu.
May I suggest the 01 Yi model family APIs? I just got the docs, and bringing them to AutoGen was actually on my list :-) They are already OpenAI drop-in compatible.
Good ideas! Let's see if we can find volunteers to add those!
Hi, I find this roadmap is mainly about client implementations. Here is an optimization I would very much appreciate if you could also implement in 0.2.30:
Implement an object pool to cache clients, to avoid instantiating a client with the same key repeatedly.
This really affects agents' initialization speed when there are tons of agents and tools. Especially for tools/functions: every time a tool is registered to an LLM, a new OpenAIWrapper is created to update the config. That would be fine if only the request config (payload) were updated; however, in the current implementation this always creates a new client such as openai.OpenAI, because _register_default_client is called whenever an OpenAIWrapper is initialized, no matter what.
Below is a screenshot of the init profiling result for my project's agents. As you can see, it took up to ~3s in total to initialize all agents when tools were registered and llm_config was provided (even though they are all the same for each agent).
The root cause is load_verify_locations in the ssl package, which httpx uses under the hood in the openai client. Thus, if a cache mechanism (such as an object pool) were implemented at the client level, it would greatly speed up agent initialization for projects that use lots of agents and tools at the same time, making product deployment truly feasible.
Here is my simple implementation for caching clients; hope it helps:

```python
import json
import logging
import sys
from hashlib import md5
from typing import Any, Dict
from threading import Lock

from autogen import OpenAIWrapper
from autogen.oai.client import PlaceHolderClient
from flaml.automl.logger import logger_formatter
from omne._types import ThreadLevelSingleton

logger = logging.getLogger(__name__)
if not logger.handlers:
    # Add the console handler.
    _ch = logging.StreamHandler(stream=sys.stdout)
    _ch.setFormatter(logger_formatter)
    logger.addHandler(_ch)


def _config_to_key(config: Dict[str, Any]) -> str:
    return md5(json.dumps(config, sort_keys=True).encode()).hexdigest()


class ClientCache(ThreadLevelSingleton):
    def __init__(self):
        self._client_creation_lock = Lock()
        self._oai_clients = {}
        self._aoai_clients = {}
        self._google_clients = {}

    def _get_client(self, cache: dict, config: Dict[str, Any], client_class: Any):
        key = _config_to_key(config)
        if key not in cache:
            with self._client_creation_lock:
                # Double-checked locking: re-test after acquiring the lock.
                if key not in cache:
                    cache[key] = client_class(**config)
        return cache[key]

    def create_or_get_oai_client(self, config: Dict[str, Any]):
        from autogen.oai.client import OpenAIClient
        from openai import OpenAI

        return OpenAIClient(client=self._get_client(self._oai_clients, config, OpenAI).copy())

    def create_or_get_aoai_client(self, config: Dict[str, Any]):
        from autogen.oai.client import OpenAIClient
        from openai import AzureOpenAI

        return OpenAIClient(client=self._get_client(self._aoai_clients, config, AzureOpenAI).copy())

    def create_or_get_google_client(self, config: Dict[str, Any]):
        try:
            from autogen.oai.gemini import GeminiClient
        except ImportError:
            raise ImportError("Please install `google-generativeai` to use the Google Gemini API.")
        return self._get_client(self._google_clients, config, GeminiClient)


def _register_default_client(self, config: Dict[str, Any], openai_config: Dict[str, Any]) -> None:
    client_cache = ClientCache()
    openai_config = {**openai_config, **{k: v for k, v in config.items() if k in self.openai_kwargs}}
    api_type = config.get("api_type")
    model_client_cls_name = config.get("model_client_cls")
    if model_client_cls_name is not None:
        # A config for a custom client is set; add a placeholder until
        # register_model_client is called with the appropriate class.
        self._clients.append(PlaceHolderClient(config))
        logger.info(
            f"Detected custom model client in config: {model_client_cls_name}; "
            "the model client cannot be used until register_model_client is called."
        )
    else:
        if api_type is not None and api_type.startswith("azure"):
            self._configure_azure_openai(config, openai_config)
            self._clients.append(client_cache.create_or_get_aoai_client(openai_config))
        elif api_type is not None and api_type.startswith("google"):
            self._clients.append(client_cache.create_or_get_google_client(openai_config))
        else:
            self._clients.append(client_cache.create_or_get_oai_client(openai_config))


def patch_openai_wrapper():
    OpenAIWrapper._register_default_client = _register_default_client


__all__ = ["patch_openai_wrapper"]
```
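For readers who want the caching idea without the AutoGen-specific plumbing, here is a minimal, self-contained sketch of the same pattern (config hashed to a stable key, double-checked locking around construction). `FakeClient` is a stand-in for an expensive client such as `openai.OpenAI`; the names here are illustrative only:

```python
import json
from hashlib import md5
from threading import Lock


class FakeClient:
    """Stand-in for an expensive-to-construct client such as openai.OpenAI."""

    instances_created = 0

    def __init__(self, **config):
        FakeClient.instances_created += 1
        self.config = config


class ClientPool:
    """Object pool keyed by a hash of the (JSON-serializable) client config."""

    def __init__(self):
        self._lock = Lock()
        self._clients = {}

    @staticmethod
    def _key(config):
        # sort_keys=True makes the key insensitive to dict ordering.
        return md5(json.dumps(config, sort_keys=True).encode()).hexdigest()

    def get(self, client_class, config):
        key = self._key(config)
        if key not in self._clients:
            with self._lock:  # double-checked locking
                if key not in self._clients:
                    self._clients[key] = client_class(**config)
        return self._clients[key]


pool = ClientPool()
a = pool.get(FakeClient, {"api_key": "sk-xxx", "base_url": "https://example"})
b = pool.get(FakeClient, {"base_url": "https://example", "api_key": "sk-xxx"})
assert a is b  # same config (key order ignored) -> same cached instance
assert FakeClient.instances_created == 1
```

The double check (test, lock, test again) avoids taking the lock on the hot path while still guaranteeing that concurrent first-time lookups construct the client only once.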
Thanks @PanQiWei! This looks great! I wonder if you would like to contribute? Or help to review/test it if we find contributors? We can chat for more details on Discord: https://discord.com/invite/Yb5gwGVkE5. Thank you!
Regarding Cohere Command R and Command R+ models, I have implemented a basic CohereAgent(ConversableAgent). Just as with GPTAssistantAgent, I think Cohere model support should be in the form of a ConversableAgent extension (not a ModelClient). These models support parallel and sequential function calling, so a single prompt may result in the model calling multiple (dependent) functions/tools in sequence before returning its response.
Better support for local models would be appreciated. See the issues reported in #2953. At a minimum, indicate in your documentation which examples have been successfully tested with local models. That would save a lot of time for developers new to AutoGen trying to understand why your examples work so differently once they use something other than OpenAI. If an example doesn't work, being upfront about the limitations would greatly help. If it works, stating which model it has been successfully tested with would also save users a lot of time and effort.
I think this is a common problem with many frameworks (e.g. LangChain). There are plenty of tutorials, examples, prompts, etc. designed primarily, and often tested exclusively, with OpenAI services, but assessing whether they work sufficiently with local models (or how to make them work, or whether anyone has ever managed to make them work) requires a lot of experimentation, online searching, etc., which can quickly exceed the resources of smaller development teams.
Can we also address this issue in this release: https://github.com/microsoft/autogen/issues/1262 @yiranwu0 , @Hk669, @marklysze Thanks!
Add #2929 and #2930
Add #2929 and #2930
thanks, I think the Anthropic client will close these issues.
Instructor clients? Instructor needs a custom client even when using the OpenAI API, because it wraps the calls and enforces the response model, re-requesting multiple times until it succeeds (or fails N times). So while it might be possible to support it via a non-client method (you'd have to add hooks in multiple places and reproduce Instructor-ish behavior a la Guidance, but much better than that), allowing Instructor to be used as a client is much easier.
I have code for this to submit.
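For context on the behavior described above, the retry-until-valid loop can be sketched without the Instructor library itself: a wrapper calls the underlying client, validates the response against a model, and re-requests until validation succeeds or N attempts are exhausted. Everything below (names included) is illustrative and is not Instructor's actual API:

```python
import json


class ValidationError(Exception):
    pass


def validating_call(raw_call, validate, max_retries=3):
    """Call raw_call, validate the result, and retry on validation failure."""
    last_error = None
    for _ in range(max_retries):
        response = raw_call()
        try:
            validate(response)
            return response
        except ValidationError as e:
            # Instructor additionally feeds the error back into the next prompt.
            last_error = e
    raise last_error


def validate_person(text):
    """Accept only JSON with a string 'name' field (stand-in for a response model)."""
    try:
        data = json.loads(text)
    except json.JSONDecodeError:
        raise ValidationError("not JSON")
    if not isinstance(data.get("name"), str):
        raise ValidationError("name must be a string")


# Fake client: returns malformed output twice, then a valid response.
responses = iter(['not json', '{"name": 1}', '{"name": "Ada"}'])
result = validating_call(lambda: next(responses), validate_person)
print(result)  # -> {"name": "Ada"}
```

The point is that the retry loop sits between the caller and the raw completion call, which is why Instructor is most naturally integrated as a client wrapper rather than as hooks scattered through the framework.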
I also did an Ollama Raw client, though it's not really worth the effort (I did it to do Mistral v0.3 tools, but it works fine with my 'toolsfortoolless' code without Raw). I'll probably put it someplace regardless, just so it's out there.
It might be worth exploring the use of LiteLLM in AutoGen to see if we can offload the non-OpenAI model support to a dedicated library: https://github.com/BerriAI/litellm
Has anyone looked into this yet? Is there functionality specific to AutoGen that isn't supported in LiteLLM?
It might be worth exploring the use of LiteLLM in AutoGen to see if we can offload the non-OpenAI model support to a dedicated library...
That doesn't really address the problem of wanting other clients supported directly. Yes, LiteLLM as a wrapper can work for some, and recommending it is fine, but it's not an answer for everyone. We already have some 'tweaks' in various clients to allow adjusting things as needed. Using a 'universal wrapper' means you can't tune that way. If it did allow that, we could just adjust the OpenAI wrapper we already use in the majority of cases.
I go back and forth between using Ollama and LLMStudio, and I've played with other local servers. Each has pros and cons. The same is true of wrappers like LiteLLM. There are tradeoffs: LiteLLM as a library gives you a translation from a common format to the various specific formats that differ, but you also lose the ability to make those tweaks.
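To illustrate the tradeoff: a translation layer maps one common message format onto per-provider request shapes via a dispatch table, which is convenient but leaves no natural place for provider-specific tweaks. The provider names and formats below are made up for illustration; this is not LiteLLM's actual code:

```python
# Common format: [{"role": "user", "content": "hi"}, ...]

def to_openai_style(messages):
    # OpenAI-style APIs accept the common chat-message format as-is.
    return {"messages": messages}


def to_single_prompt(messages):
    # Some local servers only accept one flattened prompt string.
    return {"prompt": "\n".join(f"{m['role']}: {m['content']}" for m in messages)}


# Hypothetical provider names, used only as dispatch keys.
TRANSLATORS = {
    "openai_like": to_openai_style,
    "prompt_only": to_single_prompt,
}


def build_payload(provider, messages):
    try:
        return TRANSLATORS[provider](messages)
    except KeyError:
        raise ValueError(f"no translator for provider {provider!r}")


msgs = [{"role": "system", "content": "be brief"}, {"role": "user", "content": "hi"}]
print(build_payload("prompt_only", msgs))
# -> {'prompt': 'system: be brief\nuser: hi'}
```

Any per-provider quirk (sampling parameters, tool-call encodings, stop tokens) has to fit through the one common interface, which is exactly the tuning surface a dedicated in-repo client keeps open.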
OpenAI API support is the most common, but not universal, API, and adding additional APIs with 'in repo' supported clients is a good thing, because there will always be other flavors of API out there.
I wouldn't be opposed to seeing a LiteLLM-based client, to be clear; I just don't want it to be 'the answer'.
I was thinking that this roadmap can cover the cloud-based inference providers.
Separately, I think it would be good to have a local LLM focused blog and associated PRs on a roadmap. That could focus on client classes for the likes of LiteLLM / Ollama / etc. as well as approaches / classes like @scruffynerf's "toolsfortoolless". Local LLMs is an area I started out in and found it frustrating, like @geoffroy-noel-ddh noted, trying to figure out the right LLM for the right setup. If that's something that people want to work on let's create that.
It might be worth exploring the use of LiteLLM in AutoGen to see if we can offload the non-OpenAI model support to a dedicated library: https://github.com/BerriAI/litellm
Has anyone looked into this yet? Is there functionality specific to AutoGen that isn't supported in LiteLLM?
LiteLLM did not actually resolve the underlying issue that AutoGen is implemented assuming an OpenAI/GPT-style valid conversation flow. LiteLLM just creates an API proxy.
At least prior to the Anthropic PRs. Haven't tested since then to see if it consistently works now (with LiteLLM.)
One major milestone in release v0.2.30-v0.2.32 will be enhanced support of non-OpenAI models.
Plan for the next release v0.2.32:
@marklysze, @Hk669, @yiranwu0, feel free to add tasks to the task list.
Feel free to suggest any other pressing features you want to see, which issues you hope to be addressed, or which PRs should be merged in the next release!
And let's make it happen together!
Finished tasks in releases v0.2.30 and v0.2.31