microsoft / autogen

A programming framework for agentic AI 🤖
https://microsoft.github.io/autogen/
Creative Commons Attribution 4.0 International

[Issue]: ragproxyagent always returns `InvalidCollectionException: Collection autogen-docs does not exist.` #3551

Closed inoue0426 closed 2 months ago

inoue0426 commented 2 months ago

Describe the issue

I am following this sample https://microsoft.github.io/autogen/blog/2023/10/18/RetrieveChat/ and it always returns the error below. Could you tell me how to resolve this?

[Sep 20] The example at https://microsoft.github.io/autogen/docs/topics/retrieval_augmentation/ also produces the same error.

ragproxyagent.initiate_chat(assistant, message=ragproxyagent.message_generator, problem="What is autogen?")
Trying to create collection.
---------------------------------------------------------------------------
InvalidCollectionException                Traceback (most recent call last)
Cell In[10], line 1
----> 1 ragproxyagent.initiate_chat(assistant, message=ragproxyagent.message_generator, problem="What is autogen?")

File ~/miniconda3/envs/torch/lib/python3.10/site-packages/autogen/agentchat/conversable_agent.py:1084, in ConversableAgent.initiate_chat(self, recipient, clear_history, silent, cache, max_turns, summary_method, summary_args, message, **kwargs)
   1082 self._prepare_chat(recipient, clear_history)
   1083 if isinstance(message, Callable):
-> 1084     msg2send = message(_chat_info["sender"], _chat_info["recipient"], kwargs)
   1085 else:
   1086     msg2send = self.generate_init_message(message, **kwargs)

File ~/miniconda3/envs/torch/lib/python3.10/site-packages/autogen/agentchat/contrib/retrieve_user_proxy_agent.py:636, in RetrieveUserProxyAgent.message_generator(sender, recipient, context)
    633 n_results = context.get("n_results", 20)
    634 search_string = context.get("search_string", "")
--> 636 sender.retrieve_docs(problem, n_results, search_string)
    637 sender.problem = problem
    638 sender.n_results = n_results

File ~/miniconda3/envs/torch/lib/python3.10/site-packages/autogen/agentchat/contrib/retrieve_user_proxy_agent.py:560, in RetrieveUserProxyAgent.retrieve_docs(self, problem, n_results, search_string)
    558 if not self._collection or not self._get_or_create:
    559     print("Trying to create collection.")
--> 560     self._init_db()
    561     self._collection = True
    562     self._get_or_create = True

File ~/miniconda3/envs/torch/lib/python3.10/site-packages/autogen/agentchat/contrib/retrieve_user_proxy_agent.py:339, in RetrieveUserProxyAgent._init_db(self)
    336 else:
    337     IS_TO_CHUNK = True
--> 339 self._vector_db.active_collection = self._vector_db.create_collection(
    340     self._collection_name, overwrite=self._overwrite, get_or_create=self._get_or_create
    341 )
    343 docs = None
    344 if IS_TO_CHUNK:

File ~/miniconda3/envs/torch/lib/python3.10/site-packages/autogen/agentchat/contrib/vectordb/chromadb.py:86, in ChromaVectorDB.create_collection(self, collection_name, overwrite, get_or_create)
     84         collection = self.active_collection
     85     else:
---> 86         collection = self.client.get_collection(collection_name, embedding_function=self.embedding_function)
     87 except ValueError:
     88     collection = None

File ~/miniconda3/envs/torch/lib/python3.10/site-packages/chromadb/api/client.py:142, in Client.get_collection(self, name, id, embedding_function, data_loader)
    132 @override
    133 def get_collection(
    134     self,
   (...)
    140     data_loader: Optional[DataLoader[Loadable]] = None,
    141 ) -> Collection:
--> 142     model = self._server.get_collection(
    143         id=id,
    144         name=name,
    145         tenant=self.tenant,
    146         database=self.database,
    147     )
    148     return Collection(
    149         client=self._server,
    150         model=model,
    151         embedding_function=embedding_function,
    152         data_loader=data_loader,
    153     )

File ~/miniconda3/envs/torch/lib/python3.10/site-packages/chromadb/telemetry/opentelemetry/__init__.py:146, in trace_method.<locals>.decorator.<locals>.wrapper(*args, **kwargs)
    144 global tracer, granularity
    145 if trace_granularity < granularity:
--> 146     return f(*args, **kwargs)
    147 if not tracer:
    148     return f(*args, **kwargs)

File ~/miniconda3/envs/torch/lib/python3.10/site-packages/chromadb/api/segment.py:251, in SegmentAPI.get_collection(self, name, id, tenant, database)
    249     return existing[0]
    250 else:
--> 251     raise InvalidCollectionException(f"Collection {name} does not exist.")

InvalidCollectionException: Collection autogen-docs does not exist.
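
The last frames above hint at the cause: `ChromaVectorDB.create_collection` wraps `get_collection` in `except ValueError`, while chromadb raises `InvalidCollectionException`, which that handler evidently does not catch. Below is a minimal, library-free sketch of the same pattern. `FakeClient`, `lookup_strict`, and `lookup_tolerant` are hypothetical stand-ins for illustration, not AutoGen or chromadb code, and the stand-in exception is assumed not to derive from `ValueError`.

```python
class InvalidCollectionException(Exception):
    """Stand-in for the chromadb exception; assumed NOT to subclass ValueError."""


class FakeClient:
    """Hypothetical in-memory client that mimics a missing-collection lookup."""

    def __init__(self):
        self.collections = {}

    def get_collection(self, name):
        if name not in self.collections:
            raise InvalidCollectionException(f"Collection {name} does not exist.")
        return self.collections[name]


def lookup_strict(client, name):
    """Mirrors the handler seen in the traceback: only ValueError is caught."""
    try:
        return client.get_collection(name)
    except ValueError:
        return None


def lookup_tolerant(client, name):
    """Catches both exception types, so a missing collection yields None."""
    try:
        return client.get_collection(name)
    except (ValueError, InvalidCollectionException):
        return None


if __name__ == "__main__":
    client = FakeClient()
    print(lookup_tolerant(client, "autogen-docs"))  # None: caller can create it
    try:
        lookup_strict(client, "autogen-docs")
    except InvalidCollectionException as exc:
        print(f"escaped the handler: {exc}")
```

With the strict handler the exception propagates to the caller, which matches what the traceback shows. With the tolerant handler the caller gets `None` and can go on to create the collection.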

Steps to reproduce

The code is below. The same error also occurs with the OpenAI API.


import autogen
import litellm

from autogen import AssistantAgent
from autogen.agentchat.contrib.retrieve_user_proxy_agent import RetrieveUserProxyAgent

llm_config = {
    "config_list": [
        {
            "model": "NotRequired",  # Loaded with LiteLLM command
            "api_key": "NotRequired",  # Not needed
            "base_url": "http://0.0.0.0:4000/",  # Your LiteLLM URL
            "price": [0, 0],  # Put in price per 1K tokens [prompt, response] as free!
        }
    ],
    "cache_seed": None,  # Turns off caching, useful for testing different models
}

assistant = AssistantAgent(
    name="assistant",
    system_message="You are a helpful assistant.",
    llm_config=llm_config,
)

ragproxyagent = RetrieveUserProxyAgent(
    name="ragproxyagent",
    retrieve_config={
        "task": "qa",
        "docs_path": "https://raw.githubusercontent.com/microsoft/autogen/main/README.md",
    }, 
    code_execution_config=False
)

assistant.reset()
ragproxyagent.initiate_chat(assistant, message=ragproxyagent.message_generator, problem="What is autogen?")

Additional Information

pyautogen 0.2.34
Python    3.10.0
macOS     13.2.1 (22D68)

vijaygill commented 2 months ago

@inoue0426 - Coincidentally, I was trying the same thing (AutoGen RAG) around the time you posted the issue, and I hit the same error. I worked around it in a somewhat hacky way by creating the collection up front with the chromadb client. Code snippets are shown below:

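import chromadb
from chromadb.utils import embedding_functions
from autogen.agentchat.contrib.vectordb.chromadb import ChromaVectorDB

CHROMA_DB_PATH = "/app/tmp/chromadb"
CHROMA_COLLECTION = "autogen-docs-test"

# Create the collection up front so the agent's later lookup finds it.
chroma_client = chromadb.PersistentClient(path=CHROMA_DB_PATH)
collection = chroma_client.get_or_create_collection(name=CHROMA_COLLECTION)

# Embedding function backed by a local Ollama server.
ollama_ef = embedding_functions.OllamaEmbeddingFunction(
    url="http://<my local ollama host and port>/api/embeddings",
    model_name="mxbai-embed-large",
)
vector_db = ChromaVectorDB(path=CHROMA_DB_PATH, embedding_function=ollama_ef)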

Then pass the vector DB and the collection name when creating the agent:

ragproxyagent = RetrieveUserProxyAgent(
    name="ragproxyagent",
    human_input_mode="NEVER",
    llm_config=llm_config,
    code_execution_config=False,
    retrieve_config={
        "model": config_list[0]["model"],
        "task": "qa",
        "update_context": True,
        "n_results": 3,
        "docs_path": [
            "./qa.txt",
        ],
        "get_or_create": True,
        "overwrite": False,
        "vector_db": vector_db,
        "collection_name": CHROMA_COLLECTION,
        "embedding_function": ollama_ef,
    },
)

inoue0426 commented 2 months ago

@vijaygill This works! Thanks.

thinkall commented 2 months ago

Hi @inoue0426, @vijaygill, thanks a lot for reporting the issue. It is caused by a change in the exception type raised inside chromadb. For now, you can also downgrade chromadb to <=0.5.0 (some versions >0.5.0 may work, but I didn't try all of them). I'll raise a PR to fix it.
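
To see whether an environment falls into the range mentioned above, a rough check of the installed chromadb version might look like the sketch below. The helper names `parse_version` and `may_be_affected` are hypothetical, and the `0.5.0` threshold is simply the version quoted in this comment. As noted, some later versions may work fine, so a positive result only means "worth checking".

```python
import re
from importlib import metadata


def parse_version(text):
    """Keep the first three numeric fields, e.g. '0.5.1rc1' -> (0, 5, 1)."""
    return tuple(int(n) for n in re.findall(r"\d+", text)[:3])


def may_be_affected(version_text, last_known_good="0.5.0"):
    """True when the version is newer than the last release reported to work."""
    return parse_version(version_text) > parse_version(last_known_good)


if __name__ == "__main__":
    try:
        installed = metadata.version("chromadb")
    except metadata.PackageNotFoundError:
        print("chromadb is not installed")
    else:
        status = "may be affected" if may_be_affected(installed) else "should be fine"
        print(f"chromadb {installed}: {status}")
```

The comparison is a plain tuple comparison on the numeric fields, which is enough for a quick sanity check but is not a full version-specifier parser.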

vijaygill commented 2 months ago

@thinkall - Thanks! I am working on a POC application, so this hack is working for me and I will revisit this part later. Maybe a new version with your PR will already be available by then! Thanks for all the great work!

inoue0426 commented 2 months ago

@thinkall Thanks! I will try to use it!