kyegomez / swarms

The Enterprise-Grade Production-Ready Multi-Agent Orchestration Framework
Join our Community: https://discord.com/servers/agora-999382051935506503
https://docs.swarms.world
Other
896 stars 119 forks

[BUG] Getting error "local variable 'out' referenced before assignment" when using ChromaDB long-term memory. #520

Open aimzieslol opened 1 week ago

aimzieslol commented 1 week ago

Using the examples, I created a custom ChromaDB memory class that looks like this:

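import logging
import os
import uuid
from typing import Optional

import chromadb

# BaseVectorDatabase, display_markdown_message, data_to_text and get_embedding_func
# are imported from swarms or defined elsewhere (those imports are not shown)

class ChromaMemory(BaseVectorDatabase):
    def __init__(self, metric: str = "cosine", output_dir: str = "swarms",
                 limit_tokens: Optional[int] = 1000, n_results: int = 1,
                 docs_folder: Optional[str] = None, verbose: bool = True,
                 *args, **kwargs):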
        self.metric = metric
        self.output_dir = output_dir
        self.limit_tokens = limit_tokens
        self.n_results = n_results
        self.docs_folder = docs_folder
        self.verbose = verbose

        # Show ChromaDB logs at INFO level when verbose
        if verbose:
            logging.getLogger("chromadb").setLevel(logging.INFO)

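        # Create a persistent ChromaDB client. PersistentClient takes the storage
        # location as `path`; a persist_directory set via Settings is overwritten
        # by `path`, which defaults to "./chroma"
        chroma_persist_dir = "./chromadb"
        chroma_client = chromadb.PersistentClient(path=chroma_persist_dir)

        # Keep a handle to the same client (a separate chromadb.Client() would be
        # an unrelated in-memory instance)
        self.client = chroma_client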

        # Create Chroma collection
        self.collection = chroma_client.get_or_create_collection(
            name=output_dir,
            metadata={"hnsw:space": metric},
            embedding_function=get_embedding_func(),
            *args,
            **kwargs,
        )

        display_markdown_message(
            "ChromaDB collection created:"
            f" {self.collection.name} with metric: {self.metric} and"
            f" output directory: {self.output_dir}"
        )

        # If docs
        if docs_folder:
            display_markdown_message(
                f"Traversing directory: {docs_folder}"
            )
            self.traverse_directory()

    def add(self, document: str, *args, **kwargs):
        """
        Add a document to the ChromaDB collection.

        Args:
            document (str): The document to be added.
            *args, **kwargs: Passed through to self.collection.add().

        Returns:
            str: The ID of the added document.
        """
        try:
            doc_id = str(uuid.uuid4())
            self.collection.add(ids=[doc_id], documents=[document], *args, **kwargs)
            print("-----------------")
            print("Document added successfully")
            print("-----------------")
            return doc_id
        except Exception as e:
            raise Exception(f"Failed to add document: {str(e)}") from e

    def query(self, query_text: str, *args, **kwargs) -> str:
        """
        Query documents from the ChromaDB collection.

        Args:
            query_text (str): The query string.
            *args, **kwargs: Passed through to self.collection.query().

        Returns:
            str: Up to self.n_results documents, joined by newlines.
        """
        try:
            logging.info(f"Querying documents for: {query_text}")

            docs = self.collection.query(query_texts=[query_text], n_results=self.n_results, *args, **kwargs)["documents"]

            # Flatten the per-query lists of documents into one string
            out = "\n".join([x for doc in docs for x in doc])

            # Display the retrieved document
            display_markdown_message(f"Query: {query_text}")
            display_markdown_message(f"Retrieved Document: {out}")
            display_markdown_message(f"Retrieved Document: {type(out)}")

            return out
        except Exception as e:
            raise Exception(f"Failed to query documents: {str(e)}") from e

    def traverse_directory(self):
        """
        Walk self.docs_folder and its subdirectories, convert each file to
        text with data_to_text, and add it to the ChromaDB collection.
        Returns:
        - bool: True if at least one file was added, otherwise False.
        """
        added_to_db = False

        for root, dirs, files in os.walk(self.docs_folder):
            for file in files:
                file_path = os.path.join(root, file)
                data = data_to_text(file_path)
                self.add(str(data))
                added_to_db = True
                print(f"{file_path} added to Database")

        return added_to_db
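For reference, Chroma's collection.query() returns one inner list of documents per query text, which is why query() above flattens docs with a nested comprehension. The sketch below isolates that step in a hypothetical helper, flatten_documents, using hand-written result dicts instead of a real collection. A query that matches nothing flattens to an empty string, which would be consistent with the blank "Retrieved Document:" lines and the token count of 0 in the log.

```python
from typing import Any, Dict, List


def flatten_documents(result: Dict[str, Any]) -> str:
    """Join the nested "documents" lists of a Chroma-style query result.

    Hypothetical helper: result["documents"] is assumed to hold one inner
    list per query text, e.g. {"documents": [["doc a", "doc b"]]}.
    """
    docs: List[List[str]] = result.get("documents") or []
    # Same flattening as ChromaMemory.query: every document of every query,
    # separated by newlines
    return "\n".join(x for doc in docs for x in doc)


# One query that matched two documents
print(repr(flatten_documents({"documents": [["doc a", "doc b"]]})))  # prints 'doc a\ndoc b'
# One query that matched nothing
print(repr(flatten_documents({"documents": [[]]})))  # prints ''
```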

My agent setup looks like this:

memory = ChromaMemory(metric="cosine", n_results=3)

agent = Agent(
    agent_name="chat-tester",
    agent_description=("This agent chats with the user ... nicely."),
    llm=get_groq_llm(),
    max_loops=1,
    autosave=True,
    verbose=True,
    long_term_memory=memory,
    stopping_condition="finish",
)
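One way to tell whether the failure comes from ChromaMemory or from the agent's memory handling is to pass a stand-in object with the same add()/query() shape. StubMemory below is hypothetical: it keeps documents in a dict and does naive word-overlap matching instead of vector search, so ChromaDB is not involved at all.

```python
import uuid
from typing import Dict, List


class StubMemory:
    """Hypothetical in-process stand-in for ChromaMemory (no vector search)."""

    def __init__(self, n_results: int = 1) -> None:
        self.n_results = n_results
        self._docs: Dict[str, str] = {}

    def add(self, document: str, *args, **kwargs) -> str:
        # Store the document under a random ID, like ChromaMemory.add
        doc_id = str(uuid.uuid4())
        self._docs[doc_id] = document
        return doc_id

    def query(self, query_text: str, *args, **kwargs) -> str:
        # Return documents sharing at least one lowercase word with the query,
        # capped at n_results and joined by newlines (a str, like ChromaMemory.query)
        words = set(query_text.lower().split())
        hits: List[str] = [
            doc for doc in self._docs.values()
            if words & set(doc.lower().split())
        ]
        return "\n".join(hits[: self.n_results])


memory = StubMemory(n_results=3)
memory.add("Machine learning lets computers learn patterns from data.")
print(memory.query("write a linkedin post about machine learning"))
# prints Machine learning lets computers learn patterns from data.
```

Passing long_term_memory=StubMemory(n_results=3) to the same Agent(...) call would show whether the error persists without ChromaDB. This assumes Agent only calls query() (and possibly add()) on the object and does not require a BaseVectorDatabase subclass, which the issue does not confirm.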

When I run the agent with this command (a deliberately simple task, just to see how memory works):

agent.run('write a linkedin and twitter post about machine learning. can only be 240 chars max.')

Most of it seems to work correctly, but it complains about the out variable:

2024-06-29T23:31:54.369985+0000 Autonomous Agent Activated.
2024-06-29T23:31:54.375412+0000 All systems operational. Executing task...
2024-06-29T23:31:54.385845+0000 Tokens available: -8082
2024-06-29T23:31:54.389028+0000 Querying long term memory database for write a linkedin and twitter post about machine learning. can only be 240 chars max.
Number of tokens: 110

Loop 1 of 1

2024-06-29T23:31:56.376794+0000 Couting tokens of retrieved document
2024-06-29T23:31:56.379307+0000 Retrieved document token count 0
2024-06-29T23:31:56.385940+0000 Error querying long term memory: local variable 'out' referenced before assignment
2024-06-29T23:31:56.388225+0000 Attempt 1: Error generating response: local variable 'out' referenced before assignment
2024-06-29T23:31:56.389074+0000 Querying long term memory database for write a linkedin and twitter post about machine learning. can only be 240 chars max.
Query: write a linkedin and twitter post about machine learning. can only be 240 chars max.
Retrieved Document:
Retrieved Document: <class 'str'>
Number of tokens: 0
2024-06-29T23:31:56.810587+0000 Couting tokens of retrieved document
2024-06-29T23:31:56.816323+0000 Retrieved document token count 0
2024-06-29T23:31:56.818167+0000 Error querying long term memory: local variable 'out' referenced before assignment
2024-06-29T23:31:56.820479+0000 Attempt 2: Error generating response: local variable 'out' referenced before assignment
2024-06-29T23:31:56.822279+0000 Querying long term memory database for write a linkedin and twitter post about machine learning. can only be 240 chars max.
Query: write a linkedin and twitter post about machine learning. can only be 240 chars max.
Retrieved Document:
Retrieved Document: <class 'str'>
Number of tokens: 0
2024-06-29T23:31:57.215569+0000 Couting tokens of retrieved document
2024-06-29T23:31:57.220205+0000 Retrieved document token count 0
2024-06-29T23:31:57.221670+0000 Error querying long term memory: local variable 'out' referenced before assignment
2024-06-29T23:31:57.223879+0000 Attempt 3: Error generating response: local variable 'out' referenced before assignment
Query: write a linkedin and twitter post about machine learning. can only be 240 chars max.
Retrieved Document:
Retrieved Document: <class 'str'>
Number of tokens: 0
Saved agent state to: chat-tester_state.json
2024-06-29T23:31:57.224610+0000 Failed to generate a valid response after retry attempts.
2024-06-29T23:31:57.225140+0000 Autosaving agent state.
2024-06-29T23:31:57.227175+0000 Saving Agent chat-tester state to: chat-tester_state.json
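The message in the log is Python's UnboundLocalError: a function binds a name on only some code paths, then reads it on a path where that binding never ran. ChromaMemory.query above assigns out unconditionally before reading it, so the error is unlikely to come from that method. The agent's own memory-handling code is not shown in this issue; given that every attempt retrieves an empty document (token count 0), one plausible but unverified explanation is an agent-side branch that binds out only for non-empty retrievals. The functions below are hypothetical and only illustrate that pattern and the usual fix; they are not the swarms implementation. The message wording also depends on the Python version: 3.11 and later phrase it as "cannot access local variable 'out' where it is not associated with a value", so the wording in this log suggests Python 3.10 or earlier.

```python
from typing import Callable


def summarize_memory(retrieved: str, count_tokens: Callable[[str], int]) -> str:
    """Hypothetical function (NOT swarms' code) showing how the error arises."""
    token_count = count_tokens(retrieved)
    if token_count > 0:
        out = f"[memory] {retrieved}"
    # When token_count == 0 the branch above never runs, so `out` is unbound
    # here and reading it raises UnboundLocalError
    return out


def summarize_memory_fixed(retrieved: str, count_tokens: Callable[[str], int]) -> str:
    """Same logic, but `out` is bound on every path before it is read."""
    out = ""
    if count_tokens(retrieved) > 0:
        out = f"[memory] {retrieved}"
    return out


def word_count(text: str) -> int:
    # Crude token counter used only for this illustration
    return len(text.split())


try:
    summarize_memory("", word_count)
except UnboundLocalError as exc:
    print(type(exc).__name__)  # prints UnboundLocalError

print(repr(summarize_memory_fixed("", word_count)))  # prints ''
```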

Normal client operations work when I use the client through the memory object directly, i.e. I'm able to add, retrieve, etc.

Also, the only out variable in my code is in the ChromaMemory class, and that method works just fine.

Lastly, it could be that I'm using this completely wrong, so please tell me if long-term memory is only meant for RAG operations.



aimzieslol commented 1 week ago

BTW, when I create a "dumb" agent without long-term memory and run() the same task, it works just fine:

dumb_agent = Agent(agent_name='dumb-tester',
                   agent_description=('does dumb things'),
                   llm=get_groq_llm(),
                   max_loops=1,
                   verbose=False)