microsoft / autogen

A programming framework for agentic AI 🤖
https://microsoft.github.io/autogen/
Creative Commons Attribution 4.0 International

RAG #81

Status: Closed (closed by bruinon 1 year ago)

bruinon commented 1 year ago

Folks,

I am testing AutoGen for Retrieval Augmented Code Generation and Question Answering by following the notebook posted here, https://github.com/microsoft/FLAML/blob/main/notebook/autogen_agentchat_RetrieveChat.ipynb

I have a few follow-up questions:

  1. How do I specify which embedding function (e.g., openai.Embedding or HuggingFaceEmbedding) RetrieveUserProxyAgent uses?
  2. How do I configure the CharacterTextSplitter/RecursiveCharacterTextSplitter in RetrieveUserProxyAgent?
  3. I encountered this error when going through the example notebook:
    
    File ~/venv/GPT_venv/lib/python3.10/site-packages/autogen/retrieve_utils.py:220, in create_vector_db_from_dir(dir_path, max_tokens, client, db_path, collection_name, get_or_create, chunk_mode, must_break_at_empty_line, embedding_model)
    212     for i in range(0, len(chunks), 40000):
    213         collection.upsert(
    214             documents=chunks[
    215                 i : i + 40000
    216             ],  # we handle tokenization, embedding, and indexing automatically. You can skip that and add your own embeddings as well
    217             ids=[f"doc_{i}" for i in range(i, i + 40000)],  # unique for each doc
    218         )
    219     collection.upsert(
    --> 220         documents=chunks[i : len(chunks)],
    221         ids=[f"doc_{i}" for i in range(i, len(chunks))],  # unique for each doc
    222     )
    223 except ValueError as e:
    224     logger.warning(f"{e}")

UnboundLocalError: local variable 'i' referenced before assignment
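
Editor's note: given the code shown in the traceback, the error is consistent with `chunks` being empty. The `for` loop then runs zero times, so `i` is never bound when the trailing `upsert` reads it. Below is a minimal sketch of batching that does not depend on the loop variable after the loop. `iter_batches` is a hypothetical helper, not part of autogen, and the batch size of 40000 is simply copied from the traceback.

```python
from typing import Iterator, List, Tuple


def iter_batches(
    chunks: List[str], batch_size: int = 40000
) -> Iterator[Tuple[List[str], List[str]]]:
    """Yield (ids, documents) pairs holding at most batch_size items each."""
    for start in range(0, len(chunks), batch_size):
        docs = chunks[start : start + batch_size]
        # One id per document actually present in this batch.
        ids = [f"doc_{start + offset}" for offset in range(len(docs))]
        yield ids, docs


# Hypothetical usage with a Chroma collection:
# for ids, docs in iter_batches(chunks):
#     collection.upsert(documents=docs, ids=ids)

print(list(iter_batches(["a", "b", "c"], batch_size=2)))
# An empty input yields no batches, so nothing is upserted and nothing fails.
print(list(iter_batches([])))
```

Because every batch is produced inside the loop, there is no separate "remainder" call that could see an unbound index.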

sonichi commented 1 year ago

@thinkall

thinkall commented 1 year ago

Hi @bruinon, which version of autogen have you installed? Could you upgrade to the latest version and try this notebook? https://github.com/microsoft/autogen/blob/main/notebook/agentchat_RetrieveChat.ipynb

thinkall commented 1 year ago

As for your questions 1 and 2, we haven't implemented those yet. Would you like to raise a PR for them?
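
Editor's note: to make question 2 concrete, here is a rough sketch of what a configurable character splitter could look like. It is not autogen's API and not LangChain's RecursiveCharacterTextSplitter; the function `split_text` and its `chunk_size` and `separators` parameters are hypothetical names chosen for illustration.

```python
from typing import List, Sequence


def split_text(
    text: str,
    chunk_size: int = 200,
    separators: Sequence[str] = ("\n\n", "\n", " "),
) -> List[str]:
    """Split text into pieces of at most chunk_size characters.

    Coarser separators are tried first; finer ones are used only for
    pieces that are still too long.
    """
    if len(text) <= chunk_size:
        return [text] if text else []
    if not separators:
        # No separator left to try: fall back to hard cuts.
        return [text[i : i + chunk_size] for i in range(0, len(text), chunk_size)]

    sep, finer = separators[0], separators[1:]
    chunks: List[str] = []
    current = ""
    for part in text.split(sep):
        candidate = part if not current else current + sep + part
        if len(candidate) <= chunk_size:
            current = candidate  # keep merging small parts
            continue
        if current:
            chunks.append(current)
            current = ""
        if len(part) <= chunk_size:
            current = part
        else:
            # This part alone is too long: recurse with finer separators.
            chunks.extend(split_text(part, chunk_size, finer))
    if current:
        chunks.append(current)
    return chunks


print(split_text("aaa bbb ccc", chunk_size=7, separators=(" ",)))
# ['aaa bbb', 'ccc']
```

Exposing the chunk size and separator list as parameters is the kind of knob the question asks about; how such options would be passed through RetrieveUserProxyAgent is left open here.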

bruinon commented 1 year ago

> UnboundLocalError: local variable 'i' referenced before assignment

Thanks @thinkall for the tips. Upgrading from 0.1.3 to 0.1.5 resolved the problem:

Installing collected packages: pyautogen
  Attempting uninstall: pyautogen
    Found existing installation: pyautogen 0.1.3
    Uninstalling pyautogen-0.1.3:
      Successfully uninstalled pyautogen-0.1.3
Successfully installed pyautogen-0.1.5

Do you have any insights on the previous two questions? Also, in the AutoGen framework, is there a way for me to set the metadata for each embedding chunk before inserting it into Chroma? This feature is important for identifying chunks associated with outdated documents, so that they can be replaced with new embedding chunks.

thinkall commented 1 year ago

Adding docs with given doc ids is also doable, but sorry we haven't implemented that yet.
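
Editor's note: until such support exists in autogen, one possible workaround is to talk to the Chroma collection directly. The sketch below assumes `collection` is a chromadb `Collection` (its `delete(where=...)` and `upsert(ids=..., documents=..., metadatas=...)` methods are used). The helper names, the `source`/`version` metadata keys, and the id scheme are all hypothetical.

```python
from typing import Any, Dict, List


def build_payload(chunks: List[str], source: str, version: str) -> Dict[str, List[Any]]:
    """Build ids, documents, and per-chunk metadata for one source document."""
    return {
        "ids": [f"{source}::chunk_{n}" for n in range(len(chunks))],
        "documents": list(chunks),
        "metadatas": [
            {"source": source, "version": version, "chunk": n}
            for n in range(len(chunks))
        ],
    }


def replace_document(collection: Any, chunks: List[str], source: str, version: str) -> None:
    """Drop every stored chunk tagged with `source`, then insert the new chunks."""
    # Assumption: `collection` behaves like a chromadb Collection.
    collection.delete(where={"source": source})
    if chunks:  # skip the upsert when there is nothing to add
        collection.upsert(**build_payload(chunks, source, version))


# Hypothetical usage (requires chromadb; "docs" and "guide.md" are made-up names):
# import chromadb
# collection = chromadb.Client().get_or_create_collection("docs")
# replace_document(collection, ["new text"], source="guide.md", version="2")
# collection.get(where={"source": "guide.md"})

print(build_payload(["a", "b"], "guide.md", "2")["ids"])
# ['guide.md::chunk_0', 'guide.md::chunk_1']
```

Tagging each chunk with its source makes it possible to filter on that field at query time and to delete all chunks of an outdated document in one call.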

bruinon commented 1 year ago

> For your question 1 and 2, we haven't implemented them yet. Would you like to raise a PR for that?

Yes, I would like to raise a PR for that! It would let us avoid the problems of mixing multiple LLM frameworks in a single application.

thinkall commented 1 year ago

Thank you very much @bruinon. Let me know if you need any assistance on the PR.

FacundoMartinezCampos commented 1 year ago

Hello, I am having the same issue: I need to add metadata to each embedding in order to filter them. Were you able to do it?