run-llama / llama_index

LlamaIndex is a data framework for your LLM applications
https://docs.llamaindex.ai
MIT License

[Bug]: chat engine with chat memory doesn't seek information from vector store #10361

Closed patrickporto closed 8 months ago

patrickporto commented 8 months ago

Bug Description

Hello

I am developing a chat engine using Redis and PGVector; however, the contextual information is left out when I use Redis as the chat memory:


# Imports assume the llama_index 0.9.x package layout used in this report
from llama_index import ServiceContext, VectorStoreIndex
from llama_index.chat_engine.types import ChatMode
from llama_index.embeddings import OpenAIEmbedding
from llama_index.memory import ChatMemoryBuffer
from llama_index.storage.chat_store import RedisChatStore
from llama_index.vector_stores import PGVectorStore

OPENAI_EMBEDDING_DIMENSION = 1536

embed_model = OpenAIEmbedding()

# `settings` is the application's own configuration module
vector_store = PGVectorStore.from_params(
    database=settings.VECTOR_DB_URI.database,
    host=settings.VECTOR_DB_URI.host,
    password=settings.VECTOR_DB_URI.password,
    port=settings.VECTOR_DB_URI.port,
    user=settings.VECTOR_DB_URI.user,
    table_name="assistant_embedding",
    embed_dim=OPENAI_EMBEDDING_DIMENSION,
)

service_context = ServiceContext.from_defaults(embed_model=embed_model)

index = VectorStoreIndex.from_vector_store(
    vector_store=vector_store,
    service_context=service_context,
)

# Chat history is persisted in Redis for one week
chat_store = RedisChatStore(redis_url=settings.REDIS_CACHE_URI, ttl=60 * 60 * 24 * 7)

chat_memory = ChatMemoryBuffer.from_defaults(
    token_limit=3000,
    chat_store=chat_store,
    chat_store_key="session_id",
)

chat_engine = index.as_chat_engine(
    chat_mode=ChatMode.CONTEXT,
    memory=chat_memory,
)

Version

0.9.39

Steps to Reproduce

  1. Index ("train") some information about a person.
  2. Using the code above as a base, ask the chat engine for the previously indexed information about the person (see the sketch below).
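
For concreteness, continuing from the setup code above, the reproduction boils down to something like this (the document text and the question are illustrative placeholders, not taken from the original report):

from llama_index import Document

# 1. Index ("train") some information about a person into the PG vector store
index.insert(Document(text="Alice is a fictional botanist who studies orchids."))

# 2. Ask the chat engine about that person
response = chat_engine.chat("What does Alice study?")
print(response)  # with the Redis-backed memory enabled, the retrieved context appears to be ignored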

Relevant Logs/Tracebacks

No response

dosubot[bot] commented 8 months ago

πŸ€–

Hey there @patrickporto! Good to cross paths with you again in the matrix of coding issues. I hope all's well in your code universe! πŸ‘Ύ

Based on the code you've provided, it seems like you're correctly setting up the ChatEngine with RedisChatStore as the chat memory. However, the issue you're experiencing might be due to the way the chat memory interacts with the vector store.

The ChatMemoryBuffer class, which you're using as the chat memory, is designed to store the recent history of a chat session. When a new message is added to the chat, it's stored in the memory buffer and the oldest message is removed if the buffer's token limit is exceeded. The ChatMemoryBuffer doesn't interact directly with the vector store - it's the responsibility of the ChatEngine to use the chat memory and the vector store to generate responses.
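
For illustration, here is a minimal sketch of that buffering behavior (import paths assume llama_index 0.9.x; the messages are made up):

from llama_index.llms import ChatMessage, MessageRole
from llama_index.memory import ChatMemoryBuffer

memory = ChatMemoryBuffer.from_defaults(token_limit=3000)
memory.put(ChatMessage(role=MessageRole.USER, content="Hello"))
memory.put(ChatMessage(role=MessageRole.ASSISTANT, content="Hi, how can I help?"))

# get() returns only as much recent history as fits within token_limit;
# the oldest messages are dropped first once the limit is exceeded.
print(memory.get())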

The ChatEngine uses the chat memory to provide context for the current conversation. When generating a response, it retrieves the recent chat history from the chat memory and uses this as the context for the GPT model. The ChatEngine doesn't use the vector store to retrieve contextual information - the vector store is used to store and retrieve embeddings for the GPT model.

If you're trying to retrieve information about a person that was indexed into the vector store, you can query the VectorStoreIndex directly through a retriever. Here's an example of how you might do this:

# Query the vector store for a person's information via a retriever
retriever = index.as_retriever(similarity_top_k=1)
results = retriever.retrieve("information about person")

# Each result is a NodeWithScore carrying the node text and its metadata
for node_with_score in results:
    print(node_with_score.node.get_content())
    print(node_with_score.node.metadata)

This will print the content and metadata of the closest matching node in the vector store, which should include the information about the person.

If you're still experiencing issues, could you please provide more details about the specific problem? For example, are there any error messages? Is the chat engine not responding at all, or is it just not providing the expected information? Any additional information would be helpful in diagnosing the issue.

This response is meant to be useful and save you time. It is not meant to be a precise solution, but rather a starting point for your own research.

logan-markewich commented 8 months ago

This particular chat mode calls retrieve() on every user message and inserts the retrieved context into the system prompt

It can then respond using the retrieved context, or without it.
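
To make that flow concrete, here is a simplified sketch of what the context mode does for each message (not the library's actual implementation; the helper name and prompt wording are made up):

from llama_index.llms import ChatMessage, MessageRole

def answer_with_context(retriever, llm, memory, user_message):
    # 1. Retrieve nodes from the vector store for the new user message
    nodes = retriever.retrieve(user_message)
    context = "\n\n".join(n.node.get_content() for n in nodes)

    # 2. Insert the retrieved context into the system prompt
    system_msg = ChatMessage(
        role=MessageRole.SYSTEM,
        content=f"Context information is below.\n{context}",
    )

    # 3. Prepend the system prompt to the chat history and ask the LLM
    history = memory.get()
    response = llm.chat(
        [system_msg, *history, ChatMessage(role=MessageRole.USER, content=user_message)]
    )

    # 4. Record the new turn in memory
    memory.put(ChatMessage(role=MessageRole.USER, content=user_message))
    memory.put(response.message)
    return response.message.content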

Maybe you'd want to modify the system prompt with additional instructions:

chat_engine = index.as_chat_engine(
    chat_mode="context",
    memory=chat_memory,
    system_prompt=(
        "You are a chatbot, able to have normal interactions, as well as talk"
        " about an essay discussing Paul Graham's life."
    ),
)

Info on the other chat modes is here: https://docs.llamaindex.ai/en/stable/module_guides/deploying/chat_engines/usage_pattern.html#available-chat-modes
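
For example, switching to another mode (here condense_plus_context, one of the modes available in 0.9.x) only requires changing the chat_mode argument:

chat_engine = index.as_chat_engine(
    chat_mode="condense_plus_context",
    memory=chat_memory,
)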

patrickporto commented 8 months ago

@logan-markewich The context chat mode is not the problem, because without the memory parameter it returns the expected response, as in the following example:

chat_engine = index.as_chat_engine(
    chat_mode=ChatMode.CONTEXT,
    # memory=chat_memory,  # commenting this out makes the engine return responses from the vector store again
)

When the chat memory line is uncommented, the chat engine does not return any information from the vector store; it answers from the conversation history only.

logan-markewich commented 8 months ago

@patrickporto what does your service context look like?

I'm not able to reproduce this behavior on my end.