run-llama / llama_index

LlamaIndex is a data framework for your LLM applications
https://docs.llamaindex.ai

[Question]: Chat engine goes to internet to provide answer. How to restrict it strictly to the vector space? #14870

Open SiddharthMurjani opened 4 months ago

SiddharthMurjani commented 4 months ago

Question

Hi @logan-markewich,

I want to restrict my chatbot strictly to the ingested vector space. How can I achieve this?

I have tried using a response_synthesizer and chat_mode:

response_synthesizer = get_response_synthesizer(
    response_mode=ResponseMode.SIMPLE_SUMMARIZE,
    structured_answer_filtering=True,
    verbose=True,
    text_qa_template=DEFAULT_TEXT_QA_PROMPT_TMPL,
)

chat_engine = vector_index.as_chat_engine(
    chat_mode=ChatMode.CONTEXT,
    memory=memory,
    chat_store=chat_store,
    node_postprocessors=[SimilarityPostprocessor(similarity_cutoff=0.81)],
    response_synthesizer=response_synthesizer,
)

Can you please assist me here?

dosubot[bot] commented 4 months ago

To restrict your chatbot strictly to the ingested vector space and prevent it from going to the internet to provide answers, you can customize the text_qa_template and refine_template to provide specific instructions. This ensures that the chatbot only responds based on the context of the ingested documents.

Here is an example of how you might customize the templates:

text_qa_template = """
You are a helpful assistant. Answer the question based on the provided context. If the answer is not in the context, respond with "I don't have that information."

Context: {context}
Question: {question}
Answer:
"""

refine_template = """
You are a helpful assistant. Refine the following answer based on the provided context. If the answer is not in the context, respond with "I don't have that information."

Existing Answer: {existing_answer}
Context: {context}
Refined Answer:
"""

# Use these templates in your LLMChain or other relevant components
question_generator = LLMChain(
    llm=question_gen_llm, 
    prompt=text_qa_template, 
    callback_manager=manager
)

doc_chain = load_qa_chain(
    streaming_llm, 
    chain_type="stuff", 
    prompt=refine_template, 
    callback_manager=manager
)

This setup ensures that the chatbot's responses are strictly based on the ingested vector space and do not rely on external internet sources. Additionally, you can further refine the chatbot's behavior by integrating it with a vector store and using a retrieval-based approach, as shown in the example setup for integrating LlamaIndex with LangChain:

def get_chain(vectorstore: VectorStore, question_handler, stream_handler) -> ConversationalRetrievalChain:
    manager = AsyncCallbackManager([])
    question_manager = AsyncCallbackManager([question_handler])
    stream_manager = AsyncCallbackManager([stream_handler])

    question_gen_llm = OpenAI(
        temperature=0,
        verbose=True,
        callback_manager=question_manager,
    )
    streaming_llm = OpenAI(
        streaming=True,
        callback_manager=stream_manager,
        verbose=True,
        temperature=0,
    )

    question_generator = LLMChain(
        llm=question_gen_llm, prompt=text_qa_template, callback_manager=manager
    )
    doc_chain = load_qa_chain(
        streaming_llm, chain_type="stuff", prompt=refine_template, callback_manager=manager
    )

    qa = ConversationalRetrievalChain(
        retriever=vectorstore.as_retriever(),
        combine_docs_chain=doc_chain,
        question_generator=question_generator,
        callback_manager=manager,
    )
    return qa

By following these steps, you can ensure that your chatbot's responses are confined to the ingested vector space and do not include information from external sources [1][2][3].

To continue talking to Dosu, mention @dosu.

SiddharthMurjani commented 4 months ago

I believe llama-index does not make use of LLMChain and ConversationalRetrievalChain.

logan-markewich commented 4 months ago

It's not accessing the internet. It's either using internal knowledge or hallucinating.

You can try adding a system prompt with additional instructions for how the LLM should act:

as_chat_engine(..., system_prompt="...")
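
For example, something along these lines (a rough sketch; vector_index is the index from your snippet, and the prompt wording is just an illustration):

chat_engine = vector_index.as_chat_engine(
    chat_mode="context",
    # The system prompt is appended after the retrieved context, so it can
    # instruct the model to refuse anything the context doesn't cover.
    system_prompt=(
        "Answer ONLY from the retrieved context. "
        "If the context does not contain the answer, reply 'I am not sure.'"
    ),
)

response = chat_engine.chat("Where is India?")
print(response)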

SiddharthMurjani commented 4 months ago

Tried this, but it doesn't seem to work:

DEFAULT_TEXT_QA_PROMPT_TMPL = (
    "Context information is below.\n"
    "---------------------\n"
    "{context_str}\n"
    "---------------------\n"
    "You are a code chatbot\n"
    "Given the context information and not prior knowledge, "
    "answer the query.\n"
    "If the query is generic, do not provide answer \n"
    "with your knowledge, just say, I cannot provide answer.\n"
    "If no context is retrieved, do not synthesis any answer "
    "with previous history or context.\n"
    "Straight up say, 'I am not sure.'\n"
    "Query: {query_str}\n"
    "Answer: "
)

system_prompt = """
You are a helpful code chat assistant. 
Your responses should be based solely on the retrieved context provided to you. 
If no relevant context is retrieved or if the retrieved context does not contain the necessary information to answer the question, respond with "I'm not sure" or "I don't have enough information to answer that question."
Do not synthesize or generate answers based on general knowledge or information from the internet.
Stick strictly to the information provided in the retrieved context.
"""

chat_engine = vector_index.as_chat_engine(
    chat_mode=ChatMode.CONTEXT,
    memory=memory,
    chat_store=chat_store,
    node_postprocessors=[SimilarityPostprocessor(similarity_cutoff=0.81)],
    text_qa_template=PromptTemplate(DEFAULT_TEXT_QA_PROMPT_TMPL),
    system_prompt=system_prompt,
    verbose=True,
    response_mode="no_text",
)

Can you please help?

One example:

Question: "Where is India?"

bot answer: "India is a country in South Asia. It is the seventh-largest country by area; the most populous country as of June 2023; and from the time of its independence in 1947, the world's most populous democracy."

logan-markewich commented 3 months ago

Seems like you just need to iterate on the prompt some more 🤷🏻 Different models are better or worse at following prompts.

logan-markewich commented 3 months ago

Also, the text_qa_template is not used in a context chat engine

logan-markewich commented 3 months ago

It has a context_template, and optionally, if a system_prompt is provided, it's appended to the context_template.

Here's the default context template

DEFAULT_CONTEXT_TEMPLATE = (
    "Context information is below."
    "\n--------------------\n"
    "{context_str}"
    "\n--------------------\n"
)

And your system prompt would get appended to the end of that
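
If you want the restriction to sit right next to the retrieved context, you can also override the context template itself, something like this (a rough sketch, assuming your installed version's context chat engine accepts a context_template argument; the system_prompt defined above is still appended after it):

RESTRICTIVE_CONTEXT_TEMPLATE = (
    "Context information is below."
    "\n--------------------\n"
    "{context_str}"
    "\n--------------------\n"
    "Answer using ONLY the context above. If the context is empty or does not "
    "contain the answer, reply exactly: 'I am not sure.'\n"
)

chat_engine = vector_index.as_chat_engine(
    chat_mode="context",
    context_template=RESTRICTIVE_CONTEXT_TEMPLATE,
    system_prompt=system_prompt,
)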

AndyThurgood commented 3 months ago

@logan-markewich We are also seeing the issue described above when using the chat engine.

When we ask an initial question, we get a correct answer that includes detail pulled from our index context. If we then ask a follow-up question, we get a response that is not bound to the chat engine's context. If we then ask a subsequent question unrelated to the previous one, we again get a correct response.

A simple setup where we see this issue:

# list of `ChatMessage` objects
custom_chat_history = [
    ChatMessage(
        role=MessageRole.USER,
        content="How many orders are over 50?",
    ),
    ChatMessage(
        role=MessageRole.ASSISTANT, 
        content="You have 1,842,541 orders over £50",
    )
]

# setup the chat instance
chat_engine = index.as_chat_engine(
    chat_mode="context",
    system_prompt=db_query_template,
    llm=llm
)

# ask follow up question
answer = chat_engine.chat("and also under 2000?", chat_history=custom_chat_history)

This produces an answer that has the correct structure but references a property that doesn't exist anywhere in the chat history or the index context.

I've tried tweaking the context_template, but it doesn't seem to influence the behavior of follow-up questions. Is this an issue with how the system_prompt is handled? We have a {context_str} in the system prompt; will that be populated, given the way the context_prompt works?

Or is this linked to us manually managing the chat history?