run-llama / llama_index

LlamaIndex is a data framework for your LLM applications
https://docs.llamaindex.ai
MIT License

[Help] Providing the full/better scope to the LLM #1253

Closed tr7zw closed 1 year ago

tr7zw commented 1 year ago

Hey there. I used a slightly modified version of https://github.com/logan-markewich/llama_index_starter_pack for testing. After loading https://github.com/tr7zw/ItemSwapper/blob/1.19.4/docs/2.%20Palettes.md (and the other markdown files around it), asking "How does Palette ignoreItems work?" results in "Palette ignoreItems is not provided in the given context information. Can you please provide more context or details about your question?". The same happens when asking for an example JSON.

So far I've only had success with questions it can answer from the small tl;dr at the top of each file (for example, whether lists can link to palettes: https://github.com/tr7zw/ItemSwapper/blob/1.19.4/docs/6.%20Links.md ). The React-Flask demo shows the "Response Sources", and so far that has always been the correct document, so it has to be an issue with the {context_str}. How can I improve the parts of the picked documents that get sent to the LLM?

For sanity checking, I also put the full prompt with the file content directly into GPT-3.5, and it had no issue answering it.

My modifications to llama_index_starter_pack to try to get better results (and switching to gpt-3.5-turbo):

# Imports assumed from the starter pack (llama-index 0.5.x / langchain 0.0.x APIs):
import os
import pickle

from langchain.chat_models import ChatOpenAI
from llama_index import GPTSimpleVectorIndex, LLMPredictor, PromptHelper, ServiceContext
from llama_index.prompts.prompts import QuestionAnswerPrompt

# index, stored_docs, lock, index_name and pkl_name are module-level globals
# defined elsewhere in the starter pack's index_server.py.

def initialize_index():
    """Create a new global index, or load one from the pre-set path."""
    global index, stored_docs

    # Define LLM
    llm_predictor = LLMPredictor(llm=ChatOpenAI(temperature=0.2, model_name="gpt-3.5-turbo"))

    # Define prompt helper
    max_input_size = 2048
    num_output = 1024
    max_chunk_overlap = 20
    prompt_helper = PromptHelper(max_input_size, num_output, max_chunk_overlap)

    service_context = ServiceContext.from_defaults(
        chunk_size_limit=512, llm_predictor=llm_predictor, prompt_helper=prompt_helper
    )
    with lock:
        if os.path.exists(index_name):
            index = GPTSimpleVectorIndex.load_from_disk(index_name, service_context=service_context)
        else:
            index = GPTSimpleVectorIndex([], service_context=service_context)
            index.save_to_disk(index_name)
        if os.path.exists(pkl_name):
            with open(pkl_name, "rb") as f:
                stored_docs = pickle.load(f)

def query_index(query_text):
    """Query the global index."""
    global index
    QA_PROMPT_TMPL = (
    "We have provided context information below. \n"
    "---------------------\n"
    "{context_str}"
    "\n---------------------\n"
    "Given this information, please answer the following question. If the question can't be confidentially answered using the information, just write 'Not provided': {query_str}\n"
    )
    QA_PROMPT = QuestionAnswerPrompt(QA_PROMPT_TMPL)
    response = index.query(query_text, text_qa_template=QA_PROMPT)
    return response
Flask==2.2.3
Flask-Cors==3.0.10
langchain==0.0.128
llama-index==0.5.16
PyPDF2==3.0.1
logan-markewich commented 1 year ago

A couple of notes that might help. I would suggest increasing max_input_size back to the default of 4096, decreasing num_output back to 256 (if you increase num_output, you also need to increase max_tokens in the ChatOpenAI constructor), and increasing the chunk size to 1024.

Also, you'll want to set chunk_size_limit in both the service context and the prompt helper; it is used in two places, in the prompt helper at query time and in the node_parser when documents are inserted.
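
In code, those suggestions would look roughly like this (a sketch against the llama-index 0.5.x / langchain 0.0.x APIs used in the snippet above, reusing its variable names; not a tested drop-in):

# Suggested settings: default input size, smaller output, larger chunks
llm_predictor = LLMPredictor(
    llm=ChatOpenAI(temperature=0.2, model_name="gpt-3.5-turbo", max_tokens=256)
)

max_input_size = 4096    # back to the model's default context size
num_output = 256         # should match max_tokens in ChatOpenAI above
max_chunk_overlap = 20
chunk_size_limit = 1024  # larger chunks so more of each document reaches the LLM

# pass chunk_size_limit in both places: the prompt helper (used at query time)
# and the service context (used by the node_parser when documents are inserted)
prompt_helper = PromptHelper(
    max_input_size, num_output, max_chunk_overlap, chunk_size_limit=chunk_size_limit
)
service_context = ServiceContext.from_defaults(
    chunk_size_limit=chunk_size_limit,
    llm_predictor=llm_predictor,
    prompt_helper=prompt_helper,
)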

You might also want to define a refine template (lately, gpt-3.5 has been bad at refining, I think OpenAI changed the model slightly and it seems much worse now).

I've been testing this refine template, give it a shot and let me know how it goes.

from langchain.prompts.chat import (
    AIMessagePromptTemplate,
    ChatPromptTemplate,
    HumanMessagePromptTemplate,
)

from llama_index.prompts.prompts import RefinePrompt

# Refine Prompt
CHAT_REFINE_PROMPT_TMPL_MSGS = [
    HumanMessagePromptTemplate.from_template("{query_str}"),
    AIMessagePromptTemplate.from_template("{existing_answer}"),
    HumanMessagePromptTemplate.from_template(
        "I have more context below which can be used "
        "(only if needed) to update your previous answer.\n"
        "------------\n"
        "{context_msg}\n"
        "------------\n"
        "Given the new context, update the previous answer to better "
        "answer my previous query."
        "If the previous answer remains the same, repeat it verbatim. "
        "Never reference the new context or my previous query directly.",
    ),
]

CHAT_REFINE_PROMPT_LC = ChatPromptTemplate.from_messages(CHAT_REFINE_PROMPT_TMPL_MSGS)
CHAT_REFINE_PROMPT = RefinePrompt.from_langchain_prompt(CHAT_REFINE_PROMPT_LC)
...
index.query("my query", similarity_top_k=3, refine_template=CHAT_REFINE_PROMPT)
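
Putting the pieces together, query_index from the earlier snippet could then pass the custom QA prompt, the refine prompt, and a higher similarity_top_k in one call (a sketch reusing the names defined above, not the starter pack's exact code):

def query_index(query_text):
    """Query the global index with the custom QA and refine templates."""
    global index
    return index.query(
        query_text,
        similarity_top_k=3,                   # retrieve more source chunks
        text_qa_template=QA_PROMPT,           # custom QA prompt defined earlier
        refine_template=CHAT_REFINE_PROMPT,   # chat-style refine prompt from above
    )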
tr7zw commented 1 year ago

Thanks, will take a look later; currently fighting with my new GPU knocking out my old PSU 😅

tr7zw commented 1 year ago

Ok, got around to testing it.

Couple of notes that might help. I would suggest increasing the max_input_size back to the default of 4096, decreasing num_output back to 256 (if you increase num_output, you also need to increase max_tokens in the ChatOpenAI constructor) and increase the chunk size to 1024.

Ok, did that. I was mainly playing around with those values to see if the output size was too small and that was what caused it to go 🤷‍♂️.

I did try adding the refine step, but that only caused GPT to dream stuff up. The base issue seems to be that {context_msg} doesn't contain the relevant information, so it probably needs some configuration during the insert step, but I can't find much info about that. https://github.com/logan-markewich/llama_index_starter_pack/blob/main/flask_react/index_server.py#L53 currently has nothing set up there. I should also add that I'm a Java developer, so I can't really debug it to figure out what got stored in the end.
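
From what I can tell, the insert step in the starter pack looks roughly like this (paraphrasing index_server.py, so treat it as a sketch rather than the exact code); as far as I understand, the chunking happens inside index.insert via the node_parser that the service_context provides:

from llama_index import SimpleDirectoryReader

def insert_into_index(doc_file_path, doc_id=None):
    """Insert a single document into the global index."""
    global index, stored_docs
    document = SimpleDirectoryReader(input_files=[doc_file_path]).load_data()[0]
    if doc_id is not None:
        document.doc_id = doc_id
    with lock:
        # splitting into nodes happens here, driven by the service_context's
        # node_parser (and therefore by its chunk_size_limit)
        index.insert(document)
        stored_docs[document.doc_id] = document.text
        index.save_to_disk(index_name)
        with open(pkl_name, "wb") as f:
            pickle.dump(stored_docs, f)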

dosubot[bot] commented 1 year ago

Hi, @tr7zw! I'm here to help the LlamaIndex team manage their backlog and I wanted to let you know that we are marking this issue as stale.

Based on my understanding, the issue you reported is related to using the LLM with a modified version of the llama_index_starter_pack. It seems that the LLM answered that the information was not provided in the given context when you asked certain questions. Another user named logan-markewich suggested some changes to the configuration and provided a refine template for you to try. However, even after trying these suggestions, you mentioned that you are still experiencing issues and that the insert step needs configuring.

Before we proceed, we would like to confirm if this issue is still relevant to the latest version of the LlamaIndex repository. If it is, please let us know by commenting on this issue. Otherwise, feel free to close the issue yourself or it will be automatically closed in 7 days.

Thank you for your understanding and cooperation. We look forward to hearing from you soon.