Closed nathanmoura closed 11 months ago
Have the exact same problem. Would love to see how rest of you are dealing with it.
@harishgoli9 @NathanMoura
The issue here is with the refine template. GPT-3.5 was recently "updated" and now struggles with this process.
I've been working on a new refine prompt. Try it out and let me know if it helps! I'm trying to see whether it helps enough people to justify a PR changing the default.
```python
from langchain.prompts.chat import (
    AIMessagePromptTemplate,
    ChatPromptTemplate,
    HumanMessagePromptTemplate,
)
from llama_index.prompts.prompts import RefinePrompt

# Refine prompt: replay the query and the existing answer as chat turns,
# then ask the model to update that answer using the new context.
CHAT_REFINE_PROMPT_TMPL_MSGS = [
    HumanMessagePromptTemplate.from_template("{query_str}"),
    AIMessagePromptTemplate.from_template("{existing_answer}"),
    HumanMessagePromptTemplate.from_template(
        "I have more context below which can be used "
        "(only if needed) to update your previous answer.\n"
        "------------\n"
        "{context_msg}\n"
        "------------\n"
        "Given the new context, update the previous answer to better "
        "answer my previous query. "
        "If the previous answer remains the same, repeat it verbatim. "
        "Never reference the new context or my previous query directly.",
    ),
]

CHAT_REFINE_PROMPT_LC = ChatPromptTemplate.from_messages(CHAT_REFINE_PROMPT_TMPL_MSGS)
CHAT_REFINE_PROMPT = RefinePrompt.from_langchain_prompt(CHAT_REFINE_PROMPT_LC)

# ...

index.query("my query", similarity_top_k=3, refine_template=CHAT_REFINE_PROMPT)
```
I just switched to the gpt-4 model, but would love to go back to gpt-3.5-turbo when usage ramps up.
We also merged a recent change to the refine prompt for gpt-3.5. Going to close this issue for now, feel free to reach out on discord though!
@logan-markewich this error still looks present for me with v0.7.15. With gpt-35-turbo on Azure OpenAI via the LangChain process, it takes 10-15 refine iterations to generate an answer. Adding any kind of query preamble/prompt template pushes the wait for an answer up to 300 seconds.
I understand the issue is partly on the Azure OpenAI side, but in debugging I can also see that a valuable answer is often already present in the first response. How could this be resolved? Is there any way to prevent this from occurring?
Just a while (two weeks) ago it was working blazingly fast...
I am currently using the langchain library with `ChatOpenAI(model='gpt-3.5-turbo')` in a Retrieval-Augmented Generation (RAG) framework, based on llama-index's `GPTSimpleVectorIndex`. However, when I execute the query function, the language model repeatedly responds with:
"The original answer remains relevant and does not require refinement based on the additional context provided."
Interestingly, when I switch to the `OpenAI(model='text-davinci-003')` model, it functions as expected, successfully retrieving information and providing appropriate answers. It appears that the `gpt-3.5-turbo` chat model might be unable or unwilling to refine answers generated in previous iterations.
EDIT: Upon further investigation, by setting the logging module to DEBUG level, I noticed that the chat model does provide accurate answers during earlier iterations. The issue occurs only when the final prompt is as follows:
Human: We have the opportunity to refine the above answer (only if needed) with some more context below. [context chunk here] Given the new context, refine the original answer to better answer the question. If the context isn't useful, output the original answer again.
In this case, the model consistently states that no refinement is needed, instead of outputting the original answer again.
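As a client-side stopgap, one option is to detect this refusal boilerplate in the refined response and keep the previous iteration's answer instead. A minimal sketch (the helper name and marker strings are my own, matched to the exact response text quoted above, not part of any library API):

```python
# Substrings that indicate the model declined to refine rather than
# repeating or improving the answer (taken from the observed response).
REFUSAL_MARKERS = (
    "original answer remains relevant",
    "does not require refinement",
)

def resolve_refined_answer(previous_answer: str, refined_answer: str) -> str:
    """Return the refined answer, unless it is only refusal boilerplate,
    in which case fall back to the answer from the previous iteration."""
    lowered = refined_answer.lower()
    if any(marker in lowered for marker in REFUSAL_MARKERS):
        return previous_answer
    return refined_answer
```

This doesn't fix the underlying prompt issue, but it prevents the boilerplate from replacing an otherwise good answer from an earlier refine pass.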
Does anyone have a fix for this?