Open pathucr7 opened 2 months ago
To address the issues you're encountering with your RAG chatbot, here are targeted solutions:
For the short answer length, explore the `max_length`, `min_length`, and `length_penalty` parameters of the `HuggingFaceInferenceAPI`. Adjusting these can help control the verbosity of the model's responses. More details on these parameters can be found in the HuggingFace documentation.
To improve follow-up question accuracy, refine the condensing prompt or employ semantic search techniques. This involves enhancing the chatbot's ability to understand and generate relevant follow-up questions by leveraging the context more effectively.
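One way to make the condensed follow-up question less error-prone is to feed only the most recent turns into the condensing prompt and to tell the model explicitly not to answer. The sketch below is plain Python and makes no LlamaIndex calls; `CONDENSE_TEMPLATE`, `build_condense_prompt`, and `max_turns` are hypothetical names, the sample conversation is invented, and the `{chat_history}` / `{question}` placeholders are assumed to match what the chat engine's condense prompt expects.

```python
# Hypothetical template: asks for a rewrite only, never an answer.
CONDENSE_TEMPLATE = (
    "Given the conversation below and a follow-up message, rewrite the "
    "follow-up as a standalone question. Do not answer it.\n\n"
    "<Chat History>\n{chat_history}\n\n"
    "<Follow Up Message>\n{question}\n\n"
    "<Standalone question>\n"
)


def build_condense_prompt(history, question, max_turns=4):
    """Render the condensing prompt from the last `max_turns` messages."""
    recent = history[-max_turns:] if max_turns > 0 else []
    lines = [f"{role}: {text}" for role, text in recent]
    return CONDENSE_TEMPLATE.format(
        chat_history="\n".join(lines), question=question
    )


# Invented sample conversation, for illustration only.
history = [
    ("user", "Who is the main character?"),
    ("assistant", "The main character is a fisherman."),
]
prompt = build_condense_prompt(history, "What lesson does he learn?")
print(prompt)
```

Printing the rendered prompt before it reaches the LLM makes it easier to see whether a wrong follow-up comes from the template or from the model.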
To prevent model overload, consider implementing caching for frequent queries and using a load balancer or queue system to manage request loads. Additionally, optimizing the retrieval process by limiting the context considered for each query can help manage the model's workload.
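The caching and request-limiting ideas can be sketched without any framework. In the example below, `CachedAnswerer`, `answer_fn`, and `max_concurrent` are hypothetical names; `answer_fn` stands in for whatever function actually calls the model, and `fake_model` exists only to make the sketch runnable.

```python
import threading


class CachedAnswerer:
    """Caches answers by normalized query and caps concurrent model calls."""

    def __init__(self, answer_fn, max_concurrent=2):
        self._answer_fn = answer_fn  # hypothetical: wraps the real query call
        self._cache = {}
        self._lock = threading.Lock()
        self._slots = threading.Semaphore(max_concurrent)

    @staticmethod
    def _key(query):
        # Collapse case and whitespace so trivially different queries share a key.
        return " ".join(query.lower().split())

    def ask(self, query):
        key = self._key(query)
        with self._lock:
            if key in self._cache:
                return self._cache[key]
        with self._slots:  # blocks while max_concurrent calls are in flight
            answer = self._answer_fn(query)
        with self._lock:
            self._cache[key] = answer
        return answer


calls = []


def fake_model(query):
    # Stand-in for the real LLM call; records how often it is invoked.
    calls.append(query)
    return f"answer to: {query}"


bot = CachedAnswerer(fake_model)
first = bot.ask("What is chapter one about?")
second = bot.ask("  what is chapter ONE about? ")
print(len(calls))
```

The second call is served from the cache, so the stand-in model is invoked only once. An in-memory dictionary like this is lost on restart; a persistent store would be a separate design choice.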
Enhancing contextual understanding with HuggingFace Embeddings could significantly improve the chatbot's performance. By utilizing sentence or document embeddings, the chatbot can achieve a better semantic understanding of both the queries and the e-book content, leading to more accurate responses.
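To illustrate what "semantic understanding" means in practice, the sketch below ranks text chunks by cosine similarity to a query vector. It is a toy under stated assumptions: the three-dimensional vectors and chunk names are made up, and `top_k` is a hypothetical helper; in a real pipeline the vectors would come from the embedding model and the ranking would be done by the vector index.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vec, chunk_vecs, k=2):
    """Return the names of the k chunks most similar to the query."""
    scored = [
        (cosine_similarity(query_vec, vec), name)
        for name, vec in chunk_vecs.items()
    ]
    scored.sort(reverse=True)
    return [name for _, name in scored[:k]]


# Made-up 3-dimensional vectors; real embeddings have hundreds of dimensions.
chunks = {
    "chapter_1": [0.9, 0.1, 0.0],
    "chapter_2": [0.1, 0.9, 0.0],
    "appendix": [0.0, 0.1, 0.9],
}
best = top_k([1.0, 0.2, 0.0], chunks, k=2)
print(best)
```

Keeping `k` small also limits how much context is sent to the model per query.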
Regarding the issue of the chatbot spamming queries as if in a loop, it's crucial to review the `CondenseQuestionChatEngine` logic to ensure there's no repetitive loop in response generation.
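A simple way to confirm or rule out a loop is to track the condensed questions and stop once the same one recurs too often. The guard below is a minimal sketch in plain Python; `RepeatGuard`, `allow`, and `max_repeats` are hypothetical names and are not part of LlamaIndex.

```python
class RepeatGuard:
    """Flags when the same condensed question keeps recurring back to back."""

    def __init__(self, max_repeats=2):
        self.max_repeats = max_repeats
        self._last = None
        self._count = 0

    def allow(self, condensed_question):
        # Normalize case and whitespace before comparing.
        key = " ".join(condensed_question.lower().split())
        if key == self._last:
            self._count += 1
        else:
            self._last = key
            self._count = 1
        return self._count <= self.max_repeats


guard = RepeatGuard(max_repeats=2)
questions = ["What is X?", "what is x?", "What is  X?", "What is Y?"]
results = [guard.allow(q) for q in questions]
print(results)
```

In a Streamlit app the script reruns from the top on every interaction, so a guard like this would need to live in `st.session_state` to survive reruns.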
These steps should help you refine your chatbot's performance and address the specific challenges you've outlined.
Question Validation
Question
I am trying to create a RAG chatbot that answers only from the content I provide through PDFs and never uses its own knowledge to answer or to add information from outside the context. At the moment I am facing three issues: 1) the answers are quite short, 2) the follow-up question it creates is totally off the mark, 3) model overload issues.
As I am creating a RAG chatbot for e-book summarization, the answers are meant to be lengthy. I have used HuggingFace for the embeddings as well as for the LLM. There was very little documentation to refer to for CondenseQuestionChatEngine, so I had to learn it myself. I thought I had solved it, then realized it was running on the default LLM, which is OpenAI. After fixing that, all the prompts I had created for the condense question started to fail. If I run the question in the query engine and print it, I get the right answers. But when the chat engine tries to make sense of the question using the chat history, it goes completely off the grid and asks the wrong question altogether, and the way it is set up the bot then replies with "I don't know". Here is my code for reference: `
import streamlit as st
from llama_index.core import Settings
import logging
import sys
import os.path
logging.basicConfig(stream=sys.stdout, level=logging.INFO)
logging.getLogger().addHandler(logging.StreamHandler(stream=sys.stdout))
from llama_index.core.service_context import set_global_service_context
from llama_index.llms.llama_cpp import LlamaCPP
from langchain.embeddings.huggingface import HuggingFaceEmbeddings
from langchain.embeddings.huggingface import HuggingFaceBgeEmbeddings
from langchain_community.llms.huggingface_endpoint import HuggingFaceEndpoint
from llama_cpp import Llama
from langchain.llms import HuggingFaceHub
from llama_index.core.prompts.chat_prompts import ChatPromptTemplate, ChatMessage, MessageRole
from llama_index.core.chat_engine import CondenseQuestionChatEngine, ContextChatEngine
from llama_index.legacy.prompts import ChatPromptTemplate
from llama_index.core.base.llms.types import ChatMessage, MessageRole
try:
    from llama_index import VectorStoreIndex, ServiceContext, Document, SimpleDirectoryReader, StorageContext, load_index_from_storage
except ImportError:
    from llama_index.core import VectorStoreIndex, ServiceContext, Document, SimpleDirectoryReader, StorageContext, load_index_from_storage
from llama_index.llms.huggingface import (
    HuggingFaceInferenceAPI,
    HuggingFaceLLM,
)
from llama_index.llms.llama_cpp.llama_utils import (
    messages_to_prompt,
    completion_to_prompt,
)
from huggingface_hub import login
login("MY_HUGGINGFACE_API_KEY")
from transformers import AutoTokenizer
st.set_page_config(page_title="Chat with the Streamlit docs, powered by LlamaIndex", page_icon="🦙", layout="centered", initial_sidebar_state="auto", menu_items=None)
st.title("Chat with the Streamlit docs, powered by LlamaIndex 💬🦙")
st.info("Check out the full tutorial to build this app in our blog post", icon="📃")
from llama_index.core import PromptTemplate
prompt_template = """### System: Use the following pieces of information to answer the user's question. If you don't know the answer, just say that you don't know, don't try to make up an answer. Only return the helpful answer below and nothing else. Helpful answer: """
if "messages" not in st.session_state.keys():  # Initialize the chat messages history
    st.session_state.messages = [
        {"role": "assistant", "content": "Ask me a question about the E-books!"}
    ]
PERSIST_DIR = "./storage"
@st.cache_resource(show_spinner=False)
def load_data():
    with st.spinner(text="Loading and indexing the E-books – hang tight! This should take 1-2 minutes."):
        reader = SimpleDirectoryReader(input_dir="./data", recursive=True)
index = load_data()
def generate_text(prompt):
question = ("tell me a story with a lesson?")
qa_prompt_str = (
    "Context information is below.\n"
    "---------------------\n"
    "{context_str}\n"
    "---------------------\n"
    "Given only the context information and not prior knowledge, "
    "answer the question: {query_str}\n"
)
refine_prompt_str = (
    "We have the opportunity to refine the original answer "
    "(only if needed) with some more context below.\n"
    "------------\n"
    "{context_msg}\n"
    "------------\n"
    "Given the new context, refine the original answer to better "
    "answer the question: {query_str}. "
    "If the context isn't useful, output the original answer again.\n"
    "Original Answer: {existing_answer}"
)
chat_text_qa_msgs = [
    ChatMessage(
        role=MessageRole.SYSTEM,
        content=(prompt_template),
    ),
    ChatMessage(
        role=MessageRole.USER,
        content=(qa_prompt_str),
    ),
]
text_qa_template = ChatPromptTemplate(chat_text_qa_msgs)
# Refine Prompt
chat_refine_msgs = [
    ChatMessage(
        role=MessageRole.SYSTEM,
        content=(
            "If the context isn't helpful, just say I don't know. Don't any add informtion into the answer that is not available in the context"
        ),
    ),
    ChatMessage(
        role=MessageRole.USER,
        content=(
            "New Context: {context_msg}\n"
            "Query: {query_str}\n"
            "Original Answer: {existing_answer}\n"
            "New Answer: "
        ),
    ),
]
refine_template = ChatPromptTemplate(chat_refine_msgs)
custom_prompt = PromptTemplate(
    """\
Given a conversation (between Human and Assistant) and a follow up message from Human, \
rewrite the message to be a standalone question that captures all relevant context \
from the conversation.