Closed: Ma-Fukudama closed this issue 1 year ago
🤖
Hello,
From your code, it seems like you've correctly created an index using the ListIndex.from_documents function. This function creates a new instance of the ListIndex class from a list of documents. The ListIndex object it returns contains an index structure that holds the nodes representing the documents, and a document store that can be used to retrieve those nodes by their IDs.
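For illustration, here is a minimal sketch (the sample texts are placeholders, and this assumes the 0.8.x docstore API) of building a ListIndex and peeking at its document store:
from llama_index import Document, ListIndex

documents = [
    Document(text="First sample document."),
    Document(text="Second sample document."),
]
index = ListIndex.from_documents(documents)

# The docstore maps node IDs to the nodes built from the documents.
for node_id, node in index.docstore.docs.items():
    print(node_id, node.get_content()[:50])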
However, creating an index is just the first step. After loading the data into the index, you need to create a query engine from it and call the engine's query method. Querying retrieves the relevant nodes from the index and synthesizes a response from them.
Here is an example of how you can query the index:
query_engine = index.as_query_engine()
response = query_engine.query("Your query here")
print(response)
Please replace "Your query here" with the actual query you want to make.
Also, make sure that the data you're trying to index is in the correct format. The ListIndex.from_documents function expects a list of Document objects, where each Document object represents a document to be indexed. In your code, you're creating Document objects from a list of strings (text_list), which might not be the correct format if your documents are more complex.
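To make the difference concrete, here is a hedged sketch (the file name is a placeholder): wrapping a string in a Document indexes the string itself, while SimpleDirectoryReader reads the file's contents:
from llama_index import Document, SimpleDirectoryReader

# Indexes the literal string "A.txt", NOT the contents of the file:
docs_from_strings = [Document(text=t) for t in ['A.txt']]

# Reads the file from disk and indexes its contents:
docs_from_files = SimpleDirectoryReader(input_files=['A.txt']).load_data()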
If you're still having issues, could you please provide more information about the data you're trying to index and the queries you're making? This will help me better understand your issue and provide a more accurate solution.
I hope this helps! Let me know if you have any other questions.
Best, Dosu
@dosu-bot The source that calls the index is as follows
storage_context = StorageContext.from_defaults(persist_dir="/index")
index = load_index_from_storage(storage_context)
engine = index.as_query_engine(text_qa_template=QuestionAnswerPrompt(self.QA_PROMPT_TMPL), retriever_mode=ListRetrieverMode.EMBEDDING, similarity_top_k=1)
# The appended Japanese string asks: "Please include source links in your answer."
result = engine.query(question + "回答には出典のリンクを含めるようにしてください。")
🤖
Based on the information you've provided and the context from the LlamaIndex repository, there could be a few reasons why the responses from your LlamaIndex are not reflecting the content of the indexed documents.
1. Data Loading: Ensure that the data is being loaded correctly into the index. A problem here could be due to an error in the data loading function or in the data itself. You can check this by ensuring that the _insert method in the DocumentSummaryIndex class is being called correctly and that it is successfully inserting the documents into the index.
2. Query Matching: The query might not be matching the indexed documents. This could be due to the format of the query or the format of the indexed documents. You can check this by ensuring that the as_retriever method in the DocumentSummaryIndex class is being called with the correct query and that the query is in a format that matches the indexed documents.
Regarding the format of the documents to be indexed, they should be a sequence of BaseNode objects. These nodes can be processed by the _add_nodes_to_index method of the DocumentSummaryIndex class, and they should also be in a format that can be queried by the as_retriever method of that class.
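One way to sanity-check the retrieval side in isolation is to call the retriever directly and inspect what comes back; a sketch, assuming the 0.8.x retriever API (the query string is a placeholder):
retriever = index.as_retriever()
nodes = retriever.retrieve("a question about your documents")
for n in nodes:
    # n is a NodeWithScore; the score may be None for some retriever modes
    print(n.score, n.node.get_content()[:80])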
As for the ListIndex.from_documents method, I couldn't find specific information about it in the repository. Typically, though, such a method creates an index from a list of documents, iterating over them and adding each one to the index.
Lastly, as per a solved issue in the repository, you might want to consider using more custom prompts and defining both a text QA template and a refine template. This could help in getting more accurate responses from the LLM. You can find more information on custom prompts at https://gpt-index.readthedocs.io/en/latest/how_to/customization/custom_prompts.html.
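A minimal sketch of that suggestion, assuming the 0.8.x prompt classes (the template wording is only an example):
from llama_index import QuestionAnswerPrompt, RefinePrompt

qa_template = QuestionAnswerPrompt(
    "Context information is below.\n"
    "---------------------\n"
    "{context_str}\n"
    "---------------------\n"
    "Given this context, answer the question: {query_str}\n"
)
refine_template = RefinePrompt(
    "The original question is: {query_str}\n"
    "We have an existing answer: {existing_answer}\n"
    "Refine it if needed using this additional context:\n{context_msg}\n"
)
engine = index.as_query_engine(text_qa_template=qa_template, refine_template=refine_template)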
I hope this helps! If you have any more questions or need further clarification, feel free to ask.
@dosu-bot The source is below
Indexing part:
import os, sys, json, logging
from dotenv import load_dotenv
from llama_index import SimpleDirectoryReader, ServiceContext, ListIndex, LLMPredictor, Document, load_index_from_storage, SimpleWebPageReader
from llama_index.llms import OpenAI
from llama_index.langchain_helpers.agents import create_llama_chat_agent, IndexToolConfig

load_dotenv()

# Set the log level
logging.basicConfig(stream=sys.stdout, level=logging.DEBUG, force=True)

text_list = ['paul_graham_essay.txt']
index = ListIndex([])

# LlamaIndex settings
llm = OpenAI(model="gpt-4", temperature=0, max_tokens=2048)
service_context = ServiceContext.from_defaults(llm=llm)

# Read the blog at the specified URL.
url_documents = SimpleWebPageReader(html_to_text=True).load_data(
    [
        "http://paulgraham.com/worked.html"
    ]
)

# Create the index
index = ListIndex.from_documents(url_documents, service_context=service_context)
index.storage_context.persist(persist_dir="index")
Screen display using Streamlit:
import os, sys, json, site, time, logging
from dotenv import load_dotenv
import streamlit as st
from streamlit_chat import message
import tiktoken
from llama_index import (
    download_loader,
    LLMPredictor,
    VectorStoreIndex,
    ServiceContext,
    QuestionAnswerPrompt,
    StorageContext,
    load_index_from_storage,
    SimpleDirectoryReader,
    ListIndex
)
from langchain import OpenAI
from langchain.chat_models import ChatOpenAI
from llama_index.indices.list.base import ListRetrieverMode

load_dotenv()

# Set the log level
logging.basicConfig(stream=sys.stdout, level=logging.DEBUG, force=True)

class QAResponseGenerator:
    def __init__(self, selected_model):
        self.llm_predictor = LLMPredictor(llm=OpenAI(temperature=1, model_name="gpt-3.5-turbo-0613"))
        # Japanese QA prompt, roughly: "The following information is given. ...
        # Referring to this information, answer the following question: {query_str}"
        self.QA_PROMPT_TMPL = (
            "下記の情報が与えられています。 \n"
            "---------------------\n"
            "{context_str}"
            "\n---------------------\n"
            "この情報を参照して次の質問に答えてください: {query_str}\n"
        )
        self.service_context = ServiceContext.from_defaults(llm_predictor=self.llm_predictor)

    def generate(self, question):
        start = time.time()
        storage_context = StorageContext.from_defaults(persist_dir="/home/heart/llama_index/examples/paul_graham_essay/index")
        index = load_index_from_storage(storage_context)
        elapsed_time = time.time() - start
        print("load_elapsed_time:{0}".format(elapsed_time) + "[sec]")
        start = time.time()
        engine = index.as_query_engine(text_qa_template=QuestionAnswerPrompt(self.QA_PROMPT_TMPL), retriever_mode=ListRetrieverMode.EMBEDDING, similarity_top_k=3)
        # The appended Japanese string asks: "Please include source links in your answer."
        result = engine.query(question + "回答には出典のリンクを含めるようにしてください。")
        elapsed_time = time.time() - start
        print("query_time:{0}".format(elapsed_time) + "[sec]")
        return result.response.replace("\n", ""), result.get_formatted_sources(1000)

def display_chat(chat_history):
    for i, chat in enumerate(reversed(chat_history)):
        if "user" in chat:
            message(chat["user"], is_user=True, key=str(i))
        else:
            message(chat["bot"], key="bot_" + str(i))

def main():
    st.title('chatGPT')
    question = st.text_input("Your question")
    # Place the question-submit button on the main screen
    submit_question = st.button("質問")  # "Submit question"
    clear_chat = st.button("履歴消去")  # "Clear history"
    # Keep the chat history in session state
    if "chat_history" not in st.session_state:
        st.session_state["chat_history"] = []
    if clear_chat:
        st.session_state["chat_history"] = []
    # pdf_reader = PDFReader()
    response_generator = QAResponseGenerator("gpt-3.5-turbo-0613")
    # Handle the submit-button click
    if submit_question:
        if question:  # a question has been entered
            # generate() returns (response_text, formatted_sources)
            response, source = response_generator.generate(question)
            # if choice == "表示する":  # "show sources"
            #     response += f"\n\n参照した情報は次の通りです:\n{source}"  # "Referenced information:"
            # Add the question and the response to the chat history
            st.session_state["chat_history"].append({"user": question})
            st.session_state["chat_history"].append({"bot": response})
    display_chat(st.session_state["chat_history"])

if __name__ == "__main__":
    main()
🤖
Hello,
Thank you for reaching out and providing detailed information about your issue. Based on the context you've provided, there could be several reasons why the responses from your LlamaIndex are not reflecting the content of the indexed documents. Here are some possibilities:
1. Data Loading: The function you're using to load the data might not be correctly extracting the content from the webpage. Please ensure that the SimpleWebPageReader is correctly parsing the webpage and extracting the necessary information (see the sketch after this list).
2. Query Matching: The query method of your index_cls might not be implemented correctly, leading to unexpected results. Please check the implementation of the query method in your index_cls.
3. Document Indexing: The insert method of your index_cls might not be correctly indexing the documents. Please check the implementation of the insert method in your index_cls.
4. Index Creation: The from_documents method of your index_cls might not be correctly creating the index. Please check the implementation of the from_documents method in your index_cls.
5. PineconeVectorStore: If the add method in the PineconeVectorStore class is not correctly adding the documents to the Pinecone index, queries against it might not return the expected results. Please check the implementation of the add method in the PineconeVectorStore class.
6. Streaming Response: If you have enabled streaming, it's possible that the responses you're seeing are not complete. Please check your handling of the streaming response.
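For the first point, a quick hedged check (using the URL from your indexing script) that the reader actually extracted page text:
from llama_index import SimpleWebPageReader

documents = SimpleWebPageReader(html_to_text=True).load_data(
    ["http://paulgraham.com/worked.html"]
)
# A correctly parsed page should yield a large character count and readable
# text, not an empty string.
print(len(documents[0].text))
print(documents[0].text[:200])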
Without more specific information about the behavior you're experiencing and the specific implementations of your index_cls and data loading function, it's difficult to provide a more specific answer.
As for the ListIndex.from_documents method, I'm sorry but I couldn't find specific information about its exact functionality in the LlamaIndex codebase. It would be helpful if you could provide more details about the issues you're experiencing with this method.
Lastly, I found a similar issue in the LlamaIndex repository that might be helpful: What is returned when the retrieved documents don't have information for the given question?. The suggested solution is to use more custom prompts and to define both a text QA template and a refine template. You can find more information on custom prompts at https://gpt-index.readthedocs.io/en/latest/how_to/customization/custom_prompts.html.
I hope this helps! If you have any more questions or need further clarification, please don't hesitate to ask.
Best, Dosu
@Ma-Fukudama Hmm, I think the way you built the index is not correct.
Right now, it seems that you are just inserting the names of the files, not their actual content.
Here's a slightly modified version:
from llama_index import ListIndex, SimpleDirectoryReader

text_list = ['A.txt']
# SimpleDirectoryReader reads the contents of each file, not just its name
documents = SimpleDirectoryReader(input_files=text_list).load_data()
index = ListIndex.from_documents(documents, service_context=service_context)
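If it helps, a hedged continuation of that sketch (the persist directory and the question are placeholders), persisting the rebuilt index and querying it:
index.storage_context.persist(persist_dir="index")

query_engine = index.as_query_engine()
response = query_engine.query("What did the author work on?")
print(response)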
@logan-markewich The part that creates the documents was missing from what I posted. It was:
text_list = ['A.txt']
file_documents = [Document(text=t) for t in text_list]
With the code above I can display the file name, but it can't be used as an index of the file contents. And with the code you showed me, I can't get the file name of the quoted source.
@Ma-Fukudama if you want both the filename and the document text, you can set file_metadata in SimpleDirectoryReader: give it a filename_fn that puts the file name into each document's metadata, and SimpleDirectoryReader will take care of the text:
from llama_index import SimpleDirectoryReader
filename_fn = lambda filename: {'file_name': filename}
# automatically sets the metadata of each document according to filename_fn
documents = SimpleDirectoryReader(input_files=text_list, file_metadata=filename_fn).load_data()
print(documents[0].text)
print(documents[0].metadata)
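Assuming the 0.8.x response objects, the file name stored in metadata then shows up on the source nodes of a query response, so it can be cited:
index = ListIndex.from_documents(documents, service_context=service_context)
query_engine = index.as_query_engine()
response = query_engine.query("What is this document about?")
for source in response.source_nodes:
    # metadata carries the file_name set via filename_fn
    print(source.node.metadata.get("file_name"), source.score)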
Or, maybe I don't understand your goal 😅
@logan-markewich I was able to accomplish what I wanted by using what you taught me. Thank you. The other issues were also resolved by changing from ListIndex to VectorStoreIndex.
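For reference, the change looks roughly like this (a sketch, assuming the same service_context and filename_fn as above):
from llama_index import VectorStoreIndex, SimpleDirectoryReader

documents = SimpleDirectoryReader(input_files=['A.txt'], file_metadata=filename_fn).load_data()
index = VectorStoreIndex.from_documents(documents, service_context=service_context)
index.storage_context.persist(persist_dir="index")
The persisted index loads back with the same load_index_from_storage call.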
Nice!
Question
I created an index from a file with the following source code, but even when I ask questions, the content is not reflected in the responses. Is there any additional step required when creating an index?
llama_index version: 0.8.13