Closed: AvisP closed this issue 3 months ago.
🤖
Hello @AvisP,
Good to see you again! I'm reviewing your issue and will provide a detailed answer soon.
🤖
Hello @AvisP,
Nice to see you again! I hope you've been doing well.
Based on the information you've provided and the similar issues I found in the LlamaIndex repository, there are a few things you could try to resolve the issue of getting an empty response:
When loading the index, pass the service context back in: new_index = load_index_from_storage(storage_context, service_context=service_context). This was suggested in the issue [Question]: Empty Response after query_engine.query.
Ensure that the correct kwargs are used when creating the Document: from llama_index import Document and docs=[Document(text=doc.page_content, metadata={"source": doc.metadata["source"]}) for doc in documents]. LlamaIndex Documents take the text= kwarg, not LangChain's page_content=. This was also suggested in the same issue mentioned above.
Replace the LangChain web loader with llama_index's BeautifulSoupWebReader. This was another suggestion from the same issue.
Print response.source_nodes to check whether the retrieved nodes are showing properly. This can help you debug the issue.
Check the setup of the LLM. If you are using Llama 2, try decreasing the chunk size to 1024 and setting context_window=3800. This was another suggestion from the same issue.
If none of these suggestions work, it might be due to the pypdf library not being able to extract text from some PDF files. This can happen if the PDF is made from images or if the text is encoded in a way that pypdf can't understand. You might want to try a different library to extract text from PDFs, such as PDFMiner or PyMuPDF (fitz), or use an OCR library like pytesseract if the PDF is made from images. This was suggested in the answer to the question: Why am I getting an empty response when trying to retrieve an answer from PDFs using the LlamaIndex framework?
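Two of the checks above (printing response.source_nodes and spotting PDF pages that pypdf cannot read) come down to finding pieces of text that are empty. Below is a minimal, stdlib-only sketch, assuming you have already collected the strings into a list; the helper name find_blank_texts is hypothetical and not part of llama_index or pypdf.

```python
from typing import Iterable, List, Optional


def find_blank_texts(texts: Iterable[Optional[str]], min_chars: int = 1) -> List[int]:
    """Return the indices of entries whose stripped text is shorter than min_chars.

    Hypothetical helper: works on any sequence of strings, e.g. the contents
    of retrieved nodes or the per-page text extracted from a PDF.
    """
    blank = []
    for i, text in enumerate(texts):
        # Defensively treat None the same as an empty string.
        if text is None or len(text.strip()) < min_chars:
            blank.append(i)
    return blank


if __name__ == "__main__":
    # Stand-in data: index 1 mimics a scanned, image-only page with no text layer.
    pages = ["Fibromyalgia is a chronic condition ...", "   ", "Symptoms include ..."]
    print(find_blank_texts(pages))  # [1]
```

Assuming llama_index 0.9.x and pypdf, the inputs could be built as [n.node.get_content() for n in response.source_nodes] and [p.extract_text() for p in PdfReader("your.pdf").pages] (the file name is a placeholder); verify those attribute names against your installed versions. Pages that come back blank are the candidates for PyMuPDF or OCR.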
I hope this helps! Let me know if you have any other questions.
@AvisP Usually with LlamaCPP, an empty response means that at some point the input to the LLM was too big.
If you are on v0.9 or newer, try setting the global tokenizer to match your LLM. This will improve our token counting when prompting the LLM:
from llama_index import set_global_tokenizer
# huggingface
from transformers import AutoTokenizer
set_global_tokenizer(
    AutoTokenizer.from_pretrained("mistralai/Mistral-7B-Instruct-v0.1").encode
)
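Why the tokenizer matters: the prompt plus the generation budget (max_new_tokens) has to fit inside the model's context window, and that check is only as reliable as the token count behind it. The sketch below takes the tokenizer as a plain callable, the same shape as the AutoTokenizer(...).encode passed to set_global_tokenizer above. The function prompt_fits and both toy tokenizers are hypothetical stand-ins, used only to show that a mismatched tokenizer can approve a prompt that a stricter count rejects.

```python
from typing import Callable, List

# Any callable mapping text to a list of tokens, e.g. a Hugging Face tokenizer's .encode.
Tokenizer = Callable[[str], List]


def prompt_fits(prompt: str, encode: Tokenizer, context_window: int, max_new_tokens: int) -> bool:
    """True if the encoded prompt still leaves room for max_new_tokens in the window."""
    return len(encode(prompt)) + max_new_tokens <= context_window


def whitespace_tokens(text: str) -> List[str]:
    # Toy tokenizer that under-counts: one token per whitespace-separated word.
    return text.split()


def char_tokens(text: str) -> List[str]:
    # Toy tokenizer that over-counts: one token per character.
    return list(text)


if __name__ == "__main__":
    prompt = "What is Fibromyalgia? " * 20  # 60 words, 440 characters
    print(prompt_fits(prompt, whitespace_tokens, context_window=400, max_new_tokens=256))  # True
    print(prompt_fits(prompt, char_tokens, context_window=400, max_new_tokens=256))  # False
```

The same prompt passes under one count and fails under the other; with a real model you would pass the tokenizer that actually belongs to it, which is the point of matching the global tokenizer to your LLM.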
@logan-markewich Thanks for your response and suggestion. I am using version 0.9.2 of llama_index. I added the global tokenizer setting, but the response was still empty. I also added a token counter, and the counts seem to be well within limits.
import logging
import sys
import os
logging.basicConfig(stream=sys.stdout, level=logging.INFO)
logging.getLogger().addHandler(logging.StreamHandler(stream=sys.stdout))
from llama_index.callbacks import CallbackManager, TokenCountingHandler
from llama_index import VectorStoreIndex, SimpleDirectoryReader, ServiceContext, StorageContext, load_index_from_storage
import torch
from llama_index import set_global_tokenizer
# huggingface
from transformers import AutoTokenizer
set_global_tokenizer(
    AutoTokenizer.from_pretrained("mistralai/Mistral-7B-Instruct-v0.1").encode
)
from llama_index.llms import LlamaCPP
from llama_index.llms.llama_utils import messages_to_prompt, completion_to_prompt
llm = LlamaCPP(
    # You can pass in the URL to a GGUF model to download it automatically
    # model_url='https://huggingface.co/TheBloke/Mistral-7B-Instruct-v0.1-GGUF/resolve/main/mistral-7b-instruct-v0.1.Q4_K_M.gguf',
    # optionally, you can set the path to a pre-downloaded model instead of model_url
    # model_path="/Users/paula/Projects/Text Gen/Llama2/models/Llama-2-7B-GGUF/llama-2-7b.Q8_0.gguf",
    model_path="/Users/paula/Projects/Text Gen/Llama2/models/Mistral-7B-GGUF/mistral-7b-v0.1.Q8_0.gguf",
    temperature=0.1,
    max_new_tokens=256,
    # llama2 has a context window of 4096 tokens, but we set it lower to allow for some wiggle room
    context_window=3900,
    # kwargs to pass to __call__()
    generate_kwargs={},
    # kwargs to pass to __init__()
    # set n_gpu_layers to at least 1 to offload layers to the GPU
    model_kwargs={"n_gpu_layers": 2},
    # transform inputs into Llama2 format
    messages_to_prompt=messages_to_prompt,
    completion_to_prompt=completion_to_prompt,
    verbose=True,
)
from langchain.embeddings.huggingface import HuggingFaceEmbeddings
from llama_index.embeddings import LangchainEmbedding
embed_model = LangchainEmbedding(
    HuggingFaceEmbeddings(model_name="thenlper/gte-large")
)
token_counter = TokenCountingHandler(
    tokenizer=AutoTokenizer.from_pretrained("mistralai/Mistral-7B-Instruct-v0.1").encode
)
callback_manager = CallbackManager([token_counter])
service_context = ServiceContext.from_defaults(
    chunk_size=256,
    callback_manager=callback_manager,
    llm=llm,
    # embed_model="local"
    embed_model=embed_model
)
# check if storage already exists
if not os.path.exists("./storage"):
    # load the documents and create the index
    documents = SimpleDirectoryReader("./PDFData/").load_data()
    index = VectorStoreIndex.from_documents(documents, service_context=service_context)
    # store it for later
    index.storage_context.persist()
else:
    # load the existing index
    storage_context = StorageContext.from_defaults(persist_dir="./storage")
    index = load_index_from_storage(storage_context, service_context=service_context)
token_counter.reset_counts()
query_engine = index.as_query_engine()
response = query_engine.query("What is Fibromyalgia?")
print(response)
print(
    "Embedding Tokens: ",
    token_counter.total_embedding_token_count,
    "\n",
    "LLM Prompt Tokens: ",
    token_counter.prompt_llm_token_count,
    "\n",
    "LLM Completion Tokens: ",
    token_counter.completion_llm_token_count,
    "\n",
    "Total LLM Token Count: ",
    token_counter.total_llm_token_count,
    "\n",
)
Output:
Empty Response
Embedding Tokens: 10
LLM Prompt Tokens: 595
LLM Completion Tokens: 1
Total LLM Token Count: 596
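As a sanity check on "well within limits", the reported prompt count can be plugged into the budget rule that the prompt plus max_new_tokens must not exceed context_window, using context_window=3900 and max_new_tokens=256 from the LlamaCPP call above. By this tokenizer's count it leaves thousands of tokens spare, which suggests prompt overflow is not what emptied this particular response. The helper name headroom is hypothetical.

```python
def headroom(context_window: int, prompt_tokens: int, max_new_tokens: int) -> int:
    """Tokens left over after reserving space for the prompt and the generation budget.

    A negative result means the request cannot fit in the context window.
    """
    return context_window - prompt_tokens - max_new_tokens


if __name__ == "__main__":
    # Values taken from the LlamaCPP settings and the token counter output above.
    print(headroom(context_window=3900, prompt_tokens=595, max_new_tokens=256))  # 3049
```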
@AvisP Hey, were you able to make progress on this issue?
@Deepansharora27 I would recommend starting with the example provided on the Mistral 7B Hugging Face page and building up from there. LlamaIndex and LangChain do not have complete examples, and you will have to put in a lot of time to make their functions work. Another option is Ollama, which is easy to set up and use. Hope that helps!
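For the Ollama route, here is a minimal sketch using only the Python standard library. It assumes a local Ollama server on its default port 11434 and a model pulled under the name "mistral"; the /api/generate endpoint and its model, prompt and stream fields follow Ollama's REST API as I understand it, so check them against the version you install. The helper names build_generate_request and generate are hypothetical.

```python
import json
import urllib.request

# Default local Ollama endpoint (assumption: server running on this machine).
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_generate_request(model: str, prompt: str, url: str = OLLAMA_URL) -> urllib.request.Request:
    """Build a non-streaming generate request for a local Ollama server."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(model: str, prompt: str) -> str:
    """Send the request and return the generated text (requires a running server)."""
    with urllib.request.urlopen(build_generate_request(model, prompt)) as resp:
        # With stream=False the reply is a single JSON object carrying a "response" field.
        return json.loads(resp.read().decode("utf-8"))["response"]
```

Usage, once the server is up: print(generate("mistral", "What is Fibromyalgia?")). Disabling streaming keeps the sketch to one JSON reply instead of a line-by-line stream.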
Hi, @AvisP,
I'm helping the LlamaIndex team manage our backlog and am marking this issue as stale. From what I understand, you are experiencing an empty response when querying a PDF using llama_index. There have been suggestions provided by dosubot, logan-markewich, and you, including setting the global tokenizer to match the LLM, checking the token counter, and exploring alternative libraries such as Ollama.
Could you please confirm if this issue is still relevant to the latest version of the LlamaIndex repository? If it is, please let the LlamaIndex team know by commenting on the issue. Otherwise, feel free to close the issue yourself, or it will be automatically closed in 7 days. Thank you!
Question Validation
Question
I am getting an empty response with the following example, which is based on the sample demo code in the llama_index documentation. The answer to a simple question comes back empty, even though the retrieved nodes do contain text. Any advice would be helpful.
The PDF file can be found here.