run-llama / llama_index

LlamaIndex is a data framework for your LLM applications
https://docs.llamaindex.ai
MIT License

[Question]: Getting empty response while retrieving answer from pdfs #8973

Closed AvisP closed 3 months ago

AvisP commented 8 months ago

Question Validation

Question

I am getting an empty response with the following example, which I put together from the sample demo code in the llama_index documentation. The query returns an empty response to a simple question even though the retrieved source nodes contain text. Any advice would be helpful.

import logging
import sys
import os
logging.basicConfig(stream=sys.stdout, level=logging.INFO)
logging.getLogger().addHandler(logging.StreamHandler(stream=sys.stdout))

from llama_index import VectorStoreIndex, SimpleDirectoryReader, ServiceContext, StorageContext, load_index_from_storage

import torch

from llama_index.llms import LlamaCPP
from llama_index.llms.llama_utils import messages_to_prompt, completion_to_prompt
llm = LlamaCPP(
    # You can pass in the URL to a GGUF model to download it automatically
    model_url='https://huggingface.co/TheBloke/Mistral-7B-Instruct-v0.1-GGUF/resolve/main/mistral-7b-instruct-v0.1.Q8_0.gguf',
    # model_path = "/Users/paula/Projects/Text Gen/Llama2/models/Mistral-7B-GGUF/mistral-7b-v0.1.Q8_0.gguf",
    temperature=0.1,
    max_new_tokens=256,
    # llama2 has a context window of 4096 tokens, but we set it lower to allow for some wiggle room
    context_window=3900,
    # kwargs to pass to __call__()
    generate_kwargs={},
    # kwargs to pass to __init__()
    # set to at least 1 to use GPU
    model_kwargs={"n_gpu_layers": 2},
    # transform inputs into Llama2 format
    messages_to_prompt=messages_to_prompt,
    completion_to_prompt=completion_to_prompt,
    verbose=True,
)

from langchain.embeddings.huggingface import HuggingFaceEmbeddings
from llama_index.embeddings import LangchainEmbedding
# from llama_index.service_context import ServiceContext

embed_model = LangchainEmbedding(
  HuggingFaceEmbeddings(model_name="thenlper/gte-large")
)

service_context = ServiceContext.from_defaults(
    chunk_size=256,
    llm=llm,
    embed_model=embed_model
)

# check if storage already exists
if not os.path.exists("./storage"):
    # load the documents and create the index
    documents = SimpleDirectoryReader("./PDFData/").load_data()
    index = VectorStoreIndex.from_documents(documents, service_context=service_context)
    # store it for later
    index.storage_context.persist()
else:
    # load the existing index
    storage_context = StorageContext.from_defaults(persist_dir="./storage")
    index = load_index_from_storage(storage_context, service_context=service_context)

query_engine = index.as_query_engine()
response = query_engine.query("What is Fibromyalgia?")

print(response)

print(response.source_nodes[0].text)
print(response.source_nodes[1].text)

The PDF file can be found here

dosubot[bot] commented 8 months ago

🤖

Hello @AvisP,

Good to see you again! I'm reviewing your issue and will provide a detailed answer soon.

dosubot[bot] commented 8 months ago

🤖

Hello @AvisP,

Nice to see you again! I hope you've been doing well.

Based on the information you've provided and the similar issues I found in the LlamaIndex repository, there are a few things you could try to resolve the issue of getting an empty response:

  1. When loading the index, pass the service context back in: new_index = load_index_from_storage(storage_context, service_context=service_context). This was suggested in the issue [Question]: Empty Response after query_engine.query.

  2. Ensure that the correct kwargs are used when creating the Document: from llama_index import Document and docs=[Document(text=doc.page_content, metadata={"source": doc.metadata["source"]}) for doc in documents]. LlamaIndex's Document takes a text= kwarg, not page_content=. This was also suggested in the same issue mentioned above.

  3. Replace the LangChain web loader with BeautifulSoupWebReader from llama_index. This was another suggestion from the same issue.

  4. Print response.source_nodes to check if the retrieved nodes are showing properly. This can help you debug the issue.

  5. Check the setup of the LLM. If using llama2, try decreasing the chunk size to 1024 and setting context_window=3800. This was another suggestion from the same issue.

If none of these suggestions work, it might be due to the pypdf library not being able to extract text from some PDF files. This can happen if the PDF is made from images or if the text is encoded in a way that pypdf can't understand. You might want to try using a different library to extract text from PDFs, such as PDFMiner or PyMuPDF (fitz), or use an OCR library like pytesseract if the PDF is made from images. This was suggested in the answer to the question: Why am I getting an empty response when trying to retrieve an answer from PDFs using the LlamaIndex framework?
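For instance, a quick way to check whether any text can be extracted from the PDF at all is a minimal sketch using PyMuPDF (fitz); the file name below is hypothetical, and this assumes pymupdf is installed:

import fitz  # PyMuPDF

# Open the PDF and concatenate the text of every page.
# If this prints nothing (or only whitespace), the PDF is likely image-based
# and needs OCR (e.g. pytesseract) before it can be indexed.
doc = fitz.open("./PDFData/example.pdf")  # hypothetical file name
text = "".join(page.get_text() for page in doc)
print(f"Extracted {len(text)} characters")
print(text[:500])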

I hope this helps! Let me know if you have any other questions.


This response is meant to be useful and save you time. It is not meant to be a precise solution, but rather a starting point for your own research.

Help me be more useful! Please leave a 👍 if this is helpful and 👎 if it is irrelevant.

If you want to continue the conversation, start your reply with @dosu-bot.

logan-markewich commented 8 months ago

@AvisP usually with LlamaCPP, an empty response means that at some point the input to the LLM was too big.

If you are on v0.9 or newer, try setting the global tokenizer to match your LLM. This will improve the token counting when prompting the LLM:

from llama_index import set_global_tokenizer

# huggingface
from transformers import AutoTokenizer

set_global_tokenizer(
    AutoTokenizer.from_pretrained("mistralai/Mistral-7B-Instruct-v0.1").encode
)
AvisP commented 8 months ago

@logan-markewich Thanks for your response and suggestion. I am using version 0.9.2 of llama_index. I added the global tokenizer setting, but the response is still empty. I also added a token counter, and the counts seem to be well within limits.

import logging
import sys
import os
logging.basicConfig(stream=sys.stdout, level=logging.INFO)
logging.getLogger().addHandler(logging.StreamHandler(stream=sys.stdout))
from llama_index.callbacks import CallbackManager, TokenCountingHandler
from llama_index import VectorStoreIndex, SimpleDirectoryReader, ServiceContext, StorageContext, load_index_from_storage
import torch
from llama_index import set_global_tokenizer

# huggingface
from transformers import AutoTokenizer

set_global_tokenizer(
    AutoTokenizer.from_pretrained("mistralai/Mistral-7B-Instruct-v0.1").encode
)

from llama_index.llms import LlamaCPP
from llama_index.llms.llama_utils import messages_to_prompt, completion_to_prompt
llm = LlamaCPP(
    # You can pass in the URL to a GGUF model to download it automatically
    # model_url='https://huggingface.co/TheBloke/Mistral-7B-Instruct-v0.1-GGUF/resolve/main/mistral-7b-instruct-v0.1.Q4_K_M.gguf',
    # optionally, you can set the path to a pre-downloaded model instead of model_url
    # model_path="/Users/paula/Projects/Text Gen/Llama2/models/Llama-2-7B-GGUF/llama-2-7b.Q8_0.gguf",
    model_path = "/Users/paula/Projects/Text Gen/Llama2/models/Mistral-7B-GGUF/mistral-7b-v0.1.Q8_0.gguf",
    temperature=0.1,
    max_new_tokens=256,
    # llama2 has a context window of 4096 tokens, but we set it lower to allow for some wiggle room
    context_window=3900,
    # kwargs to pass to __call__()
    generate_kwargs={},
    # kwargs to pass to __init__()
    # set to at least 1 to use GPU
    model_kwargs={"n_gpu_layers": 2},
    # transform inputs into Llama2 format
    messages_to_prompt=messages_to_prompt,
    completion_to_prompt=completion_to_prompt,
    verbose=True,
)

from langchain.embeddings.huggingface import HuggingFaceEmbeddings
from llama_index.embeddings import LangchainEmbedding

embed_model = LangchainEmbedding(
  HuggingFaceEmbeddings(model_name="thenlper/gte-large")
)

token_counter = TokenCountingHandler(
    tokenizer=AutoTokenizer.from_pretrained("mistralai/Mistral-7B-Instruct-v0.1").encode
)

callback_manager = CallbackManager([token_counter])

service_context = ServiceContext.from_defaults(
    chunk_size=256,
    callback_manager=callback_manager,
    llm=llm,
    # embed_model="local"
    embed_model=embed_model
)

# check if storage already exists
if not os.path.exists("./storage"):
    # load the documents and create the index
    documents = SimpleDirectoryReader("./PDFData/").load_data()
    index = VectorStoreIndex.from_documents(documents, service_context=service_context)
    # store it for later
    index.storage_context.persist()
else:
    # load the existing index
    storage_context = StorageContext.from_defaults(persist_dir="./storage")
    index = load_index_from_storage(storage_context, service_context=service_context)

token_counter.reset_counts()

query_engine = index.as_query_engine()
response = query_engine.query("What is Fibromyalgia?")

print(response)

print(
    "Embedding Tokens: ",
    token_counter.total_embedding_token_count,
    "\n",
    "LLM Prompt Tokens: ",
    token_counter.prompt_llm_token_count,
    "\n",
    "LLM Completion Tokens: ",
    token_counter.completion_llm_token_count,
    "\n",
    "Total LLM Token Count: ",
    token_counter.total_llm_token_count,
    "\n",
)

Output:

Empty Response
Embedding Tokens:  10 
 LLM Prompt Tokens:  595 
 LLM Completion Tokens:  1 
 Total LLM Token Count:  596 
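
For reference, one way to check whether the problem is in the LlamaCPP setup itself rather than in the retrieval pipeline is to call the LLM directly and bypass the query engine; a minimal sketch reusing the llm object defined above:

# If this also prints an empty string, the issue is with the LlamaCPP model
# setup (prompt format, context window, etc.) rather than with the index or
# the query engine.
direct = llm.complete("What is Fibromyalgia?")
print(direct.text)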
Deepansharora27 commented 6 months ago

@AvisP Hey, were you able to make progress on this issue?

AvisP commented 6 months ago

@Deepansharora27 I would recommend starting with the example provided on the Mistral 7B Hugging Face page and building up from that. LlamaIndex and LangChain do not have complete examples, and you will have to put in a lot of time to make their functions work. Another option is Ollama, which is easy to set up and use. Hope that helps.
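
For reference, a minimal sketch of the Ollama route, assuming the llama_index 0.9 import path and that the Ollama server is running locally with a Mistral model already pulled (ollama pull mistral):

from llama_index.llms import Ollama

# Ollama serves models over a local HTTP API, so there is no GGUF path or
# n_gpu_layers to manage from Python.
llm = Ollama(model="mistral")

response = llm.complete("What is Fibromyalgia?")
print(response.text)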

dosubot[bot] commented 3 months ago

Hi, @AvisP,

I'm helping the LlamaIndex team manage our backlog and am marking this issue as stale. From what I understand, you are experiencing an empty response when querying a PDF using llama_index. There have been suggestions provided by dosubot, logan-markewich, and you, including setting the global tokenizer to match the LLM, checking the token counter, and exploring alternative libraries such as Ollama.

Could you please confirm if this issue is still relevant to the latest version of the LlamaIndex repository? If it is, please let the LlamaIndex team know by commenting on the issue. Otherwise, feel free to close the issue yourself, or it will be automatically closed in 7 days. Thank you!