vikrantdeshpande09876 closed 4 weeks ago
A similar issue was reported where the SimpleDirectoryReader had problems with PDF reading, and it was resolved by updating to version 0.10.3 of the library [1]. However, since you're using a later version, the problem might be different.
The error message indicates that the PyMuPDFReader.load_data() method is missing the file_path argument. Ensure that the file_extractor is correctly instantiated with PyMuPDFReader() instead of just PyMuPDFReader. Here's how you can modify your code:
from llama_index.core import SimpleDirectoryReader
from llama_index.readers.file import PyMuPDFReader

reader_config = SimpleDirectoryReader(
    input_dir="../data/raw/rag-data",
    recursive=True,
    file_extractor={
        '.pdf': PyMuPDFReader()  # Instantiate the reader
    }
)
documents = reader_config.load_data()
This should resolve the issue by correctly passing the file_path to the load_data() method.
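For anyone landing here with the same traceback, the sketch below reproduces the mechanism without llama_index installed. ToyReader and run_extractor are hypothetical stand-ins, not the library's code, and the assumption is that the directory reader calls load_data(path) on whatever object it finds in file_extractor. When that object is the class itself, the path is bound to self and file_path is left unfilled.

```python
from pathlib import Path


class ToyReader:
    """Hypothetical stand-in for a file reader; not the llama_index class."""

    def load_data(self, file_path):
        return [f"loaded {Path(file_path).name}"]


def run_extractor(extractor, path):
    # Assumed caller behaviour: invoke load_data on the configured extractor.
    return extractor.load_data(path)


# Passing an instance: self is bound automatically, path fills file_path.
ok = run_extractor(ToyReader(), "docs/a.pdf")

# Passing the class: path is consumed as self, so file_path is missing.
try:
    run_extractor(ToyReader, "docs/a.pdf")
    error_message = None
except TypeError as exc:
    error_message = str(exc)

print(ok)
print(error_message)
```

The second call raises a TypeError about the missing file_path argument, which matches the shape of the error in this issue; the instance-based call returns normally.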
Yea pretty sure you need to instantiate the reader like dosu suggested
Man 🤦. Okay closing this out.
Bug Description
Trying to execute:
Version
llama-index==0.11.19
llama-index-agent-openai==0.3.4
llama-index-cli==0.3.1
llama-index-core==0.11.19
llama-index-embeddings-huggingface==0.3.1
llama-index-embeddings-openai==0.2.5
llama-index-indices-managed-llama-cloud==0.4.0
llama-index-legacy==0.9.48.post3
llama-index-llms-langchain==0.4.2
llama-index-llms-openai==0.2.16
llama-index-llms-openai-like==0.2.0
llama-index-llms-openllm==0.3.1
llama-index-multi-modal-llms-openai==0.2.3
llama-index-program-openai==0.2.0
llama-index-question-gen-openai==0.2.0
llama-index-readers-file==0.2.2
llama-index-readers-llama-parse==0.3.0
llama-index-vector-stores-postgres==0.2.6
Steps to Reproduce
Shouldn't this line be file: Union[Path, str], instead? Am I missing something obvious here, or is there some version inconsistency?
Relevant Logs/Tracebacks