Data-drone closed this issue 1 year ago.
Hello,
Based on the information available in the LlamaIndex repository, there isn't a PDFLoader class or module. The repository does contain a PDFReader class, which is used to parse PDF files. This class doesn't expose any explicit default settings; the only default value in the snippet below is extra_info=None in its load_data method. The load_data method loads and parses the PDF data. It requires a file parameter, a Path object pointing to the PDF file to be read, and accepts an optional extra_info parameter, a dictionary that can contain additional metadata.
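If you want to check a method's defaults yourself rather than reading the source, Python's standard inspect module can list them. The sketch below is a minimal illustration. The helper default_settings and the class StandInPDFReader are hypothetical names. StandInPDFReader only copies the load_data signature quoted in this thread, so the example runs without LlamaIndex installed.

```python
import inspect
from pathlib import Path
from typing import Any, Dict, List, Optional


def default_settings(func) -> Dict[str, Any]:
    """Return {parameter_name: default_value} for every parameter that has a default."""
    sig = inspect.signature(func)
    return {
        name: param.default
        for name, param in sig.parameters.items()
        if param.default is not inspect.Parameter.empty
    }


class StandInPDFReader:
    """Hypothetical stand-in: copies only the load_data signature quoted in this thread."""

    def load_data(self, file: Path, extra_info: Optional[Dict] = None) -> List[Any]:
        return []


print(default_settings(StandInPDFReader.load_data))
# {'extra_info': None}
```

With LlamaIndex installed, the same helper could be pointed at the real class, for example default_settings(PDFReader.load_data) after `from llama_index.readers.file.docs_reader import PDFReader`. That import path is only inferred from the file location mentioned below and may differ between LlamaIndex versions.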
Here is the relevant code snippet from the PDFReader class:
def load_data(
    self, file: Path, extra_info: Optional[Dict] = None
) -> List[Document]:
    """Parse file."""
    try:
        import pypdf
    except ImportError:
        raise ImportError(
            "pypdf is required to read PDF files: `pip install pypdf`"
        )

    with open(file, "rb") as fp:
        # Create a PDF object
        pdf = pypdf.PdfReader(fp)

        # Get the number of pages in the PDF document
        num_pages = len(pdf.pages)

        # Iterate over every page
        docs = []
        for page in range(num_pages):
            # Extract the text from the page
            page_text = pdf.pages[page].extract_text()
            page_label = pdf.page_labels[page]

            metadata = {"page_label": page_label, "file_name": file.name}
            if extra_info is not None:
                metadata.update(extra_info)

            docs.append(Document(text=page_text, metadata=metadata))
        return docs
This code is located in the llama_index/readers/file/docs_reader.py file in the repository.
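One detail in the snippet is worth spelling out: extra_info is merged into each page's metadata with dict.update after page_label and file_name are set. Extra keys are therefore added to every page, and any key that collides with page_label or file_name overrides the reader's value. The sketch below reproduces only that merge step in plain Python. build_page_metadata is a hypothetical helper, not part of LlamaIndex.

```python
from typing import Any, Dict, Optional


def build_page_metadata(
    page_label: str, file_name: str, extra_info: Optional[Dict[str, Any]] = None
) -> Dict[str, Any]:
    """Hypothetical helper mirroring the per-page metadata logic in PDFReader.load_data."""
    metadata = {"page_label": page_label, "file_name": file_name}
    if extra_info is not None:
        # dict.update adds new keys and overwrites existing ones on collision
        metadata.update(extra_info)
    return metadata


print(build_page_metadata("1", "report.pdf"))
# {'page_label': '1', 'file_name': 'report.pdf'}
print(build_page_metadata("1", "report.pdf", {"source": "upload", "file_name": "renamed.pdf"}))
# {'page_label': '1', 'file_name': 'renamed.pdf', 'source': 'upload'}
```

In the actual reader this corresponds to a call along the lines of PDFReader().load_data(file=Path("report.pdf"), extra_info={"source": "upload"}), based on the signature quoted above. Avoid reusing the keys page_label or file_name in extra_info unless you intend to replace them.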
If you were referring to a different PDFLoader class or module, could you please provide more context or details? This will help me provide a more accurate answer.
Best regards, Dosu
The bot is correct. For more details, see the source file, llama_index/readers/file/docs_reader.py.
Question
How do I find the default settings for PDFLoader?