NguyenDinhTiem opened this issue 6 days ago.
When you load a multi-page PDF file with `SimpleDirectoryReader(...).load_data()`, you get a list of documents because `SimpleDirectoryReader` loads each page of the PDF as a separate document. PDF files are handled by the `PDFReader` class, which creates one `Document` object per page unless its `return_full_document` option is set to `True`. That option defaults to `False`, so each page is processed individually and the result is a list of `Document` objects, one per page of the PDF [1][2].
PDFs are split by page by default to help with citations.
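Keeping pages separate lets each retrieved document point back to a specific page. Here is a minimal sketch, again using a hypothetical `PageDoc` stand-in instead of LlamaIndex's `Document`, and assuming the file name and page number are stored under `file_name` and `page_label` metadata keys:

```python
from dataclasses import dataclass, field


@dataclass
class PageDoc:
    """Hypothetical stand-in for a LlamaIndex Document: just text plus metadata."""
    text: str
    metadata: dict = field(default_factory=dict)


def cite(doc: PageDoc) -> str:
    """Build a short citation from the (assumed) file_name/page_label metadata keys,
    falling back to the file name alone when no page number is present."""
    file_name = doc.metadata.get("file_name", "unknown file")
    page = doc.metadata.get("page_label")
    return f"{file_name}, p. {page}" if page is not None else file_name


hit = PageDoc("Results", {"file_name": "a.pdf", "page_label": "3"})
print(cite(hit))  # a.pdf, p. 3
```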
Question
Why do I get a list of documents when I load a multi-page PDF file?
```python
from llama_index.core import SimpleDirectoryReader
documents = SimpleDirectoryReader(input_files=FILES_PATH, filename_as_id=True).load_data()
```