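When using `aload_data`, the returned documents carry only an `id_` and no metadata identifying the source file. When several files are parsed in one batched call, there is therefore no way to pair each returned document with the file it came from.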
```python
import asyncio
import os

from llama_parse import LlamaParse

# Initialize the parser
parser = LlamaParse(
    result_type="markdown",
)

# Get the list of PDF files in the directory
directory = "./test"
pdf_files = [os.path.join(directory, f) for f in os.listdir(directory) if f.endswith(".pdf")]

# Asynchronous function that loads all files in one batched call
async def load_documents(file_list):
    documents = await parser.aload_data(file_list)
    return documents

# Run the asynchronous function
documents = asyncio.run(load_documents(pdf_files))
```
This returns (truncated) the following; note that `metadata={}` is empty:

```
[Document(id_='e9a15122-0ac6-4d09-b149-37695cb67aa3', embedding=None, metadata={}, excluded_embed_metadata_keys=[], excluded_llm_metadata_keys=[], relationships={}, text="# U.......
```
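Until the metadata is populated, one possible workaround is to call the loader once per file and tag the results yourself. This is a minimal sketch, not part of the llama_parse API: `load_with_filenames`, the `file_path`/`file_name` metadata keys, and the `_FakeDoc`/`_fake_load` stand-ins (used only so the helper can be exercised offline without an API key) are hypothetical names chosen for illustration. It relies on `asyncio.gather` returning results in the same order as the awaitables it was given, so `results[i]` always belongs to `paths[i]` even when jobs finish out of order.

```python
import asyncio
import os
from typing import Any, Awaitable, Callable, List, Sequence


async def load_with_filenames(
    load_one: Callable[[str], Awaitable[List[Any]]],
    paths: Sequence[str],
) -> List[Any]:
    """Parse each file separately and tag its documents with the source path.

    `load_one` is assumed to be an async callable that takes a single path and
    returns a list of document objects, each exposing a mutable `metadata` dict.
    """
    # gather() preserves input order in its result list, regardless of which
    # coroutine completes first.
    results = await asyncio.gather(*(load_one(p) for p in paths))
    tagged = []
    for path, docs in zip(paths, results):
        for doc in docs:
            # Hypothetical keys; pick whatever naming your pipeline expects.
            doc.metadata["file_path"] = path
            doc.metadata["file_name"] = os.path.basename(path)
            tagged.append(doc)
    return tagged


class _FakeDoc:
    """Stand-in for a parsed document, used only to exercise the helper offline."""

    def __init__(self, text: str) -> None:
        self.text = text
        self.metadata: dict = {}


async def _fake_load(path: str) -> List[_FakeDoc]:
    # The first file sleeps longer, so it finishes last; order is still kept.
    name = os.path.basename(path)
    await asyncio.sleep({"a.pdf": 0.03, "b.pdf": 0.0}[name])
    return [_FakeDoc(f"parsed {name}")]


docs = asyncio.run(load_with_filenames(_fake_load, ["./test/a.pdf", "./test/b.pdf"]))
```

With the real parser, the call would presumably be `asyncio.run(load_with_filenames(parser.aload_data, pdf_files))`, assuming `aload_data` returns a list of documents when given a single path. The trade-off is one request per file with no concurrency limit; for large directories you may want to wrap `load_one` in an `asyncio.Semaphore` to avoid hitting rate limits.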