run-llama / llama_parse

Parse files for optimal RAG
https://www.llamaindex.ai
MIT License

No metadata for async aload_data runs #219

Open RobertHH-IS opened 3 weeks ago

RobertHH-IS commented 3 weeks ago

When using aload_data, the returned documents carry only an id_ and an empty metadata dict, so nothing identifies the source file. Documents returned from a batched call therefore cannot be matched back to the files they came from.

import asyncio
import os

from llama_parse import LlamaParse

# Initialize the parser
parser = LlamaParse(
    result_type="markdown",
)

# Get the list of PDF files in the directory
directory = "./test"
pdf_files = [os.path.join(directory, filename) for filename in os.listdir(directory) if filename.endswith('.pdf')]

# Asynchronous function to load data
async def load_documents(file_list):
    documents = await parser.aload_data(file_list)
    return documents

# Run the asynchronous function
documents = asyncio.run(load_documents(pdf_files))

Returns... [Document(id_='e9a15122-0ac6-4d09-b149-37695cb67aa3', embedding=None, metadata={}, excluded_embed_metadata_keys=[], excluded_llm_metadata_keys=[], relationships={}, text="# U.......
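A possible workaround, sketched under assumptions rather than taken from the library: call aload_data once per file and tag the returned documents with the file name yourself. The helper name load_with_filenames and the "file_name" metadata key are made up for illustration, and FakeParser is a hypothetical stand-in so the sketch runs without an API key. With the real parser you would pass the LlamaParse instance instead, assuming aload_data accepts a single path.

```python
import asyncio
import os
from types import SimpleNamespace


async def load_with_filenames(parser, file_list):
    """Parse each file in its own aload_data call and record the source file name."""

    async def load_one(path):
        # One call per file, so every returned document is known to belong to `path`.
        docs = await parser.aload_data(path)
        for doc in docs:
            # "file_name" is an illustrative key, not something the library sets here.
            doc.metadata["file_name"] = os.path.basename(path)
        return docs

    # gather runs the per-file calls concurrently and keeps the input order.
    per_file = await asyncio.gather(*(load_one(p) for p in file_list))
    return [doc for docs in per_file for doc in docs]


class FakeParser:
    """Hypothetical stand-in for LlamaParse so the sketch can run offline."""

    async def aload_data(self, path):
        return [SimpleNamespace(id_="fake-id", metadata={}, text=f"parsed {path}")]


if __name__ == "__main__":
    docs = asyncio.run(load_with_filenames(FakeParser(), ["./test/a.pdf", "./test/b.pdf"]))
    for doc in docs:
        print(doc.metadata["file_name"], doc.text)
```

This gives up a single batched call in exchange for a reliable pairing between inputs and outputs; the per-file calls still run concurrently.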