run-llama / llama_extract

MIT License

Failed to attach the metadata dictionaries to each document in RAG. #18

Open · SabaAnjum2002 opened this issue 3 months ago


I have built a RAG pipeline with metadata extraction. I parsed 3 PDFs and extracted 3 metadata dictionaries, one per PDF.

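```python
from llama_parse import LlamaParse

# full_files: list of the 3 PDF paths; metadatas: list of 3 metadata dicts, one per PDF
parser = LlamaParse(result_type="text")
docs = parser.load_data(file_path=full_files)  # here: one Document per page, not per PDF

# attach metadata
for metadata, doc in zip(metadatas, docs):
    doc.metadata.update(metadata)
```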

Using the above code, I obtained five documents, because the parser splits each PDF by page and two of the PDFs have two pages each. Only the first three documents received metadata; the last two did not.
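The loop above pairs items by position, and `zip` stops at the shorter input: with 3 metadata dicts and 5 page-level documents, only the first 3 documents are visited. Positional pairing can also copy one PDF's metadata onto a page from a different PDF once a multi-page file appears. A minimal sketch of one workaround, assuming you are willing to parse each file in its own call so that every page is known to belong to a given PDF: `load_with_metadata` is a hypothetical helper (not part of `llama_parse`), and `load_fn` is assumed to take a single path and return a list of objects with a mutable `.metadata` dict.

```python
from types import SimpleNamespace
from typing import Any, Callable, Dict, List, Sequence


def load_with_metadata(
    file_paths: Sequence[str],
    metadatas: Sequence[Dict[str, Any]],
    load_fn: Callable[[str], List[Any]],
) -> List[Any]:
    """Hypothetical helper: parse files one at a time and copy each file's
    metadata onto every document (page) produced from that file."""
    if len(file_paths) != len(metadatas):
        raise ValueError("need exactly one metadata dict per file")

    all_docs: List[Any] = []
    for path, metadata in zip(file_paths, metadatas):
        docs = load_fn(path)  # may return several docs, e.g. one per page
        for doc in docs:
            doc.metadata.update(metadata)  # every page of this file gets the same fields
        all_docs.extend(docs)
    return all_docs


if __name__ == "__main__":
    # Fake loader standing in for the real parser: page counts per file.
    pages = {"a.pdf": 1, "b.pdf": 2, "c.pdf": 2}

    def fake_load(path: str) -> List[Any]:
        return [SimpleNamespace(text=f"{path} p{i}", metadata={}) for i in range(pages[path])]

    docs = load_with_metadata(list(pages), [{"src": "A"}, {"src": "B"}, {"src": "C"}], fake_load)
    print([d.metadata["src"] for d in docs])  # ['A', 'B', 'B', 'C', 'C']
```

With LlamaParse, `load_fn` could be `lambda p: parser.load_data(file_path=p)`. This makes one parse call per file instead of a single batched call, trading some throughput for an unambiguous file-to-page mapping.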