I have built a RAG pipeline with metadata extraction. I parsed three PDFs and extracted three metadata dictionaries, one per PDF.
from llama_parse import LlamaParse
parser = LlamaParse(result_type="text")
docs = parser.load_data(file_path=full_files)
# attach metadata
for metadata, doc in zip(metadatas, docs):
    doc.metadata.update(metadata)
Using the above code, I get five docs rather than three, because the parsed output is split page by page and two of the PDFs contain two pages each. Since zip stops at the shorter input, only the first three docs receive metadata, while the last two do not.
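One possible way around this, sketched under assumptions: parse each PDF separately and apply that PDF's metadata to every page-level doc it yields, so the pairing is per file rather than per position. The helper name `attach_metadata_per_file` and the `load_one` callable are hypothetical and not part of LlamaParse; the sketch only assumes that each returned doc has a dict-like `metadata` attribute, as in the code above.

```python
from typing import Any, Callable, Dict, List, Sequence


def attach_metadata_per_file(
    load_one: Callable[[str], List[Any]],
    file_paths: Sequence[str],
    metadatas: Sequence[Dict[str, Any]],
) -> List[Any]:
    """Load each file on its own and copy its metadata onto all of its docs.

    `load_one` is a hypothetical callable that takes one file path and
    returns the list of docs (one per page) produced for that file.
    """
    if len(file_paths) != len(metadatas):
        raise ValueError("expected exactly one metadata dict per file")

    all_docs: List[Any] = []
    for path, metadata in zip(file_paths, metadatas):
        # Every page-level doc from this file gets the same metadata.
        for doc in load_one(path):
            doc.metadata.update(metadata)
            all_docs.append(doc)
    return all_docs
```

With the parser from the question, this could be called as `docs = attach_metadata_per_file(lambda p: parser.load_data(file_path=p), full_files, metadatas)`. The trade-off is one parse call per file instead of a single batched call.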