Open irash03 opened 4 months ago
@irash03 You could do something like the following to build one chunk per section from the text of that section's children, as you described in point 2.
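# Assumes SmartPDFLoader is already imported and that llmsherpa_api_url and pdf_url are defined.
pdf_loader = SmartPDFLoader(llmsherpa_api_url=llmsherpa_api_url)
doc = pdf_loader.pdf_reader.read_pdf(pdf_url)
for section in doc.sections():
    # Start each chunk with the section's own text.
    chunk = f"{section.to_context_text()}\n\n"
    # Append every direct child, without repeating the section info.
    for child in section.children:
        chunk += child.to_context_text(include_section_info=False)
    print(chunk)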
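If the sub-points are nested more than one level deep, the same idea can be applied recursively. The following is only a minimal sketch: `StubBlock` is a hypothetical stand-in for the parsed blocks, the sample text is made up, and it assumes each block exposes `children` and a `to_text()` that returns just that block's own text. Check how the real objects behave before relying on it.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class StubBlock:
    """Hypothetical stand-in for a parsed block (section, point, or sub-point)."""
    text: str
    children: List["StubBlock"] = field(default_factory=list)

    def to_text(self) -> str:
        # Assumption: returns only this block's own text, without its children.
        return self.text


def collect_text(block, depth=0):
    """Return the block's text followed by all descendants' text, indented by depth."""
    lines = ["  " * depth + block.to_text()]
    for child in block.children:
        lines.extend(collect_text(child, depth + 1))
    return lines


def section_chunks(sections):
    """Build one text chunk per section, keeping points and sub-points together."""
    return ["\n".join(collect_text(section)) for section in sections]


# Made-up sample data, for illustration only.
sections = [
    StubBlock("1. Eligibility", [
        StubBlock("a) Applicants must be residents.", [
            StubBlock("i) Proof of address is required."),
        ]),
        StubBlock("b) Applications close in June."),
    ]),
]
chunks = section_chunks(sections)
print(chunks[0])
```

Each resulting string could then be wrapped in a llama-index node or document. One caveat: if the list of sections also contains nested subsections as separate entries, their text would appear in more than one chunk, so you may want to pass only the top-level sections.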
Hi,
I'm trying to parse a document that has a lot of points, which in turn have sub-points. The goal is to split the text point-wise and parse each point as a llama-index node. For example, I would like to have this as a single node:
However, when I parse the document and iterate through the chunks (doc.chunks()), the hierarchy of points and sub-points isn't being assigned.
The chunks are all independent; the only relationship each one has is to its section heading:
Based on my understanding, we can probably try the following:
Kindly let me know if there are any alternatives for this.
Thanks!