I encountered an error while processing multi-column PDFs using the pymupdf4llm library. The error occurs in the helpers/multi_column.py file, specifically at line 254. Here is the traceback:
Traceback (most recent call last):
  File "/share/home/sh/code/llmx/analysis.py", line 133, in analyze_dir
    analyze_pdf(os.path.join(root, file), output_dir,config)
  File "/share/home/sh/code/llmx/analysis.py", line 42, in analyze_pdf
    doc = PDFDocumentProcessor(pdf_path, config=config, extension_name=ename)
  File "/share/home/sh/code/llmx/document.py", line 31, in __init__
    self.str_preprocess(pymupdf4llm.to_markdown(pdf_path))
  File "/share/home/sh/code/llmx/pymupdf4llm/helpers/pymupdf_rag.py", line 544, in to_markdown
    page_output, images, tables, graphics = get_page_output(
  File "/share/home/sh/code/llmx/pymupdf4llm/helpers/pymupdf_rag.py", line 499, in get_page_output
    text_rects = column_boxes(
  File "/share/home/sh/code/llmx/pymupdf4llm/helpers/multi_column.py", line 254, in column_boxes
    line0 = b["lines"][0] # get first line
IndexError: list index out of range
The problematic line is:
line0 = b["lines"][0] # get first line
This line raises an IndexError when b["lines"] is an empty list. Could you please modify the code to handle this case gracefully?
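To illustrate the kind of handling I have in mind, here is a minimal sketch of a guard that skips blocks with no lines before the first line is accessed. It is not taken from pymupdf4llm: `blocks_with_lines` is a hypothetical helper, and the sample dictionaries only imitate the "lines" key that appears in the traceback.

```python
def blocks_with_lines(blocks):
    """Return (block, first_line) pairs, skipping blocks that have no lines.

    Hypothetical helper for illustration only; it is not part of pymupdf4llm.
    """
    result = []
    for b in blocks:
        lines = b.get("lines") or []  # treat a missing or None value like an empty list
        if not lines:
            continue  # nothing to inspect, so skip instead of raising IndexError
        result.append((b, lines[0]))
    return result


# Made-up sample data: one empty block, one block with a single line.
sample_blocks = [
    {"bbox": (0, 0, 10, 10), "lines": []},
    {"bbox": (0, 20, 10, 30), "lines": [{"dir": (1, 0)}]},
]

for block, line0 in blocks_with_lines(sample_blocks):
    print(block["bbox"], line0["dir"])
```

Inside column_boxes the equivalent change would presumably be a single check before line 254 that continues to the next block when b["lines"] is empty, but I have not verified how the rest of the loop depends on line0.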