johnlinp / pdf-to-markdown

Convert PDF files into markdown files
BSD 3-Clause "New" or "Revised" License

IndexError: list index out of range #15

Closed by chinobing 6 years ago

chinobing commented 6 years ago
(C:\Program Files\Anaconda3\envs\pdftomd) C:\Users\Administrator\PycharmProjects\pdftomd>python main.py test.pdf
Parsing test.pdf
Traceback (most recent call last):
  File "main.py", line 31, in <module>
    main(sys.argv)
  File "main.py", line 26, in main
    writer.write(piles)
  File "C:\Users\Administrator\PycharmProjects\pdftomd\pdf2md\writer.py", line 27, in write
    self._write_simple(piles)
  File "C:\Users\Administrator\PycharmProjects\pdftomd\pdf2md\writer.py", line 50, in _write_simple
    markdown = pile.gen_markdown(self._syntax)
  File "C:\Users\Administrator\PycharmProjects\pdftomd\pdf2md\pile.py", line 76, in gen_markdown
    return self._gen_table_markdown(syntax)
  File "C:\Users\Administrator\PycharmProjects\pdftomd\pdf2md\pile.py", line 290, in _gen_table_markdown
    intermediate = self._gen_table_intermediate()
  File "C:\Users\Administrator\PycharmProjects\pdftomd\pdf2md\pile.py", line 319, in _gen_table_intermediate
    bottom, rowspan = self._find_exist_coor(left, right, row_idx, horizontal_coor, 'horizontal')
  File "C:\Users\Administrator\PycharmProjects\pdftomd\pdf2md\pile.py", line 357, in _find_exist_coor
    coor = line_coor[start_idx + span]
IndexError: list index out of range

It works for some PDF files but fails on others. I tried to figure out what is going on here, but I could not come up with anything.

chinobing commented 6 years ago

I forgot to make cmap before using it. I will give that a try. Closing this for now.

chinobing commented 6 years ago

The same error occurs even after making cmap.

chinobing commented 6 years ago

I finally figured out what is going on here. The error is caused by table cells that span multiple rows and columns: for those cells the lookup in _find_exist_coor runs past the end of the coordinate list, which raises the IndexError.
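
For anyone hitting the same traceback, here is a minimal sketch of how this kind of overrun can happen and one possible guard. It is not the actual pdf2md code: the function names, the `exists` callback, and the sample coordinates are all hypothetical, and only the expression `line_coor[start_idx + span]` is taken from the traceback. The assumption is that the lookup walks forward until it finds a coordinate where a line is actually drawn, and that a merged cell can leave no such line before the list ends.

```python
def find_exist_coor_unguarded(line_coor, start_idx, exists):
    """Walk forward from start_idx until a coordinate with a drawn line is found.

    Hypothetical reconstruction: raises IndexError when no line is found.
    """
    span = 1
    while True:
        coor = line_coor[start_idx + span]  # overruns the list for merged cells
        if exists(coor):
            return coor, span
        span += 1


def find_exist_coor_guarded(line_coor, start_idx, exists):
    """Same walk, but stop at the last coordinate instead of overrunning."""
    span = 1
    while start_idx + span < len(line_coor):
        coor = line_coor[start_idx + span]
        if exists(coor):
            return coor, span
        span += 1
    # No closing line found: treat the cell as extending to the table edge.
    last = len(line_coor) - 1
    return line_coor[last], max(last - start_idx, 1)


# Hypothetical table with row boundaries at 0, 10, 20, 30, where only the
# lines at 0 and 10 are drawn for this column (the rest is a merged cell).
line_coor = [0, 10, 20, 30]
drawn = {0, 10}

print(find_exist_coor_guarded(line_coor, 1, lambda c: c in drawn))
```

Falling back to the table edge is only one choice. Depending on how the table should be rendered, raising a clearer error or skipping the cell may be more appropriate.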