johnlinp / pdf-to-markdown

Convert PDF files into markdown files
BSD 3-Clause "New" or "Revised" License

AssertionError: Unrecognized type: <class 'pdfminer.layout.LTLine'> #2

Closed: william8000 closed this issue 5 years ago

william8000 commented 9 years ago

When processing a PDF on Fedora 20 Linux, I got the following traceback:

Parsing test.pdf
Traceback (most recent call last):
  File "main.py", line 30, in <module>
    main(sys.argv)
  File "main.py", line 17, in main
    piles = parser.parse()
  File "/tmp/pdf/pdf-to-markdown-13jul15/pdf2md/parser.py", line 36, in parse
    piles += self._parse_page(page)
  File "/tmp/pdf/pdf-to-markdown-13jul15/pdf2md/parser.py", line 60, in _parse_page
    pile.parse_layout(page)
  File "/tmp/pdf/pdf-to-markdown-13jul15/pdf2md/pile.py", line 52, in parse_layout
    assert False, "Unrecognized type: %s" % type(obj)
AssertionError: Unrecognized type: <class 'pdfminer.layout.LTLine'>

I got around it with the patch below.

--- pdf-to-markdown-13jul15/pdf2md/pile.py-     2015-07-13 11:31:43.000000000 -0400
+++ pdf-to-markdown-13jul15/pdf2md/pile.py      2015-07-13 12:17:47.143587827 -0400
@@ -49,7 +49,8 @@
                        elif type(obj) == LTCurve:
                                pass
                        else:
-                               assert False, "Unrecognized type: %s" % type(obj)
+                               print "Unrecognized type: " + str(type(obj))
+                               # assert False, "Unrecognized type: %s" % type(obj)

        def split_piles(self):
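As far as the patch context shows, the assertion fires because `parse_layout` dispatches on exact types (`type(obj) == LTCurve`). In pdfminer, `LTLine` (like `LTRect`) is a subclass of `LTCurve`, so an exact-type comparison never matches it and it falls through to the `else` branch. The sketch below illustrates the difference between an exact-type check and `isinstance()`. It uses stand-in classes rather than the real pdfminer ones so it runs on its own, and the function names `classify_exact` and `classify_isinstance` are hypothetical; this is not necessarily how #4 fixed the problem.

```python
# Stand-in classes mirroring the relevant part of pdfminer's layout
# hierarchy (in pdfminer, LTLine and LTRect both subclass LTCurve).
# These are NOT the real pdfminer classes; they only let the sketch run alone.
class LTCurve:
    pass


class LTLine(LTCurve):
    pass


class LTRect(LTCurve):
    pass


class LTTextBox:
    pass


def classify_exact(obj):
    """Hypothetical dispatcher using exact-type checks, as in pile.py."""
    if type(obj) == LTTextBox:
        return "text"
    elif type(obj) == LTCurve:
        return "ignored"
    else:
        return "unrecognized"  # LTLine and LTRect end up here


def classify_isinstance(obj):
    """Hypothetical dispatcher using isinstance(), which accepts subclasses."""
    if isinstance(obj, LTTextBox):
        return "text"
    elif isinstance(obj, LTCurve):
        return "ignored"  # also covers LTLine and LTRect
    else:
        return "unrecognized"


for obj in (LTTextBox(), LTCurve(), LTLine(), LTRect()):
    print(type(obj).__name__, classify_exact(obj), classify_isinstance(obj))
```

With `isinstance()`, every subclass of `LTCurve` is skipped along with `LTCurve` itself, so lines and rectangles no longer need their own branches or an assertion bypass.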

johnlinp commented 9 years ago

Thank you for bringing it up. I will try to fix this soon.

johnlinp commented 5 years ago

This has already been fixed by #4. Sorry for the late reply.

william8000 commented 5 years ago

Thanks!