Closed william8000 closed 5 years ago
When processing a PDF on Fedora 20 Linux, I got the following error:
Parsing test.pdf
Traceback (most recent call last):
  File "main.py", line 30, in <module>
    main(sys.argv)
  File "main.py", line 17, in main
    piles = parser.parse()
  File "/tmp/pdf/pdf-to-markdown-13jul15/pdf2md/parser.py", line 36, in parse
    piles += self._parse_page(page)
  File "/tmp/pdf/pdf-to-markdown-13jul15/pdf2md/parser.py", line 60, in _parse_page
    pile.parse_layout(page)
  File "/tmp/pdf/pdf-to-markdown-13jul15/pdf2md/pile.py", line 52, in parse_layout
    assert False, "Unrecognized type: %s" % type(obj)
AssertionError: Unrecognized type: <class 'pdfminer.layout.LTLine'>
I got around it with the patch below.
--- pdf-to-markdown-13jul15/pdf2md/pile.py- 2015-07-13 11:31:43.000000000 -0400
+++ pdf-to-markdown-13jul15/pdf2md/pile.py 2015-07-13 12:17:47.143587827 -0400
@@ -49,7 +49,8 @@
             elif type(obj) == LTCurve:
                 pass
             else:
-                assert False, "Unrecognized type: %s" % type(obj)
+                print "Unrecognized type: " + str(type(obj))
+                # assert False, "Unrecognized type: %s" % type(obj)
 
     def split_piles(self):
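A possible reason the `LTCurve` branch did not catch this object: in pdfminer, `LTLine` is a subclass of `LTCurve`, and an exact comparison such as `type(obj) == LTCurve` does not match subclasses. Below is a minimal sketch of an `isinstance`-based check that also logs unknown types instead of asserting. It is only an illustration, not the project's actual fix. The classes are local stand-ins so the snippet runs without pdfminer, and `classify` and `UnhandledThing` are hypothetical names.

```python
import logging

logger = logging.getLogger("pdf2md.sketch")  # hypothetical logger name


# Stand-in classes so the sketch runs without pdfminer installed.
# They mirror pdfminer.layout, where LTLine subclasses LTCurve.
class LTCurve(object):
    pass


class LTLine(LTCurve):
    pass


class UnhandledThing(object):
    """Hypothetical layout object that the parser does not know about."""
    pass


def classify(obj):
    """Return "skip" for curve-like objects and "unknown" for anything else."""
    if isinstance(obj, LTCurve):
        # isinstance also matches subclasses such as LTLine,
        # which type(obj) == LTCurve does not.
        return "skip"
    # Warn and carry on instead of aborting the whole run with an assert.
    logger.warning("Unrecognized type: %s", type(obj))
    return "unknown"
```

With this shape, a page containing an `LTLine` is skipped the same way as an `LTCurve`, and a genuinely unknown object produces a warning while parsing continues.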
Thank you for bringing it up. I will try to fix this soon.
This is already fixed by #4. Sorry for the late reply.
Thanks!