johnlinp / pdf-to-markdown

Convert PDF files into markdown files
BSD 3-Clause "New" or "Revised" License

ModuleNotFoundError: Pile / Python 2.7 dependency #17

Closed th3ragex closed 5 years ago

th3ragex commented 5 years ago

Hi,

I have just tried to get pdf2md running in a Conda Python 3.7.2 environment and got stuck on the imports in parser.py:

from pile import Pile (https://pypi.org/project/pile/)

It seems this package is only available for Python 2.7. Additionally the pdf2md script has issues with Python 3+

Please consider adding pile to the pip dependencies file and stating the required runtime in README.md.

This may save some people from wasting their time with the wrong environment.

johnlinp commented 5 years ago

Hi @th3ragex,

Thank you for the report. However, the line

from pile import Pile

does not import the package from https://pypi.org/project/pile/. Instead, it imports the local file https://github.com/johnlinp/pdf-to-markdown/blob/master/pdf2md/pile.py.
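To illustrate the distinction, here is a minimal sketch using a throwaway package. The names demo_pkg, bare.py and explicit.py are hypothetical and are not part of pdf2md. It assumes Python 3, where a bare "from pile import Pile" inside a package is an absolute import and only finds a sibling file if the package directory itself is on sys.path, whereas the explicit relative form always resolves to the sibling module.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Hypothetical throwaway package; "demo_pkg" and its files are illustrative only.
root = Path(tempfile.mkdtemp())
pkg = root / "demo_pkg"
pkg.mkdir()
(pkg / "__init__.py").write_text("")
(pkg / "pile.py").write_text("class Pile:\n    pass\n")
# Python 2 style: relies on implicit relative imports of sibling modules.
(pkg / "bare.py").write_text("from pile import Pile\n")
# Python 3 style: explicit relative import of the sibling module.
(pkg / "explicit.py").write_text("from .pile import Pile\n")

# Only the parent directory is importable, as when the package is installed.
sys.path.insert(0, str(root))
importlib.invalidate_caches()

try:
    importlib.import_module("demo_pkg.bare")
    bare_import_ok = True
except ImportError:  # ModuleNotFoundError is a subclass of ImportError
    bare_import_ok = False

explicit = importlib.import_module("demo_pkg.explicit")
print("bare import works:", bare_import_ok)
print("explicit import resolved to:", explicit.Pile.__module__)
```

The bare import is expected to fail here unless an unrelated top-level module named pile happens to be installed in the environment.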

> Additionally the pdf2md script has issues with Python 3+

Can you elaborate on what issues you have? Thanks.

th3ragex commented 5 years ago

Hi @johnlinp,

Thank you for your fast reply. I hadn't noticed that Pile is a module within this library, my bad. I managed to get it working 👍

> Additionally the pdf2md script has issues with Python 3+

1) The pdf2md script has print statements without parentheses, e.g. print 'Parsing', filename

2) Problems with imports of local files like pile.py. I am not that experienced in Python and imports are still somewhat strange to me. I had to change all imports to relative ones: from parser import Parser became from .parser import Parser

Otherwise it seems to struggle to find its local modules.

3) pile.py, line 66: piles = sorted(tables + paragraphs + images, reverse=True, key=lambda x: x._get_anything().y0)

This raises: TypeError: can only concatenate list (not "filter") to list

Wrapping paragraphs in list() fixes it:

piles = sorted(tables + list(paragraphs) + images, reverse=True, key=lambda x: x._get_anything().y0)
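Items 1 and 3 above can be reproduced in isolation with a small sketch. The Box class and the sample values are hypothetical stand-ins for the parser's objects, and the sort key reads y0 directly instead of going through _get_anything(). It assumes Python 3, where print is a function and filter() returns a lazy iterator instead of a list.

```python
# Hypothetical stand-in for the layout objects; only the y0 attribute matters here.
class Box:
    def __init__(self, name, y0):
        self.name = name
        self.y0 = y0


tables = [Box("table", 300)]
images = [Box("image", 100)]
candidates = [Box("paragraph", 200), Box("empty", None)]

# In Python 3, filter() returns an iterator, not a list.
paragraphs = filter(lambda b: b.y0 is not None, candidates)

error_message = None
try:
    # Same shape as the failing expression: list + filter object + list.
    tables + paragraphs + images
except TypeError as exc:
    error_message = str(exc)

# Materialising the iterator with list() makes the concatenation valid.
piles = sorted(tables + list(paragraphs) + images, reverse=True, key=lambda b: b.y0)
order = [b.name for b in piles]

# print is a function in Python 3, so the parentheses are required.
print("Concatenation error:", error_message)
print("Sorted top to bottom:", order)
```

Wrapping the filter result in list() also works unchanged on Python 2, where filter() already returns a list, so the fix does not break the older runtime.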

johnlinp commented 5 years ago

Hi @th3ragex,

Thank you for testing. I'll fix #19 when I have free time.