Closed th3ragex closed 5 years ago
Hi @th3ragex,
Thank you for the report. However, the line
from pile import Pile
does not import the package from https://pypi.org/project/pile/. Instead, it imports the local file https://github.com/johnlinp/pdf-to-markdown/blob/master/pdf2md/pile.py.
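One way to check which file an import name resolves to is `importlib.util.find_spec`. The following is only a sketch: it uses a temporary directory and a placeholder `pile.py` (both made up for illustration), not the real repository layout.

```python
import importlib
import importlib.util
import sys
import tempfile
from pathlib import Path

# Throwaway directory with a placeholder pile.py (hypothetical content).
workdir = Path(tempfile.mkdtemp())
(workdir / "pile.py").write_text("class Pile:\n    pass\n")

# Put the directory first on sys.path, as Python does for a script's own directory.
sys.path.insert(0, str(workdir))
importlib.invalidate_caches()

# spec.origin is the file that `from pile import Pile` would load.
spec = importlib.util.find_spec("pile")
print(spec.origin)
```

If the printed path points into `site-packages` rather than the project directory, the import is picking up an installed package instead of the local file.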
> Additionally the pdf2md script has issues with Python 3+
Can you elaborate on what issues you have? Thanks.
Hi @johnlinp,
Thank you for your fast reply. I hadn't noticed that Pile is a module in this library, my bad. I managed to get it working 👍
> Additionally the pdf2md script has issues with Python 3+
1) The pdf2md script has print statements without parentheses:
print 'Parsing', filename
2) Problems with imports of local files like Pile.py. I am not that experienced in Python and imports are still somewhat strange to me. I had to change all imports of local files to relative imports:
from parser import Parser
to
from .parser import Parser
Otherwise it seems to struggle to find its dependencies.
3) Pile.py, line 66:
piles = sorted(tables + paragraphs + images, reverse=True, key=lambda x: x._get_anything().y0)
raises: TypeError: can only concatenate list (not "filter") to list
Wrapping paragraphs in list() fixed it:
piles = sorted(tables + list(paragraphs) + images, reverse=True, key=lambda x: x._get_anything().y0)
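Points 1 and 3 above can be reproduced in isolation. This is a minimal sketch under Python 3 that uses made-up integer lists as stand-ins for the real pile objects, so the names and values are assumptions:

```python
# Stand-in data; the real code holds pile objects, not integers (assumption).
tables = [3, 1]
images = [2]
# In Python 3, filter() returns a lazy iterator rather than a list.
paragraphs = filter(lambda x: x > 0, [5, -4, 4])

try:
    combined = tables + paragraphs + images  # list + filter object fails
except TypeError as exc:
    error_message = str(exc)

# Materialize the iterator with list() before concatenating.
combined = sorted(tables + list(paragraphs) + images, reverse=True)

# print is a function in Python 3, so its arguments go in parentheses.
print('Sorted:', combined)
```

Under Python 2, `filter()` returned a list, which is why the original concatenation worked there.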
Hi @th3ragex,
Thank you for testing. I'll fix #19 when I have free time.
Hi,
I have just tried to get pdf2md running in a Conda Python 3.7.2 environment and got stuck on the imports in parser.py:
from pile import Pile
https://pypi.org/project/pile/
It seems this package is only available for Python 2.7. Additionally, the pdf2md script has issues with Python 3+.
Please consider adding pile to the dependencies file for pip and stating the required runtime in README.md.
This may save some people from wasting their time with the wrong environment.