jlsutherland / doc2text

Detect text blocks and OCR poorly scanned PDFs in bulk. Python module available via pip.
MIT License
1.27k stars 97 forks

Python 3.5 compatibility #24

Open andjelx opened 7 years ago

andjelx commented 7 years ago

It seems the library is not fully Python 3 compatible. When I try to run this simple code:

import doc2text

doc = doc2text.Document()
doc = doc2text.Document(lang="eng")
doc.read('pdf-sample.pdf')

I'm getting

Traceback (most recent call last):
  File "doc2text_test.py", line 13, in <module>
    doc.read('pdf-sample.pdf')
  File "/usr/local/lib/python3.5/dist-packages/doc2text/__init__.py", line 44, in read
    for i in xrange(self.num_pages):
NameError: name 'xrange' is not defined

andjelx commented 7 years ago

Pull request #25 created

neel17 commented 5 years ago

Need to change the code in file __init__.py, line 44: `for i in xrange(self.num_pages):`

to

for i in list(range(self.num_pages)):

andjelx commented 5 years ago

@neel17 xrange returns a lazy, iterator-like object rather than a list, which is more efficient in terms of memory usage.
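
To illustrate the memory point, here is a minimal sketch. It is not taken from PR #25; the names `lazy_range` and `num_pages` are hypothetical stand-ins (the latter for `self.num_pages`). It relies on the fact that `range` in Python 3 is already lazy, like `xrange` in Python 2, so wrapping it in `list(...)` only adds an unnecessary copy:

```python
import sys

# Hypothetical compatibility shim, not the code from the PR:
# use xrange where it exists (Python 2), otherwise fall back to
# range, which is already lazy in Python 3.
try:
    lazy_range = xrange  # Python 2
except NameError:
    lazy_range = range   # Python 3

num_pages = 1000000  # stand-in for self.num_pages

lazy = lazy_range(num_pages)        # stores only start, stop and step
eager = list(range(num_pages))      # materializes every index in memory

lazy_size = sys.getsizeof(lazy)
eager_size = sys.getsizeof(eager)
print(lazy_size, eager_size)        # the list is far larger

# The loop itself reads the same either way.
total = 0
for i in lazy_range(3):
    total += i
```

On Python 3 only, writing `for i in range(self.num_pages):` directly would be enough; the shim matters only if the same file must also run on Python 2.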

neel17 commented 5 years ago

@andjelx xrange is not supported in Python 3. What would be a suitable workaround?

andjelx commented 5 years ago

@neel17 Have you checked the PR?

neel17 commented 5 years ago

@andjelx: My apologies, your PR works fine. I was trying to solve it myself and also got the issue resolved using the solution above. Thanks for helping me understand it properly!