jlsutherland / doc2text

Detect text blocks and OCR poorly scanned PDFs in bulk. Python module available via pip.
MIT License
1.27k stars 98 forks

AttributeError: 'Page' object has no attribute 'image' ISSUE #30

Closed angelo337 closed 6 years ago

angelo337 commented 6 years ago

Hi there, I am testing your product, but I am getting this error:

Error in /usr/local/lib/python2.7/dist-packages/doc2text/page.py on line 25
dst is not a numpy array, neither a scalar
Error in /usr/local/lib/python2.7/dist-packages/doc2text/page.py on line 211
dst is not a numpy array, neither a scalar
Error in /usr/local/lib/python2.7/dist-packages/doc2text/page.py on line 80
dst is not a numpy array, neither a scalar
Traceback (most recent call last):
  File "example_doc2text.py", line 19, in <module>
    doc.extract_text()
  File "/usr/local/lib/python2.7/dist-packages/doc2text/__init__.py", line 96, in extract_text
    text = new.extract_text()
  File "/usr/local/lib/python2.7/dist-packages/doc2text/page.py", line 46, in extract_text
    cv2.imwrite(temp_path, self.image)
AttributeError: 'Page' object has no attribute 'image'

My test file is as follows:

> import doc2text
> 
> # Initialize the class.
> doc = doc2text.Document()
> 
> # You can pass the lang (as 3 letters code) to the class to improve accuracy
> # On ubuntu it requires the package tesseract-ocr-$lang$
> # On other OS, see https://github.com/tesseract-ocr/langdata
> doc = doc2text.Document(lang="eng")
> 
> # Read the file in. Currently accepts pdf, png, jpg, bmp, tiff.
> # If reading a PDF, doc2text will split the PDF into its component pages.
> doc.read('myfile.tiff')
> 
> # Crop the pages down to estimated text regions, deskew, and optimize for OCR.
> doc.process()
> 
> # Extract text from the pages.
> doc.extract_text()
> text = doc.get_text()
> print(text)

Could you please help me? Thanks a lot.

angelo337 commented 6 years ago

I figured it out. To solve this issue on Ubuntu, run:

pip install opencv-contrib-python

This upgrades OpenCV to 3.x. To check the installed OpenCV version, you can run:

python -c "import cv2; print(cv2.__version__)"
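
For anyone who would rather do this check from a script before calling doc2text, here is a minimal sketch. The helper names `opencv_major` and `needs_upgrade` are made up for illustration and are not part of doc2text or OpenCV. The 3.x threshold is only taken from the fix described above, not from any documented requirement.

```python
def opencv_major(version_string):
    """Return the major version number from a string such as '3.4.2'."""
    return int(version_string.split(".")[0])


def needs_upgrade(version_string, minimum_major=3):
    """True when the version is older than the assumed minimum (3.x here)."""
    return opencv_major(version_string) < minimum_major


# Report on the locally installed OpenCV, if there is one.
try:
    import cv2
except ImportError:
    print("cv2 is not installed; try: pip install opencv-contrib-python")
else:
    if needs_upgrade(cv2.__version__):
        print("OpenCV %s found; upgrading to 3.x may fix the error" % cv2.__version__)
    else:
        print("OpenCV %s found; version is 3.x or newer" % cv2.__version__)
```

The single-argument `print(...)` calls are written so the script runs on both Python 2.7 and Python 3.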