virantha / pypdfocr

Python script to do PDF OCR conversion using Tesseract
Apache License 2.0

Unable to convert PDF (OSX 10.13 build 17A358a) #65

Open Bobspadger opened 7 years ago

Bobspadger commented 7 years ago

I'm aware I am running this on a beta OS release, but I'm posting here to see if it's an already known issue.

➜  Epson Connect pypdfocr Epson_03092017170731.pdf
Starting conversion of Epson_03092017170731.pdf
Using 300 DPI
Traceback (most recent call last):
  File "/usr/local/bin/pypdfocr", line 11, in <module>
    sys.exit(main())
  File "/usr/local/lib/python2.7/site-packages/pypdfocr/pypdfocr.py", line 492, in main
    script.go(sys.argv[1:])
  File "/usr/local/lib/python2.7/site-packages/pypdfocr/pypdfocr.py", line 474, in go
    self._convert_and_file_email(self.pdf_filename)
  File "/usr/local/lib/python2.7/site-packages/pypdfocr/pypdfocr.py", line 480, in _convert_and_file_email
    ocr_pdffilename = self.run_conversion(pdf_filename)
  File "/usr/local/lib/python2.7/site-packages/pypdfocr/pypdfocr.py", line 363, in run_conversion
    ocr_pdf_filename = self.pdf.overlay_hocr_pages(img_dpi, hocr_filenames, pdf_filename)
  File "/usr/local/lib/python2.7/site-packages/pypdfocr/pypdfocr_pdf.py", line 166, in overlay_hocr_pages
    orig_pg = self._get_merged_single_page(orig_pg, text_pg)
  File "/usr/local/lib/python2.7/site-packages/pypdfocr/pypdfocr_pdf.py", line 190, in _get_merged_single_page
    orig_rotation_angle = int(original_page.get('/Rotate', 0))
TypeError: int() argument must be a string or a number, not 'IndirectObject'

I can see the intermediate files being created; then it tries to merge them and crashes, leaving multiple blank PDFs.

jasonbarbee commented 6 years ago

I'm also having this problem at the same line of code. As a workaround, I manually changed the line to orig_rotation_angle = 0, which bypasses the failing code regardless of the page's actual rotation angle. I don't see any problems with my rotations, and it put everything back together OK after I modified /usr/local/lib/python2.7/site-packages/pypdfocr/pypdfocr_pdf.py. This is on OSX 10.13.3 High Sierra, non-beta release.
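
Hardcoding the angle to 0 discards the rotation of pages that really are rotated. A less lossy workaround might be to resolve the indirect reference before calling int(). The following is only a sketch, not a tested patch to pypdfocr: resolve_rotation is a hypothetical helper name, and it assumes the value returned for /Rotate exposes getObject() (PyPDF2 1.x) or get_object() (later PyPDF2/pypdf releases).

```python
def resolve_rotation(page, default=0):
    """Return the page's /Rotate value as an int, following indirect references.

    `page` is anything with a dict-style .get(), such as a PyPDF2 page object.
    This helper is hypothetical and not part of pypdfocr.
    """
    value = page.get('/Rotate', default)
    # Follow references by duck typing; bound the loop in case of a cycle.
    for _ in range(10):
        resolver = (getattr(value, 'get_object', None)
                    or getattr(value, 'getObject', None))
        if resolver is None:
            break  # plain int or string, nothing to resolve
        resolved = resolver()
        if resolved is value:
            break  # direct objects resolve to themselves
        value = resolved
    return int(value)
```

With this in place, the failing line in _get_merged_single_page could in principle become orig_rotation_angle = resolve_rotation(original_page), assuming original_page is the same object shown in the traceback.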