izderadicka / pdfparser

Python binding to libpoppler with focus on text extraction

Python 3 - problem when file name is str, not bytes #19

Open manish59 opened 6 years ago

manish59 commented 6 years ago

import pdfparser.poppler as pdf
pdf.Document(r"Manish.pdf")

Traceback (most recent call last):
  File "", line 1, in
  File "pdfparser/poppler.pyx", line 116, in pdfparser.poppler.Document.__cinit__
TypeError: expected bytes, str found

This is on Red Hat. Can anyone help me fix this, please?

izderadicka commented 6 years ago

I'm assuming you're using Python 3, right? Please confirm the version. Currently the file name is a char*, which means bytes in Python 3, so you should use pdf.Document(b"Manish.pdf")

In future versions we should improve the interface to also accept strings in Python 3.
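Until the interface accepts strings, a caller can encode the file name before passing it in. The following is a minimal sketch, not part of pdfparser: the helper name to_filename_bytes is hypothetical, and it assumes the file name should be encoded with the filesystem encoding, which is what the standard library's os.fsencode does.

```python
import os


def to_filename_bytes(path):
    """Return `path` as bytes using the filesystem encoding.

    Hypothetical helper, not part of pdfparser. Accepts str, bytes,
    or os.PathLike objects; bytes pass through unchanged.
    """
    return os.fsencode(path)


# Assumed usage with the binding discussed in this thread:
#   import pdfparser.poppler as pdf
#   doc = pdf.Document(to_filename_bytes("Manish.pdf"))
```

Using os.fsencode rather than a hard-coded "utf-8" keeps the result consistent with how Python itself encodes paths on the current system.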

manish59 commented 6 years ago

Yes, I'm using Python 3, and it's working now. But the RGB values I'm getting look like r:0.89, g:0.42, b:0.04, and when I check them, they don't show the color I need. Do I need to do anything to get actual RGB values in the range 0-255?
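If the reported components are normalized floats in the range 0.0 to 1.0 and the color space really is RGB (both are assumptions here, and a different color space would need a different conversion), scaling to 0-255 could be sketched like this. The function name to_rgb255 is hypothetical and not part of pdfparser.

```python
def to_rgb255(r, g, b):
    """Scale normalized RGB components (0.0-1.0) to integers in 0-255.

    Hypothetical helper; assumes the inputs are already in an RGB
    color space. Out-of-range inputs are clamped to [0.0, 1.0].
    """
    def scale(component):
        clamped = min(1.0, max(0.0, component))
        return int(round(clamped * 255))

    return scale(r), scale(g), scale(b)


# The values quoted above, converted to 8-bit components.
print(to_rgb255(0.89, 0.42, 0.04))
```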

manish59 commented 6 years ago

Actually, I figured out the color space. You can close this issue. Thanks for helping, though. Can we detect hyperlinks using this tool?

manish59 commented 6 years ago

Can we use this library on Windows?

izderadicka commented 6 years ago

See #17. Theoretically yes, but it requires advanced Windows, Python, and C++ skills.

I'm keeping this open, as there is a potential improvement for Python 3.

manish59 commented 6 years ago

Can we detect hyperlinks using this tool?