Closed VictorZuanazzi closed 3 years ago
In case someone else has a similar issue, we found a workaround:
TLDR: download version 2.0.23 with `wget -r --level 1 https://archive.apache.org/dist/pdfbox/2.0.23/`
and set `os.environ['PDFBOX'] = './archive.apache.org/dist/pdfbox/2.0.23/pdfbox-app-2.0.23.jar'`
Long version:
python-pdfbox defaults to the latest PDFBox Java app if none is given. However, PDFBox had a major release that broke the command-line interface python-pdfbox uses. The releases can be found here: https://archive.apache.org/dist/pdfbox/?C=M;O=D
To fix the issue, it is necessary to point python-pdfbox to a version that was working. To do so, we have to download it:
mkdir pdfbox
cd pdfbox
wget -r --level 1 https://archive.apache.org/dist/pdfbox/2.0.23/
Then you have to set the environment variable PDFBOX. A hacky way of doing that is to add this line at the top of the first Python file to be executed:
`os.environ['PDFBOX']='./pdfbox/archive.apache.org/dist/pdfbox/2.0.23/pdfbox-app-2.0.23.jar'`
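A slightly less fragile variant of the same idea is sketched below. It resolves the jar to an absolute path and fails early if the file is missing, so the setting does not depend on the current working directory. The helper name `pin_pdfbox_jar` is hypothetical and not part of python-pdfbox; only the `PDFBOX` environment variable comes from the workaround above.

```python
import os
from pathlib import Path


def pin_pdfbox_jar(jar_path):
    """Set the PDFBOX environment variable to an absolute jar path.

    Hypothetical helper, not part of python-pdfbox. It only sets the
    PDFBOX environment variable described in this thread.
    """
    jar = Path(jar_path).expanduser().resolve()
    # Fail early with a clear message instead of a Java error later on.
    if not jar.is_file():
        raise FileNotFoundError(f"PDFBox jar not found: {jar}")
    os.environ["PDFBOX"] = str(jar)
    return str(jar)
```

To stay on the safe side, call it before python-pdfbox is imported or used, for example `pin_pdfbox_jar('./pdfbox/archive.apache.org/dist/pdfbox/2.0.23/pdfbox-app-2.0.23.jar')`.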
Hope that helps until the library is updated to work with PDFBox 3.0.
The command line options of the PDFBox app have changed in version 3.0 - python-pdfbox needs to be updated to be able to handle the new interface. As a temporary fix, revert the pdfbox-app-*.jar file downloaded by python-pdfbox to an earlier version - you can find the path to the jar file using:
import pdfbox
p = pdfbox.PDFBox()
print(p.pdfbox_path)
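To tell whether the jar at that path is affected, one option is to read the version from the file name. This is a minimal sketch that assumes the jar keeps the `pdfbox-app-<major>.<minor>.<patch>` naming seen in this thread; `jar_version` and `is_pre_3` are hypothetical helpers, not part of python-pdfbox.

```python
import re
from pathlib import Path


def jar_version(jar_path):
    """Return (major, minor, patch) parsed from a pdfbox-app jar name, or None."""
    # Assumes names like pdfbox-app-2.0.23.jar, as used in this thread.
    match = re.search(r"pdfbox-app-(\d+)\.(\d+)\.(\d+)", Path(jar_path).name)
    if match is None:
        return None
    return tuple(int(part) for part in match.groups())


def is_pre_3(jar_path):
    """True if the jar name indicates a PDFBox release older than 3.0."""
    version = jar_version(jar_path)
    return version is not None and version[0] < 3
```

Passing `p.pdfbox_path` to `is_pre_3` should then indicate whether the downloaded jar predates the 3.0 interface change, provided the file name follows that pattern.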
Uploaded an updated version that only downloads PDFBox 2.*.
I am getting a weird new error with pdfbox. It was working fine until today at 10:00 (Amsterdam time).
That is how I call it:
And pdfbox returns a CLI error:
Anyone had a similar issue? How to solve it?