hrbrmstr / docxtractr

:scissors: Extract Tables from Microsoft Word Documents with R

convert_to_pdf() fails but command-line equivalent works #35

Closed dtenenba closed 1 year ago

dtenenba commented 1 year ago

Hello, I have a pptx file called slides.pptx that I want to convert to PDF using docxtractr.

When I try, I get this:

> docxtractr::convert_to_pdf("/tmp/slides.pptx")
Warning: failed to launch javaldx - java may not function correctly
The application cannot be started.
The component manager is not available.
("Cannot open uno ini file:///usr/lib/x86_64-linux-gnu/unorc at ./cppuhelper/source/defaultbootstrap.cxx:53")
Error in docxtractr::convert_to_pdf("/tmp/slides.pptx") :
  Conversion from PPTX to PDF did not succeed
In addition: Warning message:
In system(cmd, intern = TRUE) :
  running command '"/usr/bin/soffice" --convert-to pdf --headless --outdir "/tmp/RtmptPPhMp" "/tmp/RtmptPPhMp/file151a369f9d98.pptx"' had status 139

However, when I take that soffice command line at the end, and change the last parameter to /tmp/slides.pptx, it works (despite throwing a warning). It produces the output PDF in /tmp and I verified that its contents are correct:

root@4d9dd60d0e79:/app# "/usr/bin/soffice" --convert-to pdf --headless --outdir "/tmp/" "/tmp/slides.pptx"
Warning: failed to launch javaldx - java may not function correctly
convert /tmp/slides.pptx -> /tmp/slides.pdf using filter : impress_pdf_Export
root@4d9dd60d0e79:/app#

So, you might ask, why don't I just use the command line? Well, this is part of a larger software stack that relies on docxtractr, and I don't want to reinvent the wheel.

This is inside a Docker container based on Debian 12, with R-4.2.2 and docxtractr 0.6.5.

BTW, I do not have the javaldx program in the container (although I installed libreoffice via apt-get), but it does not seem to matter: despite the warning, soffice converts the pptx successfully.

So in a nutshell, I am wondering why I can successfully convert pptx to PDF on the command line but not with docxtractr, which apparently uses pretty much the same command line under the hood.

Thanks

dtenenba commented 1 year ago

Solved this... R added a bunch of entries to my LD_LIBRARY_PATH. Running this before trying the conversion made it work:

Sys.setenv(LD_LIBRARY_PATH="/usr/lib/libreoffice/program//")
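
Since Sys.setenv() changes LD_LIBRARY_PATH for the rest of the R session, a scoped variant may be safer when other code depends on the value R set. The following is a minimal sketch, not part of docxtractr: with_ld_library_path() is a hypothetical helper name, and the LibreOffice path in the commented usage is the one from the comment above, which may differ on other systems.

```r
# Hypothetical helper (not part of docxtractr): run `fn` with LD_LIBRARY_PATH
# temporarily set to `path`, then restore the previous value on exit.
with_ld_library_path <- function(path, fn) {
  old <- Sys.getenv("LD_LIBRARY_PATH", unset = NA)
  on.exit({
    if (is.na(old)) {
      Sys.unsetenv("LD_LIBRARY_PATH")       # variable was not set before
    } else {
      Sys.setenv(LD_LIBRARY_PATH = old)     # put back what R had set
    }
  }, add = TRUE)
  Sys.setenv(LD_LIBRARY_PATH = path)
  fn()
}

# Inspect what the R session currently has in the variable.
print(Sys.getenv("LD_LIBRARY_PATH"))

# Usage (assumes docxtractr and LibreOffice are installed at these paths):
# with_ld_library_path(
#   "/usr/lib/libreoffice/program//",
#   function() docxtractr::convert_to_pdf("/tmp/slides.pptx")
# )
```

The on.exit() handler restores the original value even if the conversion raises an error.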

howardbaik commented 1 year ago

Maybe related to https://github.com/hrbrmstr/docxtractr/issues/29