izderadicka / pdfparser

Python binding to libpoppler with focus on text extraction

Mojave build fix + use c_str (poppler 0.74.0 GooString) #24

Closed DainisGorbunovs closed 4 years ago

DainisGorbunovs commented 5 years ago

I am using macOS Mojave and was unable to build this package. This pull request fixes the two issues behind that: the getCString error and a compiler flag issue on Mojave.

Reproduction

conda create -n myenv python=3.7 cython
conda activate myenv
pip install git+https://github.com/izderadicka/pdfparser

getCString error (goo/GooString)

The first error is about getCString:

  pdfparser/poppler.cpp:8811:41: error: no member named 'getCString' in 'GooString'
          __pyx_t_12 = __pyx_v_font_name->getCString();
                       ~~~~~~~~~~~~~~~~~  ^
  pdfparser/poppler.cpp:8910:29: error: no member named 'getCString' in 'GooString'
      __pyx_t_12 = __pyx_v_s->getCString();
                   ~~~~~~~~~  ^

Solution

The header file for Poppler 0.74.0 no longer declares getCString; c_str() should be used instead. The proposed changes should stay backwards compatible with older Poppler versions: the Cython code uses conditional compilation controlled by the USE_CSTRING variable.

File: /usr/local/Cellar/poppler/0.74.0/include/poppler/goo/GooString.h

This change was made in Poppler 0.72.0; see https://gitlab.freedesktop.org/poppler/poppler/commit/817b0f12453985c416a0388cdd4a09697d092b7f
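A minimal sketch of how the USE_CSTRING switch could be derived from the Poppler version at build time. The helper names (parse_version, use_cstring) are hypothetical and not taken from the pull request, and the version string is assumed to come from something like `pkg-config --modversion poppler`. The 0.72.0 threshold is the version mentioned above.

```python
import re

# Version in which the change described above landed.
CSTR_MIN_VERSION = (0, 72, 0)


def parse_version(text):
    """Turn a version string such as '0.74.0' into a 3-tuple of ints.

    Hypothetical helper: only the leading digits of each dotted piece are
    used, and missing components are padded with zeros.
    """
    parts = []
    for piece in text.strip().split(".")[:3]:
        match = re.match(r"\d+", piece)
        parts.append(int(match.group()) if match else 0)
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)


def use_cstring(poppler_version):
    """Return True when c_str() should be used instead of getCString()."""
    return parse_version(poppler_version) >= CSTR_MIN_VERSION


# Assumed usage: a dict like this can be handed to Cython's cythonize()
# through its compile_time_env option, so that the .pyx source can branch
# on USE_CSTRING with an IF/ELSE block.
compile_time_env = {"USE_CSTRING": use_cstring("0.74.0")}
```

With this approach the same source builds against both old and new Poppler headers, at the cost of needing the Poppler version to be known when setup.py runs.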

macOS Mojave issue with libc++ CFLAG

When the poppler config contains only extra_compile_args, the flags are not propagated to the g++ link command:

poppler_config.setdefault('extra_compile_args', []).extend(mac_compile_args)

I get the error:

g++ -bundle -undefined dynamic_lookup -L/Users/myname/miniconda3/envs/myenv/lib -arch x86_64 -L/Users/myname/miniconda3/envs/myenv/lib -arch x86_64 -arch x86_64 build/temp.macosx-10.7-x86_64-3.7/pdfparser/poppler.o -L/usr/local/Cellar/poppler/0.74.0/lib -L/usr/local/Cellar/poppler/0.74.0/lib -lpoppler -lpoppler-cpp -o build/lib.macosx-10.7-x86_64-3.7/pdfparser/poppler.cpython-37m-darwin.so
clang: warning: libstdc++ is deprecated; move to libc++ with a minimum deployment target of OS X 10.9 [-Wdeprecated]
ld: library not found for -lstdc++
clang: error: linker command failed with exit code 1 (use -v to see invocation)
error: command 'g++' failed with exit status 1

Solution

Add extra_link_args as well; the g++ link command will then end with the flags -std=c++11 -stdlib=libc++ -mmacosx-version-min=10.7:

poppler_config.setdefault('extra_compile_args', []).extend(mac_compile_args)
poppler_config.setdefault('extra_link_args', []).extend(mac_compile_args)
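For context, here is a self-contained sketch of the same idea outside setup.py. The function name add_mac_flags and its platform parameter are hypothetical, and the contents of mac_compile_args are assumed from the flags quoted above rather than copied from the repository.

```python
import sys


def add_mac_flags(config, platform=sys.platform):
    """Append the macOS C++ flags to both the compile and the link step.

    `config` is a dict of keyword arguments intended for an Extension
    (hypothetical stand-in for poppler_config).
    """
    # Assumed value of mac_compile_args, based on the flags quoted above.
    mac_compile_args = ["-std=c++11", "-stdlib=libc++", "-mmacosx-version-min=10.7"]
    if platform == "darwin":
        # Compile step: picks the C++ standard and standard library.
        config.setdefault("extra_compile_args", []).extend(mac_compile_args)
        # Link step: without this the linker still looks for -lstdc++.
        config.setdefault("extra_link_args", []).extend(mac_compile_args)
    return config


poppler_config = add_mac_flags({"libraries": ["poppler"]}, platform="darwin")
```

Using setdefault keeps any flags that are already present in the config, for example ones obtained from pkg-config.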

As a workaround, the compiler flag can also be specified on the command line:

CFLAGS=-stdlib=libc++ python3 setup.py sdist bdist_wheel

izderadicka commented 4 years ago

Thanks. Interesting that they decided to introduce such a breaking change; see this discussion: https://gitlab.freedesktop.org/poppler/poppler/merge_requests/112