izderadicka / pdfparser

Python binding to libpoppler with focus on text extraction

Mac installation #7

Closed luizv closed 6 years ago

luizv commented 6 years ago

First, congrats on your project. I read this and I was amazed by your results and benchmarks.

I'm new to Python and to library bindings. Is it possible to run this on a Mac? I installed Cython and poppler from Homebrew with all the necessary dependencies.

But pdfparser doesn't seem to find important things it needs in the poppler files. Any recommendations for making it work?


The install log

Some errors from running pip install git+https://github.com/izderadicka/pdfparser:

/usr/local/Cellar/poppler/0.60.1/include/poppler/Object.h:326:50: error: no member named 'move' in namespace 'std'
      { OBJECT_TYPE_CHECK(objArray); array->add(std::move(elem)); }

[...]

/usr/local/Cellar/poppler/0.60.1/include/poppler/Stream.h:238:95: error: use of undeclared identifier 'nullptr'
      Stream *makeFilter(char *name, Stream *str, Object *params, int recursion = 0, Dict *dict = nullptr);

[...]

    /usr/local/Cellar/poppler/0.60.1/include/poppler/Page.h:113:23: error: no member named 'move' in namespace 'std'
      {  resources = std::move(obj1); 
/usr/local/Cellar/poppler/0.60.1/include/poppler/Page.h:186:40: error: use of undeclared identifier 'nullptr'
      Object getAnnotsObject(XRef *xrefA = nullptr) { return annotsObj.fetch(xrefA ? xrefA : xref); }

[...]

    /usr/local/Cellar/poppler/0.60.1/include/poppler/PDFDoc.h:295:153: error: use of undeclared identifier 'nullptr'
      void markPageObjects(Dict *pageDict, XRef *xRef, XRef *countRef, Guint numOffset, int oldRefNum, int newRefNum, std::set<Dict*> *alreadyMarkedDicts = nullptr);
                                                                                                                                                            ^
    /usr/local/Cellar/poppler/0.60.1/include/poppler/PDFDoc.h:296:156: error: use of undeclared identifier 'nullptr'
      GBool markAnnotations(Object *annots, XRef *xRef, XRef *countRef, Guint numOffset, int oldPageNum, int newPageNum, std::set<Dict*> *alreadyMarkedDicts = nullptr);
                                                                                                                                                               ^
    /usr/local/Cellar/poppler/0.60.1/include/poppler/PDFDoc.h:301:135: error: use of undeclared identifier 'nullptr'
                               CryptAlgorithm encAlgorithm, int keyLength, int objNum, int objGen, std::set<Dict*> *alreadyWrittenDicts = nullptr);
                                                                                                                                          ^
    /usr/local/Cellar/poppler/0.60.1/include/poppler/PDFDoc.h:314:146: error: use of undeclared identifier 'nullptr'
      void markObject (Object *obj, XRef *xRef, XRef *countRef, Guint numOffset, int oldRefNum, int newRefNum, std::set<Dict*> *alreadyMarkedDicts = nullptr);
                                                                                                                                                     ^
    /usr/local/Cellar/poppler/0.60.1/include/poppler/PDFDoc.h:323:99: error: use of undeclared identifier 'nullptr'
                        int keyLength, int objNum, int objGen, std::set<Dict*> *alreadyWrittenDicts = nullptr)
                                                                                                      ^

[...]

                                 ^
    /usr/local/Cellar/poppler/0.60.1/include/poppler/TextOutputDev.h:801:32: warning: 'override' keyword is a C++11 extension [-Wc++11-extensions]
      void eoFill(GfxState *state) override;
                                   ^
    /usr/local/Cellar/poppler/0.60.1/include/poppler/TextOutputDev.h:804:37: warning: 'override' keyword is a C++11 extension [-Wc++11-extensions]
      void processLink(AnnotLink *link) override;
                                        ^
    pdfparser/poppler.cpp:751:11: warning: 'likely' macro redefined [-Wmacro-redefined]
      #define likely(x)   __builtin_expect(!!(x), 1)
              ^
    /usr/local/Cellar/poppler/0.60.1/include/poppler/goo/GooLikely.h:15:10: note: previous definition is here
    # define likely(x)      __builtin_expect((x), 1)
             ^
    pdfparser/poppler.cpp:752:11: warning: 'unlikely' macro redefined [-Wmacro-redefined]
      #define unlikely(x) __builtin_expect(!!(x), 0)
              ^
    /usr/local/Cellar/poppler/0.60.1/include/poppler/goo/GooLikely.h:16:10: note: previous definition is here
    # define unlikely(x)    __builtin_expect((x), 0)
             ^
    446 warnings and 11 errors generated.
    error: command '/usr/bin/clang' failed with exit status 1

    ----------------------------------------
Command "/Library/Frameworks/Python.framework/Versions/3.4/bin/python3.4 -u -c "import setuptools, tokenize;__file__='/private/var/folders/ws/v4schn594891ppp5wq7t_jy40000gp/T/pip-82kgkscn-build/setup.py';f=getattr(tokenize, 'open', open)(__file__);code=f.read().replace('\r\n', '\n');f.close();exec(compile(code, __file__, 'exec'))" install --record /var/folders/ws/v4schn594891ppp5wq7t_jy40000gp/T/pip-d6sucitt-record/install-record.txt --single-version-externally-managed --compile" failed with error code 1 in /private/var/folders/ws/v4schn594891ppp5wq7t_jy40000gp/T/pip-82kgkscn-build/

izderadicka commented 6 years ago

This looks like a problem with the C++ compiler rather than with Python. Which C++ compiler are you using? I have only tested on Linux with GNU C++. If you are using Xcode, make sure it supports the correct C++ standard; see https://stackoverflow.com/questions/40076434/no-member-named-move-in-namespace-std
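
For anyone hitting the same errors: `std::move`, `nullptr`, and `override` are C++11 features, so the compiler has to be told to build in C++11 mode. Below is a minimal sketch of how a `setup.py` might choose those flags. The helper name `cxx11_flags` is hypothetical and not part of pdfparser, and the macOS-specific flags are an assumption about how Apple clang behaves when it targets old macOS versions; adjust them for your toolchain.

```python
import sys


def cxx11_flags(platform=sys.platform):
    """Return extra compiler flags that request C++11 (hypothetical helper)."""
    flags = ["-std=c++11"]
    if platform == "darwin":
        # Assumption: when targeting old macOS releases, clang may default to
        # a standard library without C++11 support, so select libc++ explicitly.
        flags += ["-stdlib=libc++", "-mmacosx-version-min=10.9"]
    return flags


if __name__ == "__main__":
    print(cxx11_flags())
```

The returned list could be passed as `extra_compile_args` to a `setuptools.Extension` (on macOS, `-stdlib=libc++` would likely be needed in `extra_link_args` as well).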

luizv commented 6 years ago

Thank you for pointing me in the right direction. After many tries I managed to make it work.

Feedback: I tried switching the compiler to g++ when installing pdfparser through pip. I tested many gcc/g++ versions, set CFLAGS for compiling, and tried a lot of other things. Each attempt gave me different errors. After a while I saw a specific error saying my Python architecture was i386 while poppler was built for x86_64.

I couldn't figure out how to change Python's preferred architecture, but then I installed another Python interpreter with brew, and with brew's pip the pdfparser install works.
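
A quick way to see which architecture a given interpreter runs as, before building against a 64-bit poppler, is to check its pointer size. This is only a sketch; the function name `interpreter_arch` is made up for illustration.

```python
import platform
import struct
import sys


def interpreter_arch():
    """Describe the running interpreter (hypothetical helper)."""
    # The size of a C pointer in bits: 32 for an i386 process, 64 for x86_64.
    bits = struct.calcsize("P") * 8
    return {
        "executable": sys.executable,
        "machine": platform.machine(),
        "bits": bits,
    }


if __name__ == "__main__":
    print(interpreter_arch())
```

Running it with each installed interpreter (for example the python.org framework build and the Homebrew one) shows which of them runs as a 64-bit process.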

I'll now investigate how to fix my other Python installs so they can run pdfparser, but it's definitely not a pdfparser issue. 😅 I learned a lot in this process. Thank you again for your help. 👍