sirfz / tesserocr

A Python wrapper for the tesseract-ocr API
MIT License

does not build on current Tesseract anymore #342

Closed. bertsky closed this issue 8 months ago

bertsky commented 8 months ago

On current tesserocr:master and tesseract:main with Python 3.8, the build now fails:

pip3 install ./repo/tesserocr
Looking in indexes: https://pypi.org/simple, https://pypi.ngc.nvidia.com
Processing ./repo/tesserocr
  Installing build dependencies ... done
  Getting requirements to build wheel ... done
  Installing backend dependencies ... done
  Preparing metadata (pyproject.toml) ... done
Building wheels for collected packages: tesserocr
  Building wheel for tesserocr (pyproject.toml) ... error
  error: subprocess-exited-with-error

  × Building wheel for tesserocr (pyproject.toml) did not run successfully.
  │ exit code: 1
  ╰─> [25 lines of output]
      Supporting tesseract v5.3.4
      Tesseract major version 5
      Configs from pkg-config: {'library_dirs': ['/ocrd_all/venv38/lib', '/usr/local/lib'], 'include_dirs': ['/venv38/include', '/usr/local/include', '/usr/local/include'], 'libraries': ['tesseract', 'archive', 'curl', 'lept'], 'compile_time_env': {'TESSERACT_MAJOR_VERSION': 5, 'TESSERACT_VERSION': 84083712}}
      running bdist_wheel
      running build
      running build_ext
      Detected compiler: unix
      building 'tesserocr' extension
      creating build
      creating build/temp.linux-x86_64-cpython-38
      x86_64-linux-gnu-gcc -pthread -Wno-unused-result -Wsign-compare -DNDEBUG -g -fwrapv -O2 -Wall -g -fstack-protector-strong -Wformat -Werror=format-security -g -fwrapv -O2 -fPIC -I/venv38/include -I/usr/local/include -I/usr/local/include -I/venv38/include -I/usr/include/python3.8 -c tesserocr.cpp -o build/temp.linux-x86_64-cpython-38/tesserocr.o -std=c++11 -DUSE_STD_NAMESPACE
      In file included from tesserocr.cpp:1274:0:
      /venv38/include/tesseract/renderer.h:143:24: error: ‘AppendData’ function uses ‘auto’ type specifier without trailing return type
         auto AppendData(T &&d) {
                              ^
      /venv38/include/tesseract/renderer.h:143:24: note: deduced return type only available with -std=c++14 or -std=gnu++14
      tesserocr.cpp: In function ‘PyObject* __pyx_pf_9tesserocr_13PyTessBaseAPI_34GetAvailableLanguages(__pyx_obj_9tesserocr_PyTessBaseAPI*)’:
      tesserocr.cpp:26215:35: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
           for (__pyx_t_4 = 0; __pyx_t_4 < __pyx_t_3; __pyx_t_4+=1) {
                               ~~~~~~~~~~^~~~~~~~~~~
      tesserocr.cpp: In function ‘PyObject* __pyx_pf_9tesserocr_12get_languages(PyObject*, PyObject*)’:
      tesserocr.cpp:40045:35: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
           for (__pyx_t_5 = 0; __pyx_t_5 < __pyx_t_4; __pyx_t_5+=1) {
                               ~~~~~~~~~~^~~~~~~~~~~
      error: command '/usr/bin/x86_64-linux-gnu-gcc' failed with exit code 1
      [end of output]

  note: This error originates from a subprocess, and is likely not a problem with pip.
  ERROR: Failed building wheel for tesserocr
Failed to build tesserocr

full log: see this build

I believe the reason is that Tesseract now requires C++17. Will investigate...
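The `TESSERACT_VERSION` value in the log above (84083712) looks like a packed integer. The sketch below decodes it, assuming the layout `(major << 24) | (minor << 16) | (patch << 8)`. That layout is inferred only from the log reporting v5.3.4 next to this number, and the helper name `unpack_tesseract_version` is hypothetical, not part of tesserocr.

```python
def unpack_tesseract_version(packed):
    """Split a packed version integer into (major, minor, patch).

    Assumed layout: (major << 24) | (minor << 16) | (patch << 8).
    This is inferred from the build log, not taken from tesserocr's source.
    """
    major = (packed >> 24) & 0xFF
    minor = (packed >> 16) & 0xFF
    patch = (packed >> 8) & 0xFF
    return major, minor, patch


# Value copied from the 'compile_time_env' entry in the build log.
print(unpack_tesseract_version(84083712))  # (5, 3, 4)
```

Under that assumption the number matches the "Supporting tesseract v5.3.4" line, so the build did pick up Tesseract 5.3.4.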

bertsky commented 8 months ago

Confirmed. Since Tesseract 5.3.4, C++17 is required.

So in setup.py we should make another case distinction for the C++ standard flag...
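A rough sketch of what such a case distinction could look like, assuming a GCC/Clang-style compiler as in the log ("Detected compiler: unix"). The function name `cxx_compile_args` is hypothetical, and the 5.3.4 threshold is simply the version reported in this issue. This is not the actual change made in tesserocr's setup.py.

```python
def cxx_compile_args(tesseract_version):
    """Pick extra compile args from a (major, minor, patch) version tuple.

    Assumptions (not taken from tesserocr's real setup.py):
      - GCC/Clang-style flags; MSVC would need different spelling
      - C++17 from Tesseract 5.3.4 on, as reported in this issue
      - C++11 otherwise, matching the '-std=c++11' seen in the build log
    """
    if tuple(tesseract_version) >= (5, 3, 4):
        std_flag = "-std=c++17"
    else:
        std_flag = "-std=c++11"
    # -DUSE_STD_NAMESPACE appears in the original compiler command line.
    return [std_flag, "-DUSE_STD_NAMESPACE"]


print(cxx_compile_args((5, 3, 4)))
print(cxx_compile_args((5, 3, 3)))
```

The returned list could be passed as `extra_compile_args` to a setuptools `Extension`. Tuple comparison keeps the version check readable without parsing strings.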

zdenop commented 8 months ago

Are you sure your system is not misconfigured and/or outdated? E.g. I have not seen Python 3.8 for a long time (actually 3.11 is already in the bugfix stage). I use tesserocr on Windows, openSUSE and Raspberry Pi, and I do not remember any problem regarding C++17. Anyway, #343 does not cause any harm.

bertsky commented 8 months ago

@zdenop this has nothing to do with Python versions, obviously. (And no, Python 3.8 is not EOL just yet.)

As you well know, a C++17 compiler is required with Tesseract now. The tesserocr build still only used the C++11 setting. So even newer compilers would have misbehaved AFAIK.

zdenop commented 8 months ago

@bertsky: I did not write that Python 3.8 is EOL. I asked if you can check that your system is not misconfigured or outdated.

Tesseract has required C++17 for several years. AFAIK -std=c++17 is needed only when your code is using C++17 features (as far as I can see, tesserocr does not).

Your issue claims "does not build on current Tesseract anymore"; however, the GitHub Actions run from 5 months ago (for the 2.6.2 release) states something else: https://github.com/sirfz/tesserocr/actions/runs/6456160629 (yes, it also built the wheel for Python 3.8 without problems). So either we should improve the GitHub Action or you should fix your system...

bertsky commented 7 months ago

@zdenop the build status from 5 months ago is irrelevant. Up until a week ago, the CI would install Tesseract 5.3.3. As the issue says, this starts happening with 5.3.4. It's also happening with g++ 11.

@sirfz could you please make a new release on PyPI so installing tesserocr via pip works again?

zdenop commented 7 months ago

There have been no changes in tesseract or tesserocr runtime code regarding C++17 vs C++14 in recent months... The C++17 code is located in the training tools, which are not used in tesserocr. And yes, it has been there for more than a year, so the build status from 5 months ago is relevant to your claim.

sirfz commented 7 months ago

I understand from @zdenop that your issue was unrelated to the C++ version used to build tesserocr as the lib would still successfully compile with C++14.

In any case, I guess the change is welcome, and v2.6.3 has been published on PyPI.

zdenop commented 7 months ago

Yes, this change is ok (in line with the current Tesseract build where it is needed).