Open kloczek opened 4 years ago
The dist tar ball does not include third party code which is needed for make check
. Use the code from git if you want to run the checks.
So it would be good to add those files to EXTRA_DIST variable.
Issue is that for now it is not possible to execute test suite using only dist tar balls.
Test suite requires more external files, which are not (and never will be) distributed by tesseract. See https://github.com/tesseract-ocr/tesseract/blob/4.1/unittest/README.md
Nevertheless it would be good to add something. Sometimes it is about "it is better than nothing".
I don't think that it would be reasonable to add files needed for make check
to the distribution. Most people don't need them, so for those people those files would be simply unnecessary and cost bandwidth and disk space.
As @zdenop already wrote make check
requires not only external code but also more files which cannot be added, for example because of license issues.
A more realistic solution would be enhancing the build rules. make check
should either fail with a user friendly message if required files are missing, or it could try to add missing files.
@kloczek, a patch for that would be welcome. I'm afraid the few volunteers who currently try to improve Tesseract have more important priorities. That's at least true for me.
So it would be good to add those files to EXTRA_DIST variable. Issue is that for now it is not possible to execute test suite using only dist tar balls.
@kloczek, just to clarify: Are you talking about dist files made by Makefile? Then EXTRA_DIST
could be used, but Tesseract does not offer such files. Or are you talking about dist files made by GitHub and provided at https://github.com/tesseract-ocr/tesseract/releases? As far as I know those files don't use EXTRA_DIST
.
I don't think that it would be reasonable to add files needed for make check to the distribution.
Almost all source dist tar balls are distributed with all necessary files to be able execute test suite.
Execution of that part is very important part of the packaging processes because it allow more or less confirm that just build binaries are OK. Usually that test suite as well are stressing many components of the build env. In other words remove some parts of the test suite which are causing that test suite is useless sooner or later will backfire project.
Really please understand that this is important.
I see two separated issues: 1) using local copy of gtest and gmock 2) missing files in /test
For first one I just made quick patch only to prove that there is no reasons to use copy of the gtest and gmock and test unit binaries can be linked against system libgtest and libgmock shared libraries. Below patch is not finished because it is still necessary to add proper pkgconfig parts about detecting those libraries in configure.ac
and CMakeLists.txt
.
Nevertheless there is no any technical reasons to prop[agate googletest files in tesseract
--- a/unittest/Makefile.am~ 2019-12-26 14:21:51.000000000 +0000
+++ b/unittest/Makefile.am 2021-04-19 10:06:35.912844153 +0100
@@ -51,14 +51,10 @@
endif # TENSORFLOW
# Build googletest:
-check_LTLIBRARIES = libgtest.la libgtest_main.la libgmock.la libgmock_main.la
-libgtest_la_SOURCES = ../googletest/googletest/src/gtest-all.cc
-libgtest_la_CPPFLAGS = -I$(top_srcdir)/googletest/googletest/include -I$(top_srcdir)/googletest/googletest -pthread
-libgtest_main_la_SOURCES = ../googletest/googletest/src/gtest_main.cc
## libgtest_main_la_LIBADD = libgtest.la
# Build Abseil (needed for some unit tests).
-check_LTLIBRARIES += libabseil.la
+check_LTLIBRARIES = libabseil.la
libabseil_la_SOURCES =
libabseil_la_SOURCES += ../abseil/absl/base/internal/cycleclock.cc
libabseil_la_SOURCES += ../abseil/absl/base/internal/raw_logging.cc
@@ -87,30 +83,16 @@
libabseil_la_SOURCES += ../abseil/absl/time/clock.cc
libabseil_la_SOURCES += ../abseil/absl/time/duration.cc
-GMOCK_INCLUDES = -I$(top_srcdir)/googletest/googlemock/include \
- -I$(top_srcdir)/googletest/googlemock \
- -I$(top_srcdir)/googletest/googletest/include \
- -I$(top_srcdir)/googletest/googletest
-
-libgmock_la_SOURCES = ../googletest/googlemock/src/gmock-all.cc
-libgmock_la_CPPFLAGS = $(GMOCK_INCLUDES) \
- -pthread
-libgmock_main_la_SOURCES = ../googletest/googlemock/src/gmock_main.cc
-libgmock_main_la_CPPFLAGS = $(GMOCK_INCLUDES) \
- -pthread
-
# Build unittests
ABSEIL_LIBS = libabseil.la
-GTEST_LIBS = libgtest.la libgtest_main.la -lpthread
-GMOCK_LIBS = libgmock.la libgmock_main.la
+GTEST_LIBS = -lgtest -lgtest_main -lpthread
+GMOCK_LIBS = -lgmock
TESS_LIBS = $(GTEST_LIBS)
TESS_LIBS += $(top_builddir)/src/api/libtesseract.la $(libarchive_LIBS)
TESS_LIBS += $(TENSORFLOW_LIBS)
TRAINING_LIBS = $(top_builddir)/src/training/libtesseract_training.la
TRAINING_LIBS += $(top_builddir)/src/training/libtesseract_tessopt.la
TRAINING_LIBS += $(TESS_LIBS)
-AM_CPPFLAGS += -isystem $(top_srcdir)/googletest/googletest/include \
- -isystem $(top_srcdir)/googletest/googlemock/include
check_PROGRAMS = apiexample_test
if ENABLE_TRAINING
If are happy about above I can try to finish configure.ac
and CMakeLists.txt
and submit that as PR.
Second thing is merge tesseract repo with test files subrepo because even if with above patch I was able to compile everything test suite is failing because missing content of the test/
make[1]: Entering directory '/home/tkloczko/rpmbuild/BUILD/tesseract-4.1.1/unittest'
make apiexample_test applybox_test baseapi_test bitvector_test cleanapi_test colpartition_test commandlineflags_test dawg_test denorm_test fileio_test heap_test imagedata_test indexmapbidi_test intfeaturemap_test intsimdmatrix_test lang_model_test layout_test ligature_table_test linlsq_test lstm_recode_test lstm_squashed_test lstm_test lstmtrainer_test loadlang_test mastertrainer_test matrix_test normstrngs_test nthitem_test osd_test pagesegmode_test paragraphs_test params_model_test progress_test qrsequence_test recodebeam_test rect_test resultiterator_test scanutils_test shapetable_test stats_test stringrenderer_test tablefind_test tablerecog_test tabvector_test textlineprojection_test tfile_test unichar_test unicharcompress_test unicharset_test validate_grapheme_test validate_indic_test validate_khmer_test validate_myanmar_test validator_test libabseil.la
make[2]: Entering directory '/home/tkloczko/rpmbuild/BUILD/tesseract-4.1.1/unittest'
make[2]: stat: /home/tkloczko/rpmbuild/BUILD/tesseract-4.1.1/test/testing/phototest.tif: Too many levels of symbolic links
mkdir -p ../test/testing
ln -s /home/tkloczko/rpmbuild/BUILD/tesseract-4.1.1/test/testing/phototest.tif ../test/testing/phototest.tif
ln: failed to create symbolic link '../test/testing/phototest.tif': File exists
make[2]: *** [Makefile:2878: /home/tkloczko/rpmbuild/BUILD/tesseract-4.1.1/test/testing/phototest.tif] Error 1
make[2]: Leaving directory '/home/tkloczko/rpmbuild/BUILD/tesseract-4.1.1/unittest'
make[1]: *** [Makefile:2548: check-am] Error 2
make[1]: Leaving directory '/home/tkloczko/rpmbuild/BUILD/tesseract-4.1.1/unittest'
make: *** [Makefile:505: check-recursive] Error 1
Almost all source dist tar balls are distributed with all necessary files to be able execute test suite.
The more complex software packages which I know don't include all files which are required for the test suite in their source dist tar ball, simply because that would make that tar ball too large.
I know that recent Linux distributions can at least provide googletest
, but they still don't include abseil
. In addition to those sources, a lot of other files are required for make check
. Replacing the well defined revision of googletest
by a distribution package does not solve the problem, but creates a new one because there might be compatiblity issues and for other platforms like Windows we don't have such packages.
The better way to go is fixing the build rules to get all missing required files for unit tests from online sources. The Git submodules can be initialized automatically if they are missing, font files can be fetched, model files can be fetched, too.
Issue is that for now it is not possible to execute test suite using only dist tar balls.
That is possible. It only requires some git clone
and wget
or curl
commands. Feel free to add the necessary documentation and/or a script which does the necessary steps and send a pull request for 4.1. Git master already contains more build code to do that automatically, but even there some steps are still missing.
Issue is that many build envs (including mine) are deliberately) are cut of from public network. Just checked all my src.rpm packages I found that many packages are using 0.5-0.8 GB of compressed source tar balls. Really .. as "big" is term of so called fuzzy logic and exact value of that term depends only on some reference value :) Usually separation of some test data is only because that those files are using different license.
For example OpenCV is using separated tar ball with test data however those data are kept in separated repo which is always synchronously tagged with the same git tag on release as main OpenCV repo and all what you need is just unpack test data in OpenCV source tree.
I know that recent Linux distributions can at least provide googletest, but they still don't include abseil
Where is that project? (cannot find it).
If it provides .pc
file all what you need is just star requiring that in ac/cmake. People will find it and package as everything else :)
Abseil does not provide a .pc
file, but you can download a source tar ball (like for Googletest, too).
Comment from issue #3673:
5.4.1 has one issue. It uses bundled googletest included in source tree as submodules. Why not use system installed gtest? 🤔 All distros provides gtest ..
At least for macOS (Homebrew) and for Windows I don't know gtest packages which could be used. Therefore we must support a local solution like the current one with a Git submodule. Additional support for gtest from a Linux distribution is possible, but needs someone who implements it and sends a pull request.
Looks like some files are missing in dist tar ball.