Closed: betaboon closed this 1 year ago.
@zdenop I think I have addressed all your review comments now.
I like your build scripts, but I am not sure they should be part of tesserocr, especially if inexperienced users would use them. Here are some comments from the perspective of a tesserocr builder/user:
`pango`, `cairo` and `icu4c` (installed for macOS) are needed only for the training tools, which tesserocr is not able to use, so the scripts (also for Linux) should build tesseract without the training tools if you build it only for tesserocr.
Regarding the GitHub Actions part: this is something I would love to see in tesserocr (with Windows wheels, but that is the more difficult part).
First off: thanks for taking the time to look at this.
I like your build scripts, but I am not sure they should be part of tesserocr, especially if inexperienced users would use them.
As they are only intended to be used for the wheel builds in CI, we could move them to `.github` so that they are not considered for "public consumption".
1. On Linux and Mac you should first check whether tesseract (and probably leptonica) is already installed. If so, it should be uninstalled before building tesseract/leptonica to avoid further problems.
2. If your script is used for creating the tesserocr Python package/wheel, you should stick to the system-provided leptonica/tesseract (the API/ABI of different versions is not the same, and programs linked against shared libraries could break if a different version is present). Another solution could be to include your custom-built shared libraries in the tesserocr wheel, as [it is done for Windows](https://github.com/simonflueckiger/tesserocr-windows_build/releases).
Do I understand this correctly, that these points are only relevant when users would attempt to use the build scripts locally?
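Point 1 could be turned into a pre-build check in the CI scripts. The following is a minimal Python sketch, not part of the PR: it assumes `pkg-config` is on the `PATH` and that tesseract and leptonica register the pkg-config modules `tesseract` and `lept`; the helper names `installed_version` and `existing_installation` are hypothetical. It only reports what it finds and leaves the uninstall step to the caller.

```python
import shutil
import subprocess
from typing import Dict, Optional


def installed_version(pkg: str) -> Optional[str]:
    """Return the version pkg-config reports for `pkg`, or None if unknown."""
    if shutil.which("pkg-config") is None:
        return None  # cannot tell without pkg-config
    result = subprocess.run(
        ["pkg-config", "--modversion", pkg],
        capture_output=True,
        text=True,
    )
    if result.returncode != 0:
        return None  # pkg-config does not know this package
    return result.stdout.strip()


def existing_installation() -> Dict[str, Optional[str]]:
    """Report pre-existing tesseract/leptonica installs that could clash with a fresh build."""
    return {
        "tesseract binary": shutil.which("tesseract"),
        "tesseract (pkg-config)": installed_version("tesseract"),
        # Assumption: leptonica's pkg-config module is named "lept".
        "leptonica (pkg-config)": installed_version("lept"),
    }


if __name__ == "__main__":
    found = {name: value for name, value in existing_installation().items() if value}
    if found:
        for name, value in found.items():
            print(f"found {name}: {value}")
        print("uninstall these before building tesseract/leptonica from source")
    else:
        print("no existing tesseract/leptonica installation detected")
```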
3. `pango`, `cairo` and `icu4c` (installed for macOS) are needed only for the training tools, which tesserocr is not able to use, so the scripts (also for Linux) should build tesseract without the training tools if you build it only for tesserocr.
Do I get this right: you're suggesting to compile tesseract with `-DBUILD_TRAINING_TOOLS=OFF` and remove those dependencies?
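For reference, a configure step along those lines might look like the sketch below. It is a hypothetical Python helper (`tesseract_configure_cmd` is not from the PR) that only assembles the command; it assumes CMake 3.13 or newer for `-S`/`-B`, and apart from `-DBUILD_TRAINING_TOOLS=OFF` the options are standard CMake variables chosen for illustration.

```python
import shlex
from typing import List


def tesseract_configure_cmd(src_dir: str, build_dir: str, prefix: str) -> List[str]:
    """Hypothetical helper: CMake configure command for a tesseract build used only by tesserocr."""
    return [
        "cmake",
        "-S", src_dir,
        "-B", build_dir,
        f"-DCMAKE_INSTALL_PREFIX={prefix}",
        "-DCMAKE_BUILD_TYPE=Release",
        # The training tools are what need pango, cairo and icu4c;
        # tesserocr never uses them, so they are switched off.
        "-DBUILD_TRAINING_TOOLS=OFF",
    ]


if __name__ == "__main__":
    print(shlex.join(tesseract_configure_cmd("tesseract", "build/tesseract", "/usr/local")))
```

Returning a list rather than a shell string keeps the arguments safe to pass to `subprocess.run` without shell quoting.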
4. Personally I prefer to build leptonica with only zlib and PNG support (a.k.a. a minimalistic build): tesserocr is used for OCR, and Python is able to open the rest of the image formats (with PIL or OpenCV). This decreases tesserocr's dependency complexity...
Do I get this right: you're suggesting to compile leptonica with `-DENABLE_GIF=OFF`, `-DENABLE_JPEG=OFF`, `-DENABLE_TIFF=OFF`, `-DENABLE_WEBP=OFF` and `-DENABLE_OPENJPEG=OFF` and remove those dependencies?
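Similarly, a minimal leptonica configure step could be assembled as below. This is again a hypothetical helper (`leptonica_configure_cmd`), assuming CMake 3.13 or newer; only the five `-DENABLE_*=OFF` flags come from the discussion, and zlib and PNG are left at leptonica's defaults, which are assumed to be enabled.

```python
import shlex
from typing import List

# Image formats switched off for a minimal leptonica build (flags from the discussion).
# zlib and PNG are not touched and stay at leptonica's defaults (assumed ON).
DISABLED_FORMATS = ["GIF", "JPEG", "TIFF", "WEBP", "OPENJPEG"]


def leptonica_configure_cmd(src_dir: str, build_dir: str, prefix: str) -> List[str]:
    """Hypothetical helper: CMake configure command for a minimal (zlib + PNG) leptonica build."""
    cmd = [
        "cmake",
        "-S", src_dir,
        "-B", build_dir,
        f"-DCMAKE_INSTALL_PREFIX={prefix}",
        "-DCMAKE_BUILD_TYPE=Release",
    ]
    # Other formats are left to PIL/OpenCV on the Python side.
    cmd += [f"-DENABLE_{fmt}=OFF" for fmt in DISABLED_FORMATS]
    return cmd


if __name__ == "__main__":
    print(shlex.join(leptonica_configure_cmd("leptonica", "build/leptonica", "/usr/local")))
```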
For building release wheels, use the tesseract and leptonica provided by the system.
If you plan to introduce testing CI (e.g. for commits and PRs), then integrate those scripts into GitHub Actions as tesseract and leptonica do.
Do I get this right: you're suggesting to compile tesseract with `-DBUILD_TRAINING_TOOLS=OFF` and remove those dependencies?
Yes.
Do I get this right: you're suggesting to compile leptonica with `-DENABLE_GIF=OFF`, `-DENABLE_JPEG=OFF`, `-DENABLE_TIFF=OFF`, `-DENABLE_WEBP=OFF` and `-DENABLE_OPENJPEG=OFF` and remove those dependencies?
Yes.
So I just pushed changes doing the following:
- move the build scripts to `.github` to signify that they are only being used for CI
- compile tesseract with `-DBUILD_TRAINING_TOOLS=OFF` and remove the appropriate dependencies
Is there anything left for me to do here to get this merged?
@sirfz, thanks for merging.
Quick question: when can we expect the wheels to be available on PyPI?
Hi @betaboon, I'll include the wheels with the next release, as I can't modify an existing release on PyPI.
Do you have any idea when that might be?
Resolves #123
Hello.
This PR adds a GitHub workflow to create wheels containing all the required libraries. Builds include:
This now builds `leptonica` and `tesseract` from source.
~~Some further information though: since `tesseract` and `leptonica` are being installed via `yum` (for `manylinux`), `apk` (for `musllinux`) and `brew` (for `macos`), all of the built wheels contain a different version of tesseract. That might pose a problem to some. I think the only way to get around that would be to compile tesseract from source.~~