Closed risicle closed 1 month ago
qt-box-editor was IMO relevant for tesseract 3.x training (legacy engine), and it does not provide any value for the current tesseract version... So what is the value in being able to build it with the latest versions of leptonica and tesseract?
Simply that older leptonica versions have security vulnerabilities meaning we (NixOS) can't ship them.
Perhaps this is an indication that we should just drop the qt-box-editor package, but as long as it's relatively straightforward to keep it building, we probably will do so with patches.
It is not a problem to include the patch here; I just wonder whether people are really actively using this.
Yes. The current version of Tesseract still supports the OCR-based engine. The LSTM model takes significantly longer to train, according to the Tesseract documentation itself.
The LSTM engine does not need to be trained from scratch (the legacy engine does). E.g. you can fine-tune it on just the problem cases. IMO LSTM training is (or could be) faster, as you do not need to take care of the bounding boxes of letters, and training based on tutorials like this seems to be pretty easy.
Anyway, I made the requested changes to the QTB code.
Unfortunately LSTM doesn't seem to work well on matching basic monospace without word recognition.
fixed.
Leptonica 1.83 moved a number of `struct` definitions into "private" headers, notably `Pix` and `Box` et al. This causes a build failure:
To address this, an extra import of `<leptonica/pix_internal.h>` needs to be added to `src/TessTools.h`.
On top of this, it looks like this version got rid of the library's `lept` alias, so references to `-llept` in `qt-box-editor.pro` need to be switched to `-lleptonica`.
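For anyone who needs to carry these two changes as a local patch, here is a minimal sketch of them as text transformations. It is not the fix that was merged upstream. The function names, the choice to place the new include after the last existing `#include` line, and the sample file contents are all assumptions made for illustration; the real `src/TessTools.h` and `qt-box-editor.pro` may be laid out differently.

```python
import re

# Header named in the issue text; Leptonica 1.83 moved Pix, Box, etc. here.
PRIVATE_INCLUDE = "#include <leptonica/pix_internal.h>"


def patch_pro(text: str) -> str:
    """Switch the linker flag -llept to -lleptonica.

    The negative lookahead leaves an existing -lleptonica untouched,
    so running the function twice gives the same result as running it once.
    """
    return re.sub(r"-llept(?![A-Za-z0-9_])", "-lleptonica", text)


def patch_header(text: str) -> str:
    """Add the private-header include if it is not already present.

    Assumption: putting it right after the last existing #include line
    is acceptable for this header. If there is no #include at all, the
    line is prepended instead.
    """
    if PRIVATE_INCLUDE in text:
        return text
    lines = text.splitlines(keepends=True)
    include_idx = [
        i for i, line in enumerate(lines)
        if line.lstrip().startswith("#include")
    ]
    if not include_idx:
        return PRIVATE_INCLUDE + "\n" + text
    last = include_idx[-1]
    if not lines[last].endswith("\n"):
        # The last include was the final line of the file without a newline.
        lines[last] += "\n"
    lines.insert(last + 1, PRIVATE_INCLUDE + "\n")
    return "".join(lines)


if __name__ == "__main__":
    # Hypothetical file contents, only to demonstrate the two helpers.
    sample_pro = "LIBS += -llept -ltesseract\n"
    sample_header = "#pragma once\n#include <QString>\nclass TessTools {};\n"
    print(patch_pro(sample_pro), end="")
    print(patch_header(sample_header), end="")
```

Both helpers are idempotent, so re-running them on an already patched tree should be a no-op. In a Nix derivation the same substitutions would more likely be done with a patch file or a one-line substitution in `postPatch`; the Python form is used here only to make the logic explicit.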