tesseract-ocr / tesstrain

Train Tesseract LSTM with make
Apache License 2.0

make find follow symbolic links #258

Closed: bertsky closed this 3 years ago

bertsky commented 3 years ago

Tesstrain makes quite restrictive assumptions about how the data is laid out in the filesystem: pairs of *.{png,tif} images and *.gt.txt transcriptions under data/NAME-ground-truth, below the tesstrain code. But often one needs more flexibility, for example because

There's a simple technique usually employed to that end: symlinks!
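As a rough illustration (a minimal sketch, not part of the PR), the following Python snippet exposes a ground-truth corpus stored elsewhere as data/NAME-ground-truth by creating a directory symlink instead of copying files. It assumes a POSIX system. The helper name `link_ground_truth`, the model name `foo` and the `corpus` location are hypothetical.

```python
import tempfile
from pathlib import Path


def link_ground_truth(tesstrain_dir: Path, name: str, external_dir: Path) -> Path:
    """Expose external_dir as <tesstrain_dir>/data/<name>-ground-truth via a symlink.

    Hypothetical helper for illustration only; tesstrain itself does not ship it.
    """
    data_dir = tesstrain_dir / "data"
    data_dir.mkdir(parents=True, exist_ok=True)
    link = data_dir / f"{name}-ground-truth"
    # The link points at the real data; nothing is copied.
    link.symlink_to(external_dir.resolve(), target_is_directory=True)
    return link


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        corpus = root / "corpus"  # hypothetical location of the real data
        corpus.mkdir()
        (corpus / "line1.png").write_bytes(b"")
        (corpus / "line1.gt.txt").write_text("hello\n")
        link = link_ground_truth(root / "tesstrain", "foo", corpus)
        print(link.is_symlink(), sorted(p.name for p in link.iterdir()))
        # prints: True ['line1.gt.txt', 'line1.png']
```

The image/transcription pairs then appear at the expected path, while the data itself can live on another disk or be shared between several checkouts.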

However, by default neither GNU nor BSD find descends into directories that are merely symlinks. This PR adds the -L option, so that find follows symlinks and all *.gt.txt files below any such (sub)paths are found.
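The difference between plain find and find -L can be mimicked in Python with `os.walk`, whose `followlinks` parameter defaults to False. This is an analogy for illustration, not the Makefile code. It assumes a POSIX system, and the function name `find_gt_files` and the directory names are hypothetical. One caveat: GNU find -L detects symlink loops, but `os.walk(..., followlinks=True)` does not track visited directories and can recurse forever on a cyclic link.

```python
import os
import tempfile
from pathlib import Path


def find_gt_files(root: str, follow_links: bool) -> list[str]:
    """Return the *.gt.txt paths below root, relative to root and sorted.

    follow_links=False behaves like plain `find root -name '*.gt.txt'`,
    follow_links=True like `find -L root -name '*.gt.txt'` (minus loop detection).
    """
    hits = []
    for dirpath, _dirnames, filenames in os.walk(root, followlinks=follow_links):
        for fn in filenames:
            if fn.endswith(".gt.txt"):
                hits.append(os.path.relpath(os.path.join(dirpath, fn), root))
    return sorted(hits)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        corpus = root / "corpus"  # hypothetical data kept outside the tesstrain tree
        corpus.mkdir()
        (corpus / "line1.gt.txt").write_text("hello\n")
        gt_dir = root / "data" / "foo-ground-truth"
        gt_dir.mkdir(parents=True)
        (gt_dir / "local.gt.txt").write_text("local\n")
        # A subdirectory that is only a symlink to the external corpus.
        (gt_dir / "extra").symlink_to(corpus, target_is_directory=True)
        print(find_gt_files(str(gt_dir), follow_links=False))
        # prints: ['local.gt.txt']
        print(find_gt_files(str(gt_dir), follow_links=True))
        # prints: ['extra/line1.gt.txt', 'local.gt.txt']
```

Without following links, the transcription behind the symlinked `extra` directory is silently skipped, which is exactly the situation the -L option addresses.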