Tesstrain makes quite restrictive assumptions about how the data is laid out in the filesystem (*.{png,tif} / *.gt.txt pairs under data/NAME-ground-truth below the tesstrain code). But often one needs more flexibility, for example because:
- the data is in a git repo (but so is tesstrain),
- some mountpoints are faster/larger than others, or the data is simply scattered across locations, or
- you want to (re)group data files in certain ways (e.g. clusters of books by age / material / language / font).
There's a simple technique usually employed to that end: symlinks!
However, find (both GNU and BSD) by default does not descend into directories that are symlinks. This PR adds the -L option so that all *.gt.txt files are found below any such (sub)paths.
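
To illustrate the difference, here is a minimal sketch in a throwaway directory. The layout and the names `elsewhere/books` and `demo-ground-truth` are hypothetical and chosen only for this example; they are not taken from the tesstrain code, and the actual `find` invocation in the Makefile may use different arguments.

```shell
# Hypothetical layout for illustration only; these names are not from tesstrain.
tmp=$(mktemp -d)
mkdir -p "$tmp/elsewhere/books" "$tmp/data/demo-ground-truth"
touch "$tmp/elsewhere/books/page1.gt.txt" "$tmp/elsewhere/books/page1.png"

# Link the external directory into the ground-truth directory.
ln -s "$tmp/elsewhere/books" "$tmp/data/demo-ground-truth/books"

# Default behaviour: find does not descend into the symlinked directory.
without=$(find "$tmp/data/demo-ground-truth" -name '*.gt.txt' | wc -l | tr -d ' ')

# With -L, symlinks are followed, so the file behind the link is found.
with=$(find -L "$tmp/data/demo-ground-truth" -name '*.gt.txt' | wc -l | tr -d ' ')

echo "without -L: $without, with -L: $with"

# Clean up the throwaway directory.
rm -rf "$tmp"
```

With both GNU and BSD find, this should report 0 matches without -L and 1 match with it.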