tesseract-ocr / tesstrain

Train Tesseract LSTM with make
Apache License 2.0

Missing config in newly created traineddata #310

Open MPQC opened 2 years ago

MPQC commented 2 years ago

Hi. I'm trying to take the jpn_vert traineddata, and further train it with my own images. My command to run it looks like this:

make training MODEL_NAME=my-custom-model START_MODEL=jpn_vert TESSDATA=$TESSDATA

This works well, but if I run the following to create a traineddata file while training is still running:

make traineddata CHECKPOINT_FILES="$(ls -t data/my-custom-model/checkpoints/*.checkpoint | head -2)" MODEL_NAME=my-custom-model START_MODEL=jpn_vert TESSDATA=$TESSDATA

I would have expected it to take the jpn_vert.config from the jpn_vert traineddata and include it in the resulting model, but the config is missing. Is this expected?

stweil commented 2 years ago

Yes, that is how it is currently implemented. Nothing in tesstrain copies the configuration (or other components, for example the dictionary) from the original traineddata to the newly trained file(s).
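Since the copy is not done automatically, one possible manual workaround is to move the config component over with Tesseract's combine_tessdata tool after the new traineddata has been built. The following is only a sketch, not part of tesstrain: the function name copy_start_model_config and both paths in the usage comment are hypothetical, and it assumes combine_tessdata is on the PATH and that the start model actually contains a config component.

```shell
# Sketch: copy the "config" component from the start model into a newly
# built traineddata file. The function name and arguments are hypothetical.
copy_start_model_config() {
    start_traineddata="$1"   # traineddata of the start model
    new_traineddata="$2"     # traineddata produced by training
    workdir="$(mktemp -d)" || return 1
    new_base="$(basename "$new_traineddata" .traineddata)"
    component="$workdir/$new_base.config"

    # -e extracts the components selected by file extension (here: .config).
    combine_tessdata -e "$start_traineddata" "$component"
    if [ ! -s "$component" ]; then
        echo "no config component found in $start_traineddata" >&2
        rm -rf "$workdir"
        return 1
    fi

    # -o overwrites (or adds) the given components in the target file.
    combine_tessdata -o "$new_traineddata" "$component"
    status=$?
    rm -rf "$workdir"
    return "$status"
}

# Usage (paths are assumptions, adjust to where your files actually live):
#   copy_start_model_config "$TESSDATA/jpn_vert.traineddata" path/to/my-custom-model.traineddata
#   combine_tessdata -d path/to/my-custom-model.traineddata   # list components to verify
```

The same approach could in principle be applied to other components such as the dictionary files, but those are only meaningful if the unicharset of the new model still matches the start model, so check that before copying them.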