tesseract-ocr / tesstrain

Train Tesseract LSTM with make
Apache License 2.0

Add makefile based training text and font to model scripts #238

Closed Shreeshrii closed 2 years ago

Shreeshrii commented 3 years ago

Builds a model using synthetic training data generated from

A helper bash script is provided to ensure that all required values are passed to the Makefile.

The box/tiff pairs are saved for the evaluation data and gt.txt is generated from the box files.
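As a rough illustration of that last step, the sketch below rebuilds ground-truth text from the lines of a box file. It assumes each line has the form `<glyph> <left> <bottom> <right> <top> <page>` and that a tab glyph marks the end of a text line, as in box files written for LSTM training. `box_to_text` is a hypothetical helper name chosen for this example, not the script from this PR.

```python
def box_to_text(box_lines):
    """Rebuild ground-truth text from box-file lines.

    Assumed line format (an assumption of this sketch):
        <glyph> <left> <bottom> <right> <top> <page>
    A tab glyph is treated as an end-of-line marker.
    """
    glyphs = []
    for raw in box_lines:
        line = raw.rstrip("\r\n")
        if not line:
            continue
        # Split from the right so that a space or tab glyph survives intact.
        parts = line.rsplit(" ", 5)
        if len(parts) != 6:
            continue  # skip lines that do not match the assumed format
        glyph = parts[0]
        glyphs.append("\n" if glyph == "\t" else glyph)
    # Drop the trailing newline left by the final end-of-line marker.
    return "".join(glyphs).rstrip("\n")
```

Splitting from the right matters here: a plain `split()` would discard the space and tab glyphs, so word gaps and line ends would be lost.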

stale[bot] commented 3 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

bertsky commented 3 years ago

Very interesting!

Could we rework this so that we end up with a minimal set of changes to the base Makefile instead? (That would avoid the need to keep the two variants synchronized in the future, plus we could try to add the possibility of mixing synthetic and GT training.)

stale[bot] commented 3 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

bertsky commented 3 years ago

These stale .github rules are a joke. Auto-close of issues and PRs? In this kind of repo?

What was the rationale behind this, @wrznr?

wrznr commented 3 years ago

The rationale was to prevent user questions, which require detailed information from the reporter that often never comes, from remaining open issues forever. We may have to adjust the rules for more realistic behavior.

bertsky commented 3 years ago

> The rationale was to prevent user questions, which require detailed information from the reporter that often never comes, from remaining open issues forever. We may have to adjust the rules for more realistic behavior.

I see. But that problem can be addressed directly/manually IMV: just add a comment with a friendly reminder ("or can we close?"). This can be done when and if maintainers/contributors here do have the time. Right now it's the opposite: we are forced to tend to the machinery within so and so many days, or otherwise things will disappear from the radar. (And assigning labels is also effort, esp. if you don't recall which labels are needed for what kind of promotion.)