tesseract-ocr / langdata

Source training data for Tesseract for lots of languages
Apache License 2.0

Seven Segment Display fonts #65

Open Shreeshrii opened 7 years ago

Shreeshrii commented 7 years ago

copied from https://github.com/tesseract-ocr/langdata/issues/59#issuecomment-290533931

@Shreeshrii 4. Traineddata for Seven Segment (or 14 segment) Display

@theraysmith Beyond the scope of this change.

Shreeshrii commented 7 years ago

https://www.unix-ag.uni-kl.de/~auerswal/ssocr/

Seven Segment Optical Character Recognition, or ssocr for short, is a program that recognizes the digits of a seven-segment display. It is probably a better bet than Tesseract for this.
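
To illustrate why a dedicated tool can suit this task, here is a toy sketch of segment-based decoding: once you know which of the seven segments are lit, the digit follows from a fixed lookup table. This is not ssocr's actual algorithm or API. The names `SEGMENTS_TO_DIGIT`, `decode_digit` and `decode_display` are hypothetical, and the sketch assumes the lit segments have already been detected in the image.

```python
# Toy segment-pattern decoder. NOT ssocr's implementation, only the general idea.
# Segments use the conventional labels a-g:
#   a = top, b = upper right, c = lower right, d = bottom,
#   e = lower left, f = upper left, g = middle.
SEGMENTS_TO_DIGIT = {
    frozenset("abcdef"): "0",
    frozenset("bc"): "1",
    frozenset("abdeg"): "2",
    frozenset("abcdg"): "3",
    frozenset("bcfg"): "4",
    frozenset("acdfg"): "5",
    frozenset("acdefg"): "6",
    frozenset("abc"): "7",
    frozenset("abcdefg"): "8",
    frozenset("abcdfg"): "9",
}


def decode_digit(lit_segments):
    """Return the digit for a collection of lit segment labels, or '?' if unknown."""
    return SEGMENTS_TO_DIGIT.get(frozenset(lit_segments), "?")


def decode_display(digits):
    """Decode a sequence of per-digit segment collections into a string."""
    return "".join(decode_digit(d) for d in digits)


if __name__ == "__main__":
    # Three digit positions, each given as the labels of its lit segments.
    print(decode_display(["bc", "abdeg", "abcdg"]))
```

The hard part in practice is finding the lit segments reliably in a photo, which this sketch skips entirely.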

https://github.com/tesseract-ocr/tesseract/wiki/AddOns lists a 7-segment font, but it is for Tesseract 3.0x.

Shreeshrii commented 7 years ago

copied from https://github.com/tesseract-ocr/langdata/issues/59#issuecomment-290533931

comment by @shreeshrii

I am trying to fine-tune for seven-segment displays, using eng.traineddata as the base and training text rendered in about 10 SSD fonts, covering numbers and CAPITAL letters.

Is that the recommended strategy, or would replacing a layer give better results? Fine-tuning goes through the lstmf files sequentially; see "LSTM: Finetune training does not mix fonts" (#795).

Also, should any kind of wordlist/dictionary be included for what may be random combinations of letters and numbers?
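
For reference, training text made of random combinations of digits and capital letters could be generated along these lines. This is only a sketch: the function name `make_training_lines`, the line and word lengths, and the fixed seed are all assumptions chosen for illustration, and nothing in this thread settles whether such random text trains better than a real wordlist.

```python
import random
import string

# Character set described above: digits plus CAPITAL letters.
ALPHABET = string.digits + string.ascii_uppercase


def make_training_lines(num_lines, words_per_line=6, min_len=1, max_len=8, seed=0):
    """Return num_lines lines of space-separated random 'words' over ALPHABET.

    All parameters are illustrative defaults, not recommended values.
    A fixed seed keeps the output reproducible between runs.
    """
    rng = random.Random(seed)
    lines = []
    for _ in range(num_lines):
        words = [
            "".join(rng.choice(ALPHABET) for _ in range(rng.randint(min_len, max_len)))
            for _ in range(words_per_line)
        ]
        lines.append(" ".join(words))
    return lines


if __name__ == "__main__":
    # Print a few sample lines; redirect to a file to use them as training text.
    for line in make_training_lines(5):
        print(line)
```

The resulting lines would still need to be rendered in each SSD font to produce the training images.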

TungXuan commented 7 years ago

@Shreeshrii How can I use this for a Seven Segment Display font?

Shreeshrii commented 7 years ago

I am not sure. I don't think the LSTM engine is suitable for handling this.