janzd / CRNN

Convolutional recurrent neural network for scene text recognition or OCR in Keras
MIT License
122 stars, 33 forks

Recognizing spaces and special characters #3

Closed himanshurawlani closed 5 years ago

himanshurawlani commented 5 years ago

Firstly, thank you very much for sharing this repository. The provided model works perfectly for individual words; however, it cannot recognize spaces, special characters, or punctuation. Could you please help me out with this?

  1. What changes shall I make to the model?
  2. Which dataset would you suggest?

janzd commented 5 years ago

Thank you for trying out the code and the model.

  1. You have to add the new characters to the character set. The default characters are defined in config.py. Note that the hyphen there stands for a non-character, so if you'd like to be able to recognize hyphens too, you have to use something else as the non-character symbol. Recognizing spaces should be relatively easy if the model is trained for that (as long as the spaces are not too narrow). Recognizing punctuation can be a bit tricky - a lot of punctuation marks are much smaller than regular characters, which might make them go unnoticed.

  2. Most widely used datasets for scene text recognition do not contain many punctuation marks. You might have to generate some synthetic data like the Synth90k dataset that I used for training, but with all the characters that you need.
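
To make the two points above concrete, here is a minimal sketch in plain Python. It is not the repository's code: the names `ALPHABET`, `BLANK_INDEX`, `encode_label`, `greedy_ctc_decode`, and `random_label` are hypothetical, and the choice of punctuation is only an example. It assumes the non-character (CTC blank) is represented by a reserved class index rather than by a printable symbol, which frees the hyphen to be a real character, and it shows one way to sample label strings for synthetic training images.

```python
import random
import string

# Assumption: an example character set with a space and some punctuation.
# Adjust it to whatever your data actually contains.
ALPHABET = string.digits + string.ascii_lowercase + " " + "-.,:;!?'\"()&@#%/"

# Assumption: the blank ("non-character") is the last class index and has no
# printable symbol, so the hyphen can be recognized like any other character.
BLANK_INDEX = len(ALPHABET)
NUM_CLASSES = len(ALPHABET) + 1  # size of the model's output layer

CHAR_TO_INDEX = {ch: i for i, ch in enumerate(ALPHABET)}


def encode_label(text):
    """Map a ground-truth string to a list of class indices."""
    return [CHAR_TO_INDEX[ch] for ch in text]


def greedy_ctc_decode(frame_indices):
    """Collapse consecutive repeats, then drop blanks (best-path decoding)."""
    chars = []
    previous = None
    for idx in frame_indices:
        if idx != previous and idx != BLANK_INDEX:
            chars.append(ALPHABET[idx])
        previous = idx
    return "".join(chars)


def random_label(rng, min_len=3, max_len=12):
    """Sample a random string over ALPHABET to render as a synthetic image."""
    length = rng.randint(min_len, max_len)
    return "".join(rng.choice(ALPHABET) for _ in range(length))


if __name__ == "__main__":
    rng = random.Random(0)
    print(repr(random_label(rng)))
    frames = [CHAR_TO_INDEX["a"], CHAR_TO_INDEX["a"], BLANK_INDEX,
              CHAR_TO_INDEX["-"], BLANK_INDEX, CHAR_TO_INDEX["b"]]
    print(greedy_ctc_decode(frames))
```

Uniformly random strings are only a starting point. Sampling from real sentences would probably give more realistic spacing and punctuation, and the sampled labels still have to be rendered to images with a text renderer of your choice. The output layer and any decoding code in the repository would also need to match the new number of classes.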