qjadud1994 / CRNN-Keras

CRNN (CNN+RNN) for OCR using Keras / License Plate Recognition
MIT License
520 stars, 191 forks

variable-length plate #6

Closed: hoangdzung closed this issue 4 years ago

hoangdzung commented 5 years ago

Can your model deal with variable-length plates? In Image_Generator.py, you write Y_data[i] = text_to_labels(text), which means "text" must have a length of 9. What about plates with 7 or 8 characters? Or does this model only work with 9-character plates? Thank you.

qjadud1994 commented 5 years ago

You cannot put 7-character and 8-character labels in the same batch as they are, because every label in a batch must have the same length.

Therefore, this is usually solved by adding padding.

If the max text length is 9, pad a 7-character text to length 9 by adding padding characters before or after it.

In addition, grouping texts of the same length into the same batch is efficient for training.
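The grouping idea above could be sketched roughly as follows. This is only an illustration, not code from this repository: `bucket_by_length` and the sample plate strings are hypothetical names and data made up for the example.

```python
from collections import defaultdict


def bucket_by_length(texts, batch_size):
    """Group plate texts so that every batch holds texts of a single length."""
    buckets = defaultdict(list)
    for text in texts:
        buckets[len(text)].append(text)

    batches = []
    # Walk the buckets from shortest to longest and cut each into batches.
    for length in sorted(buckets):
        group = buckets[length]
        for start in range(0, len(group), batch_size):
            batches.append(group[start:start + batch_size])
    return batches


# Hypothetical plate texts with 7, 8 and 9 characters.
plates = ["ABC1234", "XY123456", "QWE12345Z", "DEF5678", "GH987654"]
batches = bucket_by_length(plates, batch_size=2)
for batch in batches:
    print(len(batch[0]), batch)
```

With batches built this way, texts inside one batch need no padding relative to each other. Padding is still needed if the label array has a fixed width such as max_text_len.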

soldierofhell commented 5 years ago

@qjadud1994, what do you mean by "adding padding"?

  1. Add spaces (" ")? Then you have to add it to the "letters"? Like .ljust(self.max_text_len)
  2. Add class 36? Like .extend((self.max_text_len-len(text))*[36])

And there's a question where to add it, left or right?
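To make the two options concrete, here is a minimal sketch of both. It assumes a 36-character alphabet (digits plus uppercase letters), so that the first unused index is 36 as in option 2; `LETTERS`, `MAX_TEXT_LEN`, `pad_with_space` and `pad_with_index` are illustrative names, and `text_to_labels` here is a simplified stand-in rather than the repository's function.

```python
LETTERS = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ"  # assumed 36-character alphabet
MAX_TEXT_LEN = 9                                  # assumed maximum plate length
PAD_INDEX = len(LETTERS)                          # 36, the first unused class index


def text_to_labels(text, letters):
    """Map each character to its index in the alphabet (simplified stand-in)."""
    return [letters.index(ch) for ch in text]


def pad_with_space(text):
    """Option 1: right-pad the text with spaces; the space must be in the alphabet."""
    letters_with_space = LETTERS + " "
    return text_to_labels(text.ljust(MAX_TEXT_LEN), letters_with_space)


def pad_with_index(text):
    """Option 2: encode first, then append the padding class index."""
    labels = text_to_labels(text, LETTERS)
    labels.extend((MAX_TEXT_LEN - len(labels)) * [PAD_INDEX])
    return labels


print(pad_with_space("ABC1234"))
print(pad_with_index("ABC1234"))
```

In this sketch both options produce the same label list, because the space ends up at index 36. The difference is in how index 36 is treated: option 1 makes the space a regular character of the alphabet, while option 2 leaves the alphabet unchanged and only reserves an index.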

qjadud1994 commented 5 years ago

Padding can be any character, such as a space or *. However, the padding character should not affect the prediction.

It does not matter whether you put the padding to the left or to the right of the text.

ex) max_len = 5: [B, Y, E, *, *] [H, I, *, *, *]

hmunshi commented 5 years ago

Do we need to add the character "*" (used for padding) to the list of characters? For example, if my plate image is named XYZ1234*.jpg, it says the character is not in the list.

And, if I add it, the accuracy is 0.

xinyuegtxy commented 5 years ago

@qjadud1994 @soldierofhell @hmunshi I wonder whether adding "*" after a normal label will affect the performance of the CRNN?

tuanphan09 commented 5 years ago

The best method is to use the 'blank' symbol for padding, like the text_to_labels function in this code: https://github.com/tuanphan09/captcha-recognition/blob/master/data_gen.py

tuanphan09 commented 5 years ago

@xinyuegtxy if you add an extra character to the labels, it won't affect CRNN performance. I've already done that, but it will make your model a little bit bigger (the number of characters increases by 1). So the best way is to use 'blank', as I said above, and it also makes sense.
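A rough sketch of padding with the blank index is shown below. It is not the code from the linked file; it assumes a 36-character alphabet and that the CTC blank is the last class index, which is the convention of Keras's `ctc_batch_cost`. `LETTERS`, `BLANK`, `encode` and `labels_to_text` are hypothetical names.

```python
LETTERS = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ"  # assumed alphabet
BLANK = len(LETTERS)   # assumption: the CTC blank is the last class index
MAX_TEXT_LEN = 9       # assumed maximum plate length


def encode(text):
    """Return (labels padded with BLANK to MAX_TEXT_LEN, true label length)."""
    labels = [LETTERS.index(ch) for ch in text]
    true_len = len(labels)
    return labels + [BLANK] * (MAX_TEXT_LEN - true_len), true_len


def labels_to_text(labels):
    """Decode label indices back to text, skipping the blank/padding index."""
    return "".join(LETTERS[i] for i in labels if i != BLANK)


padded, true_len = encode("XYZ1234")
print(padded, true_len)
print(labels_to_text(padded))
```

Because the blank already exists as a class in a CTC model, this keeps the number of output classes unchanged. If the true length is also passed to the loss (for example as the `label_length` argument of `ctc_batch_cost`), the padded positions should not take part in the loss at all.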