ilovin / lstm_ctc_ocr

Use CTC + tensorflow to OCR
https://ilovin.github.io/2017-04-06/tensorflow-lstm-ctc-ocr/
354 stars, 140 forks

There is no space “ ” in the labels, so why add a space class to num_classes? #35

Closed jacobunderlinebenseal closed 6 years ago

jacobunderlinebenseal commented 6 years ago

26*2 + 10 digit + blank + space

num_classes=26+26+10+1+1

I find no spaces in the file names (00015901_mTiG.png, 00031901_ZqJml.png, 00047901_VMYAF.png), so why add a space class? Am I missing something?
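One quick way to check this is to pull the label out of each file name and look for a space. This is only a sketch: it assumes the label is the part between the underscore and the extension, as in the three file names above, and `label_from_filename` is a hypothetical helper, not something from the repository.

```python
import os


def label_from_filename(filename):
    """Return the text between the first underscore and the extension.

    Hypothetical helper; assumes names look like '<index>_<label>.png'.
    """
    stem, _ext = os.path.splitext(os.path.basename(filename))
    _index, _sep, label = stem.partition("_")
    return label


names = ["00015901_mTiG.png", "00031901_ZqJml.png", "00047901_VMYAF.png"]
labels = [label_from_filename(name) for name in names]

# True if any extracted label contains a space character.
has_space = any(" " in label for label in labels)
print(labels, has_space)
```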

ilovin commented 6 years ago

Edit: I just checked the code; the comment has already been removed in the beta version.

Yes, there is no space.

At first I thought a space label might exist, so I added one. Then, when implementing CTC (standard CTC and warpCTC), I found that the former requires the blank label to be N+1 and the latter requires the blank label to be zero, so the 'space' slot was kept. The comment is misleading; I'll fix that. The encode map is actually "blank (26*2+10) blank", and only one of the two blanks is used by each version of the code.

jacobunderlinebenseal commented 6 years ago

Sorry, I still don't understand, LOL. You wrote:

"But when implementing the CTC(standard CTC & warpCTC), the former one requires the blank label to be N+1"

"the encode map is actually "(26*2+10)", only one works for each version of code."

So it should be 26*2+10+1?

Notes:

This class performs the softmax operation for you, so inputs should
be e.g. linear projections of outputs by an LSTM.

The `inputs` Tensor's innermost dimension size, `num_classes`, represents
`num_labels + 1` classes, where num_labels is the number of true labels, and
the largest value `(num_classes - 1)` is reserved for the blank label.

For example, for a vocabulary containing 3 labels `[a, b, c]`,
`num_classes = 4` and the labels indexing is `{a: 0, b: 1, c: 2, blank: 3}`.

This is what help(tf.nn.ctc_loss) says.
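The indexing rule in the quoted notes can be checked with a few lines of plain Python. This is only a sketch of the convention described in that docstring; `build_standard_ctc_map` is a hypothetical helper and not part of TensorFlow.

```python
def build_standard_ctc_map(labels):
    """Map each true label to 0..N-1 and reserve index N for the blank.

    Hypothetical helper that mirrors the convention quoted from
    help(tf.nn.ctc_loss); it is not a TensorFlow function.
    """
    encode = {ch: i for i, ch in enumerate(labels)}
    num_classes = len(labels) + 1    # num_labels + 1
    blank_index = num_classes - 1    # largest index is the blank
    return encode, blank_index, num_classes


encode, blank_index, num_classes = build_standard_ctc_map(["a", "b", "c"])
print(encode, blank_index, num_classes)
```

Applied to the 62 characters discussed in this issue (26*2 + 10), the same rule gives num_classes = 63, with the blank at index 62.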

ilovin commented 6 years ago

For standard CTC, the encode map is 62 + blank. For warpCTC, the encode map is blank + 62.

To support both, the encode map is blank + 62 + blank. That way you do not have to change the encode map between the two versions; it is only a trick for convenience. You can change nclass to n+1 if you like, but don't forget to change the encode map as well.
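The layout described above might look like the following in plain Python. This is a sketch only: the character order and the names `CHARSET`, `encode_label` and `decode_label` are assumptions for illustration and are not taken from the repository. Only the layout itself (a reserved slot, the 62 characters, another reserved slot) comes from the explanation above.

```python
import string

# Assumed character order: digits, lowercase, uppercase (62 characters).
CHARSET = string.digits + string.ascii_lowercase + string.ascii_uppercase

# Index 0 is reserved (the blank for warpCTC, as stated in this thread),
# indices 1..62 are the characters, and index 63 is reserved
# (the blank for standard CTC, i.e. num_classes - 1).
ENCODE = {ch: i + 1 for i, ch in enumerate(CHARSET)}
DECODE = {i: ch for ch, i in ENCODE.items()}

NUM_CLASSES = len(CHARSET) + 2       # 62 + 1 + 1 = 64
WARP_BLANK = 0
STANDARD_BLANK = NUM_CLASSES - 1     # 63


def encode_label(text):
    """Turn a label string into class indices in the range 1..62."""
    return [ENCODE[ch] for ch in text]


def decode_label(indices):
    """Turn class indices back into text, skipping both reserved slots."""
    return "".join(DECODE[i] for i in indices if i in DECODE)


print(NUM_CLASSES, encode_label("mTiG"), decode_label(encode_label("mTiG")))
```

With this layout the encoded labels never use index 0 or index 63, so the same encode map can be fed to either loss. The cost is one extra output class that goes unused in each version.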

jacobunderlinebenseal commented 6 years ago

got it, thx very much