meijieru / crnn.pytorch

Convolutional recurrent network in pytorch
MIT License

why self.alphabet = alphabet + '-' ? #205

Open gbolin opened 5 years ago

gbolin commented 5 years ago

Hi friend, I have a question about line 25 of crnn.pytorch/utils.py. Why is it written like this?

`self.alphabet = alphabet + '-'  # for -1 index`

I really cannot understand why '-' is appended to the end of the alphabet. Looking forward to hearing from you.

IEEE-FELLOW commented 5 years ago

@GitHubGS In text recognition, we use '-' to represent characters that are not in our alphabet, just as an object detection task uses num + 1 classes, where the extra class represents the background.

gbolin commented 5 years ago

@IEEE-FELLOW But self.alphabet is only used in the decode function:

```python
if raw:
    return ''.join([self.alphabet[i - 1] for i in t])
else:
    char_list = []
    for i in range(length):
        if t[i] != 0 and (not (i > 0 and t[i - 1] == t[i])):
            char_list.append(self.alphabet[t[i] - 1])
```

What is the maximum value of t[i] - 1? I think it is len(alphabet), because the maximum value of t[i] is len(alphabet) + 1.

tabsun commented 4 years ago

@GitHubGS When t[i] == 0, self.alphabet[0 - 1] is the last element of self.alphabet, i.e. the appended '-'. It marks a separator between characters. The added '-' only matters in decode(); it never appears in training.
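
To make the indexing concrete, here is a minimal sketch. `ToyConverter` is a hypothetical stand-in written for this thread, not the repo's actual class. It assumes characters are numbered from 1 and that index 0 is reserved for the blank, and it uses plain Python lists instead of tensors.

```python
class ToyConverter:
    """Hypothetical converter for illustration; index 0 is assumed to be the blank."""

    def __init__(self, alphabet):
        # '-' is appended so that index 0 - 1 == -1 resolves to it.
        self.alphabet = alphabet + '-'
        # Assumption: real characters are numbered 1..len(alphabet).
        self.char_to_index = {ch: i + 1 for i, ch in enumerate(alphabet)}

    def encode(self, text):
        return [self.char_to_index[ch] for ch in text]

    def decode(self, t, raw=False):
        if raw:
            # Blank (0) maps to self.alphabet[-1], which is '-'.
            return ''.join(self.alphabet[i - 1] for i in t)
        chars = []
        for i in range(len(t)):
            # Skip blanks and collapse consecutive repeats.
            if t[i] != 0 and not (i > 0 and t[i - 1] == t[i]):
                chars.append(self.alphabet[t[i] - 1])
        return ''.join(chars)


converter = ToyConverter('abc')
frames = [1, 1, 0, 2, 0, 0, 3, 3]  # made-up per-frame predictions
raw_text = converter.decode(frames, raw=True)
text = converter.decode(frames)
max_index = max(converter.encode('abc'))

print(raw_text)   # aa-b--cc
print(text)       # abc
print(max_index)  # 3
```

Under these assumptions the largest encoded index is len(alphabet), so t[i] - 1 never goes past the last real character. The appended '-' is reached only through index -1, and only in the raw branch, because the other branch skips every t[i] == 0.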