watsonyanghx / CNN_LSTM_CTC_Tensorflow

CNN+LSTM+CTC based OCR implemented using tensorflow.
MIT License

Cost keeps decreasing while accuracy is always ZERO #26

Open wangke1003 opened 6 years ago

wangke1003 commented 6 years ago

Hello everyone,

The script runs fine with the author's dataset, but when I train the model on my own dataset I run into the problem in the title: the cost keeps decreasing while accuracy stays at zero.

Some pics from my dataset:

    1000072_13 169 121 122 123 10 11 12 149 150 53 84 151 152 66 67 68 69 50 40 39 43 45 51 46
    1000060_168 169 13 14 15 21 25 169 170 20 13 169 171 54 172 173 22 52 53 54 55 36 20 13 13
    1000018_61 62 63 29 64 65 53 66 67 68 69 121 122 123 10 11 12 176 177 22 112 13 20 56 115
    1000016_172 173 22 52 53 54 55 36 20 13 13 20 174 174 70 56 18 153 154 155 156 175 158 65 53

These pics are 30x500, with 25 chars in each pic. I used about 260k of them for training and 65k for validation. The words in the pics are randomly selected from some drug info text, like this:
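
    import random

    word_length = 25  # 25 chars in each pic

    with open('thistxt', 'r', encoding='utf-8') as f:
        # read each line into a list, stripping surrounding whitespace
        all_lines = [line.strip() for line in f.read().split('\n')]

    # join all lines into one string
    data_str = ''.join(all_lines)

    # pick a random start index, then slice out one word
    a_rand_num = random.randint(0, len(data_str) - word_length)
    rand_word = data_str[a_rand_num:a_rand_num + word_length]

There are 196 unique chars in this text, so num_classes in my model is 196. Is my dataset not large enough, or is something else wrong? I'd appreciate it if anyone can help. Replies in Chinese are fine too.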
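
For reference, here is a minimal sketch of how the character labels could be encoded and sanity-checked against num_classes. It assumes the loss is TensorFlow's `tf.nn.ctc_loss` with its default blank handling, where the largest class index (`num_classes - 1`) is reserved for the blank label. The helper names `build_charset`, `encode`, and `check_labels` and the toy text are hypothetical and do not come from the repo.

```python
def build_charset(text):
    """Map each unique character to an integer index (sorted so the mapping is reproducible)."""
    chars = sorted(set(text))
    return {ch: i for i, ch in enumerate(chars)}


def encode(word, char_to_idx):
    """Turn a word into the list of label indices used as the CTC target."""
    return [char_to_idx[ch] for ch in word]


def check_labels(labels, num_classes):
    """Return True if every label stays below the index assumed to be the CTC blank."""
    blank = num_classes - 1  # assumption: the blank label is the largest class index
    return all(0 <= label < blank for label in labels)


# Toy stand-in for the drug info text (hypothetical data).
sample_text = "aspirin 100mg"
char_to_idx = build_charset(sample_text)

num_chars = len(char_to_idx)   # number of real characters
num_classes = num_chars + 1    # one extra class for the blank, under the assumption above

labels = encode("aspirin", char_to_idx)
print(labels, check_labels(labels, num_classes))
```

Under that assumption, 196 unique characters would call for num_classes = 197, because with num_classes = 196 the character mapped to index 195 would share its index with the blank. Whether this is what keeps the accuracy at zero here is not confirmed.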