weinman / cnn_lstm_ctc_ocr

TensorFlow-based CNN+LSTM trained with CTC loss for OCR
GNU General Public License v3.0

model question #32

Closed wkhunter closed 6 years ago

wkhunter commented 6 years ago

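| Layer | Op   | KrnSz | Stride(v,h) | OutDim | H  | W  | PadOpt |
|-------|------|-------|-------------|--------|----|----|--------|
| 1     | Conv | 3     | 1           | 64     | 30 | 30 | valid  |
| 2     | Conv | 3     | 1           | 64     | 30 | 30 | same   |
|       | Pool | 2     | 2           | 64     | 15 | 15 |        |
| 3     | Conv | 3     | 1           | 128    | 15 | 15 | same   |
| 4     | Conv | 3     | 1           | 128    | 15 | 15 | same   |
|       | Pool | 2     | 2,1         | 128    | 7  | 14 |        |
| 5     | Conv | 3     | 1           | 256    | 7  | 14 | same   |
| 6     | Conv | 3     | 1           | 256    | 7  | 14 | same   |
|       | Pool | 2     | 2,1         | 256    | 3  | 13 |        |
| 7     | Conv | 3     | 1           | 512    | 3  | 13 | same   |
| 8     | Conv | 3     | 1           | 512    | 3  | 13 | same   |
|       | Pool | 3     | 3,1         | 512    | 1  | 13 |        |
| 9     | LSTM |       |             | 512    |    |    |        |
| 10    | LSTM |       |             | 512    |    |    |        |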

If I want to train on 3000+ characters, how should I modify the model? Should I make the CNN deeper, change the max-pooling layers, or something else?
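As a side check on the layer table in the question, the H and W columns can be reproduced with the usual output-size arithmetic for "valid" and "same" padding. This is only a sketch under stated assumptions: the 32x32 input size is inferred from the first row (a valid 3x3 convolution yielding 30x30), the pooling layers are assumed to use "valid" padding, and the last pool is assumed to use a 3x1 window, since that is what gives W = 13 in the table. The layer names are hypothetical labels.

```python
def out_size(size, kernel, stride, padding):
    """Output length along one axis for a conv or pool layer."""
    if padding == "same":
        return -(-size // stride)  # ceil(size / stride)
    return (size - kernel) // stride + 1  # "valid": no padding

# (name, kernel (v, h), stride (v, h), padding).
# Pool padding and the 3x1 window of the last pool are assumptions.
LAYERS = [
    ("conv1", (3, 3), (1, 1), "valid"),
    ("conv2", (3, 3), (1, 1), "same"),
    ("pool2", (2, 2), (2, 2), "valid"),
    ("conv3", (3, 3), (1, 1), "same"),
    ("conv4", (3, 3), (1, 1), "same"),
    ("pool4", (2, 2), (2, 1), "valid"),
    ("conv5", (3, 3), (1, 1), "same"),
    ("conv6", (3, 3), (1, 1), "same"),
    ("pool6", (2, 2), (2, 1), "valid"),
    ("conv7", (3, 3), (1, 1), "same"),
    ("conv8", (3, 3), (1, 1), "same"),
    ("pool8", (3, 1), (3, 1), "valid"),
]

def trace_shapes(height, width):
    """Return (name, H, W) after each layer for a given input size."""
    shapes = []
    for name, (kv, kh), (sv, sh), pad in LAYERS:
        height = out_size(height, kv, sv, pad)
        width = out_size(width, kh, sh, pad)
        shapes.append((name, height, width))
    return shapes

shapes = trace_shapes(32, 32)  # assumed 32x32 input
for name, h, w in shapes:
    print(f"{name}: H={h} W={w}")
```

Under these assumptions the height collapses to 1 while the width only shrinks to 13, so the recurrent layers would see a sequence of 13 feature vectors for this input width.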

weinman commented 6 years ago

The number of hidden layers may affect performance when the model has a large number of output characters. Strictly speaking, though, only the num_classes parameter passed to model.rnn_layers needs to change, since it controls the output layer's dimensionality.
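To make the point about output dimensionality concrete, here is a rough sketch of how the class count and the size of the final projection grow with the character set. It assumes the output layer is a single dense projection from a 512-dimensional recurrent feature (the LSTM size in the table) and that one extra class is reserved for the CTC blank, as TensorFlow's CTC loss expects. The helper names are hypothetical and not part of the repository.

```python
def ctc_num_classes(charset):
    """One class per distinct character, plus one for the CTC blank."""
    return len(set(charset)) + 1

def output_layer_params(feature_dim, num_classes):
    """Weights plus biases of a dense projection (assumed output layer)."""
    return feature_dim * num_classes + num_classes

FEATURE_DIM = 512  # assumption: taken from the LSTM rows of the table

# A small alphanumeric character set versus a 3000-character one.
small_charset = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"
large_charset = [chr(0x4E00 + i) for i in range(3000)]  # 3000 distinct code points

for label, charset in [("small", small_charset), ("large", large_charset)]:
    n = ctc_num_classes(charset)  # the value that would be passed as num_classes
    params = output_layer_params(FEATURE_DIM, n)
    print(f"{label}: num_classes={n}, output-layer parameters={params}")
```

Only the output projection scales with the character set here; whether the shared CNN and LSTM layers have enough capacity for thousands of classes is a separate, empirical question.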

wkhunter commented 6 years ago

OK, I changed model-crnn to a ResNet and the problem is solved!