watsonyanghx / CNN_LSTM_CTC_Tensorflow

CNN+LSTM+CTC based OCR implemented using tensorflow.
MIT License

Change the image width and height #5

Closed wushilian closed 6 years ago

wushilian commented 6 years ago

Hello, I changed the image width and height from (60,180) to (80,500), and then I got an error:

InvalidArgumentError (see above for traceback): Matrix size-incompatible: In[0]: [40,288], In[1]: [176,512] [[Node: lstm/rnn/while/multi_rnn_cell/cell_0/lstm_cell/lstm_cell/MatMul = MatMul[T=DT_FLOAT, transpose_a=false, transpose_b=false, _device="/job:localhost/replica:0/task:0/gpu:0"](lstm/rnn/while/multi_rnn_cell/cell_0/lstm_cell/lstm_cell/concat, lstm/rnn/while/multi_rnn_cell/cell_0/lstm_cell/lstm_cell/MatMul/Enter)]] [[Node: Mean/_37 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/cpu:0", send_device="/job:localhost/replica:0/task:0/gpu:0", send_device_incarnation=1, tensor_name="edge_950_Mean", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/cpu:0"]()]]

Is there anything else I should change to fix this error?

watsonyanghx commented 6 years ago

For now, the network architecture is designed for images of size (60,180), so you have 2 choices:

  1. Change back to (60,180).
  2. Or you can change the filter size of CNN to fit (80,500).

980044579 commented 6 years ago

with tf.variable_scope('lstm'):
    # [batch_size, max_stepsize, num_features]
    x = tf.reshape(x, [FLAGS.batch_size, -1, filters[3]])
    x = tf.transpose(x, [0, 2, 1])  # batch_size * 64 * 48
    # shp = x.get_shape().as_list()
    # x.set_shape([FLAGS.batch_size, filters[3], shp[1]])
    x.set_shape([FLAGS.batch_size, filters[3], 48])

Please note the number 48: that is the number of features coming out of the CNN part. The CNN part has four steps; note the shape of x after each one: input (60,180) -> x1 (batch_size,30,90,channel) -> x2 (batch_size,15,45,channel) -> x3 (batch_size,8,23,channel) -> x4 (batch_size,4,12,channel).
So, as you can see, when the input size is (60,180), the number of features fed into the LSTM part should be 4*12=48. The same applies to your situation: when the input is (80,500), the feature number is 5*32=160, so you just need to replace the number 48 with 160 in the code.
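
The arithmetic above can be checked with a small standalone sketch. It assumes, based on the shapes listed (15 -> 8, 45 -> 23), that each of the four CNN steps halves height and width with rounding up. The function names `cnn_output_size` and `lstm_feature_count` are hypothetical helpers for illustration and are not part of the repository.

```python
import math


def cnn_output_size(height, width, num_steps=4, stride=2):
    """Spatial size after `num_steps` downsampling steps.

    Assumption: every step divides height and width by `stride`
    and rounds up, which matches the shapes quoted in the thread.
    """
    for _ in range(num_steps):
        height = math.ceil(height / stride)
        width = math.ceil(width / stride)
    return height, width


def lstm_feature_count(height, width):
    """Number of features per time step fed into the LSTM (height * width)."""
    out_h, out_w = cnn_output_size(height, width)
    return out_h * out_w


if __name__ == "__main__":
    print(cnn_output_size(60, 180), lstm_feature_count(60, 180))  # (4, 12) 48
    print(cnn_output_size(80, 500), lstm_feature_count(80, 500))  # (5, 32) 160
```

This also appears consistent with the error message, assuming a standard LSTM cell whose weight matrix has shape [input + hidden, 4 * hidden]: 512 = 4 * 128, the existing weights expect 176 = 48 + 128 inputs, and the new images produce 288 = 160 + 128.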