wushilian / STN_CNN_LSTM_CTC_TensorFlow

use STN+CNN+BLSTM+CTC to do OCR

about the STN loss #6

Open Fangction opened 5 years ago

Fangction commented 5 years ago

Is there any STN loss to constrain the transformation angle? I didn't find one in your code. If there is none, how does the network know how to transform an image?

wushilian commented 5 years ago

@Fangction There is no STN loss; the STN is supervised by the CTC loss. However, if you have more supervised information, you can add a constraint, for example by supervising the coordinates regressed by the STN.
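As a rough illustration of the constraint suggested above, the sketch below adds a mean-squared-error term on the control points to the CTC loss. It is written in plain Python to stay framework-independent; in a training graph the same expression would use tensor ops. The names `stn_point_loss`, `total_loss` and `stn_weight`, and the weight 0.1, are hypothetical and do not come from the repository, and the sketch assumes that ground-truth control points are available.

```python
def stn_point_loss(pred_points, true_points):
    """Mean squared error over all (x, y) control-point coordinates."""
    if len(pred_points) != len(true_points) or not pred_points:
        raise ValueError("point lists must be non-empty and of equal length")
    total = 0.0
    count = 0
    for (px, py), (tx, ty) in zip(pred_points, true_points):
        total += (px - tx) ** 2 + (py - ty) ** 2
        count += 2  # two coordinates per point
    return total / count


def total_loss(ctc_loss, pred_points, true_points, stn_weight=0.1):
    """CTC loss plus a weighted auxiliary term (the weight is an assumed value)."""
    return ctc_loss + stn_weight * stn_point_loss(pred_points, true_points)


# Hypothetical example: one predicted point is off by 1.0 along x.
print(total_loss(2.0, [(1.0, 0.0)], [(0.0, 0.0)]))
```

Without such labels, the setup described in the reply applies: the only gradient reaching the STN comes from the CTC loss.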

Fangction commented 5 years ago

@wushilian Thank you. I have another question. In your code:

    W = tf.Variable(tf.zeros([128, 20]))
    b = tf.Variable(initial_value=[-1, -0.2, -0.5, -0.35, 0, -0.5, 0.5, -0.67, 1, -0.8, -1, 0.8, -0.5, 0.65, 0, 0.5, 0.5, 0.33, 1, 0.2], dtype=tf.float32)

    fc3_loc=tf.layers.dense(fc2_loc,20,activation=tf.nn.tanh,kernel_initializer=tf.zeros_initializer)

    # fc3_loc = slim.fully_connected(fc2_loc, 8, activation_fn=tf.nn.tanh, scope='fc3_loc')
    # spatial transformer
    fc3_loc = tf.nn.tanh(tf.matmul(fc2_loc, W) + b)  # result of the activation function
    loc = tf.reshape(fc3_loc, [-1, 10, 2])  # reshape fc3_loc to shape [-1, 10, 2]
    # spatial transformer
    s = np.array([[-0.95, -0.95], [-0.5, -0.95], [0, -0.95], [0.5, -0.95], [0.95, -0.95], [-0.95, 0.95], [-0.5, 0.95], [0, 0.95], [0.5, 0.95],
                  [0.95,0.95]] * 256)
    s = tf.constant(s.reshape([256, 10, 2]), dtype=tf.float32)

How did you decide the values of the variables b and s?

wushilian commented 5 years ago

@Fangction b is a specific initialization. For details, read the paper: Robust Scene Text Recognition with Automatic Rectification.
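To make the role of these values easier to see, the sketch below reproduces what the quoted snippet computes at initialization. Because `W` is all zeros, the layer output starts as `tanh(b)` for every input, reshaped into 10 (x, y) pairs. Reading the snippet alongside the paper, `b` appears to set the initial control points predicted on the input image, and `s` appears to be the fixed base points on the output image (five along the top edge, five along the bottom), tiled once per batch item. That interpretation is an assumption, not something confirmed in this thread, and the helper name `initial_points` is hypothetical.

```python
import math

# Bias values copied from the question above: 20 numbers = 10 (x, y) pairs.
b = [-1, -0.2, -0.5, -0.35, 0, -0.5, 0.5, -0.67, 1, -0.8,
     -1, 0.8, -0.5, 0.65, 0, 0.5, 0.5, 0.33, 1, 0.2]

# One copy of the base points from `s` (before tiling 256 times for the batch).
s = [[-0.95, -0.95], [-0.5, -0.95], [0, -0.95], [0.5, -0.95], [0.95, -0.95],
     [-0.95, 0.95], [-0.5, 0.95], [0, 0.95], [0.5, 0.95], [0.95, 0.95]]


def initial_points(bias):
    """With W = 0 the layer outputs tanh(0 + b); pair the values up as (x, y)."""
    act = [math.tanh(v) for v in bias]
    return [(act[i], act[i + 1]) for i in range(0, len(act), 2)]


pts = initial_points(b)
for (px, py), (sx, sy) in zip(pts, s):
    # Left: initial predicted point; right: fixed base point it is paired with.
    print(f"({px:+.2f}, {py:+.2f})  <->  ({sx:+.2f}, {sy:+.2f})")
```

Note that the `tanh` squashes the bias values, so the starting points are `tanh(b)` rather than `b` itself; for example, a bias of -1 yields roughly -0.76.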

Fangction commented 5 years ago

@wushilian Thank you again. I also want to ask one more question. Do you think the STN's output image must have a fixed size? I see you define the output as image_width=120, image_height=32. Can I keep the output image the same size as the input image?

wushilian commented 5 years ago

@Fangction Yes, you can change the size.

daming98 commented 5 years ago

@Fangction Yes, you can change the size.

I changed the size and it caused an error like "InvalidArgumentError (see above for traceback): len(seq_lens) != input.dims(0), (256 vs. 1536)".

What can I do to solve this error?
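One observation about the numbers in that message: 1536 = 6 × 256, so the tensor that reached the failing op has six times as many rows as there are sequence lengths. The sketch below shows one way such a factor can arise, assuming (this is not confirmed in the thread) that the CNN feature map has height 6 after the size change and that a reshape folds the height into the batch dimension. The function names and the width and channel values are hypothetical.

```python
def fold_height_into_batch(n, h, w, c):
    """Shape after reshaping [N, H, W, C] to [-1, W, C]: the batch grows by H."""
    return (n * h, w, c)


def fold_height_into_channels(n, h, w, c):
    """Shape after transposing to [N, W, H, C] and reshaping to [N, W, H * C]."""
    return (n, w, h * c)


# Hypothetical feature map: batch 256, height 6, width 30, 512 channels.
shape = (256, 6, 30, 512)
print(fold_height_into_batch(*shape))     # batch dimension no longer matches 256 lengths
print(fold_height_into_channels(*shape))  # batch dimension stays 256
```

If this is what happens, either choosing an input height that the CNN reduces to 1, or folding the height into the channel dimension, would keep the batch dimension equal to the number of sequence lengths. Printing the tensor shapes just before the recurrent layer would confirm or rule out this reading.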