rakeshvar / rnn_ctc

Recurrent Neural Network and Long Short Term Memory (LSTM) with Connectionist Temporal Classification implemented in Theano. Includes a toy training example.

nDims, nClasses, image transposition, etc. #9

Closed: cxf739 closed this issue 8 years ago

cxf739 commented 8 years ago

What does nDims refer to? Does it have to be the same for every image?

rakeshvar commented 8 years ago

It is the height of an image, which should be the same for all images. The length/width of the image can be arbitrary.
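
For example (a minimal numpy sketch, sizes invented):

    import numpy as np

    h = 32                                                   # nDims: fixed height, same for every image
    images = [np.random.rand(h, l) for l in (50, 75, 120)]   # lengths can differ per image
    for im in images:
        print(im.shape)                                      # (32, 50), (32, 75), (32, 120)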

rakeshvar commented 8 years ago

Your number of classes is wrong. The number of classes should be the value of the highest class label + 1. In your case, it should be at least 115 + 1 = 116.
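
As a quick sanity check, something like this should hold for your data (a minimal sketch; label_sequences is a hypothetical stand-in for your label lists):

    # label_sequences stands in for your training labels
    label_sequences = [[2, 7, 115], [0, 3, 42]]
    max_label = max(max(seq) for seq in label_sequences)
    n_classes = max_label + 1   # 116 here; anything smaller will index out of bounds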

On Sun, Jan 10, 2016 at 7:46 PM, cxf739 notifications@github.com wrote:

Now I am ready to train a model with my data, but I get an error, shown in the image. I have checked my data; its format is the same as the example's.

Do you know the reason? @rakeshvar Thanks. [image: 20160111112006] https://cloud.githubusercontent.com/assets/8044844/12226390/35dc9472-b856-11e5-8455-1c9b22087cbf.jpg


cxf739 commented 8 years ago

Thanks for your reply. Today I studied the CTC code and another question came up, about CTC.py, line 105. I did a test based on the Hindu data. Suppose D is the inpt, 11 * 32, with nClassNum = 11 = 10 + 1, and DD is the transposed result. The input labels are [2, 2, 3, 4, 5]. Which one is right?
A: select, at one moment, the probabilities of all the classes in labels: D[:, labels]
B: select, at all moments, the probabilities of the classes in labels: DD[:, labels]

My understanding is B, but the Hindu training result is wrong, and I don't know why. Could you give me an example of the CTC calculation process?
I only know the theory of CTC. [image: 20160113165308]

rakeshvar commented 8 years ago

Yes, there is some confusion because of a lot of transposes happening.

In neuralnet.py you will see that the input image is transposed as image.T, from an h x l slab to an l x h scroll, where h is the fixed height (same for all samples) and l is the variable length (different for each image).

        layer1 = midlayer(image.T, n_dims, **midlayer_args)               # consumes the l x h scroll
        layer2 = SoftmaxLayer(layer1.output, layer1.nout, n_classes + 1)  # +1 output for the CTC blank
        layer3 = CTCLayer(layer2.output, labels, n_classes, logspace)     # CTC loss on the softmax outputs

So in ctc.py, log_pred_y = tt.log(self.inpt[:, self.labels]) means: at every time step, pick the probabilities of the true labels.
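
Here is a minimal numpy sketch of that indexing (shapes invented to match your example); it corresponds to your option B with DD:

    import numpy as np

    l, K = 32, 11                        # l time steps, K classes (including blank)
    pred = np.random.rand(l, K)          # like layer2.output: one softmax row per time step
    pred /= pred.sum(axis=1, keepdims=True)

    labels = [2, 2, 3, 4, 5]
    log_pred_y = np.log(pred[:, labels])
    print(log_pred_y.shape)              # (32, 5): the labelled classes' probabilities
                                         # at every time step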

rakeshvar commented 8 years ago

@cxf739 Please do not remove your posts with images. They will be helpful for others. Thanks for asking these questions.

cxf739 commented 8 years ago

Ok, I will not remove any posts.

In ctc.py, with log_pred_y = tt.log(self.inpt[:, self.labels]), do you mean self.inpt is l * h?
In train.py, pred, aux = ntwk.tester(x), and pred is layer2.output.
In neuralnet.py, layer3 = CTCLayer(layer2.output, labels, n_classes, logspace), so the CTC input is layer2.output too. Is it the same size as pred? When I print pred, it is h * l.

rakeshvar commented 8 years ago

That is because tester returns layer2.output.T, so it is being transposed back, to K x l (classes in place of height). Here K is the number of classes, nClasses.

        self.tester = th.function(
            inputs=[image],
            outputs=[layer2.output.T, layer1.output.T], )  # both outputs transposed back to K x l
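
So the shapes line up like this (a rough numpy sketch, sizes invented):

    import numpy as np

    image = np.random.rand(32, 100)        # h x l slab from the data
    scroll = image.T                       # l x h scroll, what the network consumes
    softmax_out = np.random.rand(100, 11)  # layer2.output inside the net: l x K
    pred = softmax_out.T                   # what tester returns: K x l
    print(pred.shape)                      # (11, 100)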
cxf739 commented 8 years ago

Thank you!