Closed cxf739 closed 8 years ago
It is the height of an image. It should be the same for all images; the length/width of the image can be arbitrary.
Your number of classes is wrong. Number of classes should be the value of the highest class label + 1. In your case, it should at least be 115 + 1.
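As a sanity check on the labels, the rule above amounts to the following minimal sketch (labels_per_sample is illustrative, not the actual data):

```python
# Illustrative label sequences for three samples (not the real dataset).
labels_per_sample = [[2, 5, 115], [0, 7], [3, 3, 14]]

# The number of classes must cover every label value, so it is
# (highest label + 1); the CTC blank is added on top of this inside the network.
n_classes = max(max(seq) for seq in labels_per_sample) + 1
print(n_classes)  # 116
```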
On Sun, Jan 10, 2016 at 7:46 PM, cxf739 notifications@github.com wrote:
Now I am ready to train a model with my data, but I hit a bug, shown in the image. I have checked my data; its format is the same as the example's.
Do you know the reason? @rakeshvar https://github.com/rakeshvar thx. [image: 20160111112006] https://cloud.githubusercontent.com/assets/8044844/12226390/35dc9472-b856-11e5-8455-1c9b22087cbf.jpg
— Reply to this email directly or view it on GitHub https://github.com/rakeshvar/rnn_ctc/issues/9#issuecomment-170425732.
Thanks for your reply.
Today I studied the CTC code, and another question came up.
In file CTC.py, line 105:
I ran a test on the hindu data.
Suppose D is the inpt, 11 * 32, with nClassNum = 11 = 10 + 1,
and DD is its transpose.
The input labels are [2, 2, 3, 4, 5].
Which one is right?
A: at the moments indexed by labels, take the probabilities of all classes: D[:, labels]
B: at every moment, take the probabilities of the classes in labels: DD[:, labels]
My understanding is B, but the hindu training result is wrong, and I don't know why.
Could you give me an example of the CTC calculation process?
I only know the theory of CTC.
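To make the A-vs-B question concrete, here is a small numpy sketch using the sizes from this post (this is a shape check, not the repo's code):

```python
import numpy as np

n_classes, length = 11, 32              # 10 digit classes + 1 blank; 32 time steps
D = np.random.rand(n_classes, length)   # 11 * 32, classes down the rows
DD = D.T                                # its transpose: time down the rows
labels = [2, 2, 3, 4, 5]

A = D[:, labels]    # picks 5 *time columns* of D -> shape (11, 5)
B = DD[:, labels]   # picks the labels' probabilities at every time step -> shape (32, 5)
print(A.shape, B.shape)  # (11, 5) (32, 5)
```

Only B walks over all 32 time steps, which is what the CTC recursion needs.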
Yes, there is some confusion because of a lot of transposes happening.
In neuralnet.py you will see that the input image is transposed as image.T, from an h x l slab to an l x h scroll, where:
h - fixed height, same for all samples.
l - variable length, different for each image.
layer1 = midlayer(image.T, n_dims, **midlayer_args)
layer2 = SoftmaxLayer(layer1.output, layer1.nout, n_classes + 1)
layer3 = CTCLayer(layer2.output, labels, n_classes, logspace)
So in ctc.py, log_pred_y = tt.log(self.inpt[:, self.labels]) means: for all times, pick the true labels' probabilities.
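A numpy sketch of that indexing, assuming (as above) a time-major softmax output of shape l x K, with illustrative sizes:

```python
import numpy as np

length, n_classes = 32, 11
softmax_out = np.random.rand(length, n_classes)
softmax_out /= softmax_out.sum(axis=1, keepdims=True)  # each row is a class distribution
labels = [2, 2, 3, 4, 5]

# Analogue of tt.log(self.inpt[:, self.labels]):
# one log-probability per (time step, label position) -> shape (32, 5)
log_pred_y = np.log(softmax_out[:, labels])
print(log_pred_y.shape)  # (32, 5)
```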
@cxf739 Please do not remove your posts with images. They will be helpful for others. Thanks for asking these questions.
Ok, I will not remove any posts.
In ctc.py, log_pred_y = tt.log(self.inpt[:, self.labels]): do you mean self.inpt is l * h? In train.py, pred, aux = ntwk.tester(x), and pred is layer2.output. In neuralnet.py, layer3 = CTCLayer(layer2.output, labels, n_classes, logspace), so the CTC input is layer2.output too. Is it the same size as pred? When I print pred, it is h * l.
That is because tester returns layer2.output.T. So it is being transposed back to K x l, where K is the number of classes, nClasses.
self.tester = th.function(
    inputs=[image],
    outputs=[layer2.output.T, layer1.output.T],
)
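The shape bookkeeping for the whole path, as a numpy sketch (h, l, K are the symbols from this thread; the concrete sizes are made up):

```python
import numpy as np

h, l, K = 32, 40, 11            # fixed height, variable length, number of classes
image = np.zeros((h, l))        # the input slab, h x l
scroll = image.T                # what the network sees, l x h
softmax_out = np.zeros((l, K))  # layer2.output: one class distribution per time step
pred = softmax_out.T            # layer2.output.T, as returned by tester: K x l
print(image.shape, scroll.shape, pred.shape)  # (32, 40) (40, 32) (11, 40)
```

So pred has the classes back on the first axis, which is why it prints with the length along its columns, like the input image.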
Thank you!