mcggood opened 4 years ago
Hi, as I have commented in the code, the log_probs dimensions are (batch_size, num_classes, output_len) and the "y" tensor's dimensions are (batch_size, target_len); output_len and target_len indicate the numbers of time steps of the model output and the ground-truth data respectively. One key condition for calculating ctc_loss is that output_len must be greater than or equal to target_len. As you mentioned, there is a problem with your output dimensions. Please print "log_probs.shape" and "x.shape" to find the problem.
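For illustration, here is a minimal runnable sketch of that condition using torch.nn.CTCLoss; the tensor sizes are made up, and blank=0 is only an assumption about how this repo encodes labels:

```python
import torch
import torch.nn as nn

batch_size, num_classes, output_len, target_len = 4, 28, 50, 30

# Repo convention: log_probs is (batch_size, num_classes, output_len)
log_probs = torch.randn(batch_size, num_classes, output_len).log_softmax(dim=1)
y = torch.randint(1, num_classes, (batch_size, target_len))  # 0 reserved for the CTC blank

# CTC needs at least one output time step per target symbol
assert log_probs.shape[2] >= y.shape[1]

# torch.nn.CTCLoss itself expects (output_len, batch_size, num_classes)
ctc = nn.CTCLoss(blank=0)
loss = ctc(
    log_probs.permute(2, 0, 1),  # (batch, classes, time) -> (time, batch, classes)
    y,
    input_lengths=torch.full((batch_size,), output_len, dtype=torch.long),
    target_lengths=torch.full((batch_size,), target_len, dtype=torch.long),
)
```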
I have a similar error. Can you fix it? @mcggood
@hieuhv94 No, what s-omranpour answered won't help. I already printed their shapes: log_probs.shape[2] is 128 and y.shape[1] is 321. It must be a bug, if my alphabet is okay.
Hi @s-omranpour, do you have any more ideas on how to fix it? I really need it.
Best regards
Hi @hieuhv94, can you print the whole shapes of x and log_probs?
I commented out the assert statement and printed the shapes of x, y, and log_probs over a few iterations.
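(A hypothetical sketch of the debug code, around line 38 of Module/utils.py; the exact lines may differ:)

```python
# assert log_probs.shape[2] >= y.shape[1]   # temporarily commented out
print('log_probs shape', log_probs.shape)
print('x shape', x.shape)
print('y shape', y.shape)
```

The output over a few batches looks like this: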
log_probs shape torch.Size([64, 28, 131])
x shape torch.Size([64, 64, 1310])
y shape torch.Size([64, 306])
log_probs shape torch.Size([64, 28, 134])
x shape torch.Size([64, 64, 1346])
y shape torch.Size([64, 300])
log_probs shape torch.Size([64, 28, 134])
x shape torch.Size([64, 64, 1344])
y shape torch.Size([64, 280])
log_probs shape torch.Size([64, 28, 134])
x shape torch.Size([64, 64, 1344])
y shape torch.Size([64, 278])
log_probs shape torch.Size([64, 28, 136])
x shape torch.Size([64, 64, 1364])
y shape torch.Size([64, 297])
log_probs shape torch.Size([64, 28, 134])
x shape torch.Size([64, 64, 1344])
y shape torch.Size([64, 290])
log_probs shape torch.Size([64, 28, 136])
x shape torch.Size([64, 64, 1365])
y shape torch.Size([64, 294])
So the log_probs, which are your model outputs, have a length lower than your y, which is the ground truth. This problem is due to the stride values used in the convolution layers. As you can see in train.py, the strides are set to [5,2,1], so the time axis is divided by the product of the strides (5*2*1 = 10), which makes the model output about 0.1 of the input length. Your printed shapes confirm this claim. So to fix this error you need to decrease the strides. As I can see, min(x.shape[2]/y.shape[1]) is almost 4 (which is kind of weird to me), so the product of all your stride values should be at most 4. You can use strides=[4,1,1] or strides=[2,2,1] and see whether the error is raised again or not.
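A quick sketch of that arithmetic, using the strides from train.py and the lengths from the shapes printed above:

```python
from math import prod

strides = [5, 2, 1]        # current setting in train.py
x_len, y_len = 1310, 306   # from the printed shapes above

output_len = x_len // prod(strides)  # time axis shrinks by the stride product
print(output_len)                    # 131 < 306 -> the CTC assertion fails

# output_len >= target_len requires prod(strides) <= x_len / y_len (~4.28 here)
print(x_len // prod([2, 2, 1]))      # 327 >= 306 -> OK
```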
Thanks @s-omranpour! I'll try it
Hi, I can't run it with the LibriSpeech data. I preprocessed it as you said: audio_path and sentence in a DataFrame.
   audio_path                                          sentence
0  /data01/HH_home/Speech_Recognition/Convolution...  CHAPTER SIXTEEN I MIGHT HAVE TOLD YOU OF THE B...
1  /data01/HH_home/Speech_Recognition/Convolution...  MARGUERITE TO BE UNABLE TO LIVE APART FROM ME ...
and the alphabet: ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm', 'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x', 'y', 'z', ' ']
I got this error: ConvolutionalSpeechRecognition/Module/utils.py ---> 38 assert log_probs.shape[2] >= y.shape[1]. I printed these values: log_probs.shape[2] is 128 and y.shape[1] is 321. How can I fix this issue?
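For what it's worth, a quick way to sanity-check the class count implied by this alphabet (assuming one extra class for the CTC blank, which would match the 28 classes in the log_probs shapes printed above):

```python
alphabet = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm',
            'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x', 'y', 'z', ' ']
num_classes = len(alphabet) + 1  # +1 assumed for the CTC blank
print(num_classes)               # 28, consistent with log_probs.shape[1]
```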