s-omranpour / Pytorch-Speech-Recognition

A simple implementation of the paper https://arxiv.org/pdf/1910.00716v1.pdf
GNU General Public License v3.0

AssertionError #4

Open mcggood opened 4 years ago

mcggood commented 4 years ago

Hi, I can't run this with LibriSpeech data. I preprocessed it as you said: audio_path and sentence columns in a DataFrame.

   audio_path                                          sentence
0  /data01/HH_home/Speech_Recognition/Convolution...  CHAPTER SIXTEEN I MIGHT HAVE TOLD YOU OF THE B...
1  /data01/HH_home/Speech_Recognition/Convolution...  MARGUERITE TO BE UNABLE TO LIVE APART FROM ME ...

and alphabet: ['a','b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm', 'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x', 'y', 'z', ' ']

I got an error from ConvolutionalSpeechRecognition/Module/utils.py, line 38: assert log_probs.shape[2] >= y.shape[1]. I printed these values: log_probs.shape[2] is 128 and y.shape[1] is 321. How can I fix this issue?

s-omranpour commented 4 years ago

Hi, as I commented in the code, log_probs has dimensions (batch_size, num_classes, output_len) and the y tensor has dimensions (batch_size, target_len). output_len and target_len are the number of time steps in the model output and the ground-truth transcript, respectively. One key condition for computing ctc_loss is that output_len must be greater than or equal to target_len. As you mentioned, there is a problem with your output dimensions. Please print log_probs.shape and x.shape to find the problem.
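
For illustration, here is a minimal, self-contained sketch (not the repository's code; the sizes and names are hypothetical) of the length requirement that the assertion enforces for torch.nn.CTCLoss:

```python
# Minimal sketch of the CTC length requirement, assuming 28 classes
# (26 letters + space + CTC blank) as in the alphabet discussed above.
import torch
import torch.nn as nn

batch_size, num_classes = 4, 28
output_len, target_len = 128, 50      # output_len must be >= target_len

# Model-style output: (batch_size, num_classes, output_len), log-softmaxed over classes.
log_probs = torch.randn(batch_size, num_classes, output_len).log_softmax(dim=1)
targets = torch.randint(1, num_classes, (batch_size, target_len))  # index 0 reserved for blank

input_lengths = torch.full((batch_size,), output_len, dtype=torch.long)
target_lengths = torch.full((batch_size,), target_len, dtype=torch.long)

# nn.CTCLoss expects log_probs shaped (output_len, batch_size, num_classes), hence the permute.
ctc = nn.CTCLoss(blank=0)
loss = ctc(log_probs.permute(2, 0, 1), targets, input_lengths, target_lengths)
print(loss)  # finite here; if target_len exceeded output_len the loss would become inf
```

In the error reported above, output_len is 128 while target_len is 321, so neither the assertion nor CTC itself can be satisfied.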

hieuhv94 commented 4 years ago

I have a similar error. Were you able to fix it? @mcggood

mcggood commented 4 years ago

@hieuhv94 No, what s-omranpour answered didn't help. I already printed their shapes: log_probs.shape[2] is 128 and y.shape[1] is 321. It must be a bug, if my alphabet is okay.

hieuhv94 commented 4 years ago

Hi @s-omranpour, do you have any other ideas on how to fix it? I really need this.

Best regards

s-omranpour commented 4 years ago

Hi @hieuhv94, can you print the full shapes of x and log_probs?

hieuhv94 commented 4 years ago

I commented out the assert statement and printed the shapes of x, y, and log_probs over a few iterations. See here:

log_probs shape torch.Size([64, 28, 131])
x shape torch.Size([64, 64, 1310])
y shape torch.Size([64, 306])

log_probs shape torch.Size([64, 28, 134])
x shape torch.Size([64, 64, 1346])
y shape torch.Size([64, 300])

log_probs shape torch.Size([64, 28, 134])
x shape torch.Size([64, 64, 1344])
y shape torch.Size([64, 280])

log_probs shape torch.Size([64, 28, 134])
x shape torch.Size([64, 64, 1344])
y shape torch.Size([64, 278])

log_probs shape torch.Size([64, 28, 136])
x shape torch.Size([64, 64, 1364])
y shape torch.Size([64, 297])

log_probs shape torch.Size([64, 28, 134])
x shape torch.Size([64, 64, 1344])
y shape torch.Size([64, 290])

log_probs shape torch.Size([64, 28, 136])
x shape torch.Size([64, 64, 1365])
y shape torch.Size([64, 294])

s-omranpour commented 4 years ago

So, the "log_probs" which are your model outputs, have lengths lower than your "y" which is the ground truth. This problem is due to the value of the strides used in the convolution layers. As you can see in the "train.py" the strides is set to [5,2,1] which makes the output of the model having length about 0.1 of the input length. your printed shapes approves this claim. So to fix this error you need to decrease strides. as I can see min(x.shape[2]/y.shape[1]) is almost 4 (which is kind of wierd to me) the product of all your stride values should be 4. you can use strides=[4,1,1] or strides=[2,2,1] and see whether it raises error again or not.

hieuhv94 commented 4 years ago

Thanks @s-omranpour! I'll try it