sooftware / conformer

[Unofficial] PyTorch implementation of "Conformer: Convolution-augmented Transformer for Speech Recognition" (INTERSPEECH 2020)
Apache License 2.0

About the "input length mismatch" bug in torchaudio's RNNT loss #37

Open Zain-Jiang opened 2 years ago

Zain-Jiang commented 2 years ago

In conformer/convolution.py, line 183, the output lengths are computed as:

output_lengths = input_lengths >> 2
output_lengths -= 1

When input_lengths / 4 has a fractional part of .75 (that is, input_lengths % 4 == 3), torchaudio.transforms.RNNTLoss raises an "input length mismatch" error. This may be a bug in how the output lengths are calculated, but I'm not sure.
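To make the suspected off-by-one concrete, here is a minimal sketch in plain Python (no PyTorch needed). It assumes the subsampling layer is two stacked convolutions with kernel size 3, stride 2, and no padding along the time axis; that is an assumption about the layer, not something confirmed in this thread. The helper names conv_out_len, actual_len, and reported_len are hypothetical and exist only for this illustration.

```python
def conv_out_len(length: int, kernel_size: int = 3, stride: int = 2) -> int:
    """Output length of an unpadded convolution along the time axis."""
    return (length - kernel_size) // stride + 1


def actual_len(length: int) -> int:
    # Assumption: two stacked convolutions, kernel 3, stride 2, no padding.
    return conv_out_len(conv_out_len(length))


def reported_len(length: int) -> int:
    # The two lines quoted above, applied to a plain int.
    return (length >> 2) - 1


# Collect every input length where the two calculations disagree.
mismatches = [t for t in range(7, 40) if actual_len(t) != reported_len(t)]
print(mismatches)  # [7, 11, 15, 19, 23, 27, 31, 35, 39]
print(all(t % 4 == 3 for t in mismatches))  # True
```

Under that assumption, the reported length is one frame shorter than the real time dimension exactly when input_lengths % 4 == 3, which would explain the error whenever the longest utterance in a batch has such a length. One possible replacement, again only valid under the same assumption, is to apply the convolution formula twice: output_lengths = (((input_lengths - 1) >> 1) - 1) >> 1.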

longlnOff commented 7 months ago

I've run into the same problem. Have you fixed it yet?

zzzendurance commented 4 weeks ago

I've run into the same problem. Have you fixed it yet?

I'm sorry to bother you. I would like to use the Conformer's lengths argument, but I'm not sure what form input_lengths should take.

The following is the description of the parameters from the source code. Does it mean that I should pass a one-dimensional tensor (that is, of size batch)? But then I saw in the author's example that the input was a three-dimensional tensor. How did you pass the parameters when you used the Conformer?

image