mohame54 opened this issue 1 year ago
Can you provide the stack trace and reproduction steps?
I want to train a transducer model for a speech recognition task. First, I extract the mel spectrogram from the audio signal and keep track of each spectrogram's length for training. Then I encode the labels with the Hugging Face tokenizers library, again keeping track of the label lengths. The label encoding works like this: first I tokenize the label, then I prepend the null token, and finally I pad the labels.
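A minimal sketch of the label pipeline described above, assuming the tokenizer has already produced token ids and that the null token has id 0 (`NULL_ID`, `PAD_ID` and `encode_labels` are hypothetical names, not part of any library). It keeps two versions of each label: the padded targets without the null token, paired with the unpadded token counts (the form a length-aware transducer loss such as torchaudio's `rnnt_loss` expects), and the null-prefixed sequence fed to the prediction network, which is one step longer.

```python
from typing import List, Tuple

NULL_ID = 0  # hypothetical id of the null (blank) token
PAD_ID = 0   # hypothetical padding id; positions past target_lens are assumed to be ignored by the loss


def encode_labels(
    token_ids: List[List[int]], null_id: int = NULL_ID, pad_id: int = PAD_ID
) -> Tuple[List[List[int]], List[int], List[List[int]]]:
    """Pad tokenized labels and build the prediction-network inputs.

    Returns:
        targets: labels padded to the longest label, WITHOUT the null token
        target_lens: number of real tokens per label (null token not counted)
        pred_inputs: [null] + padded label, so each row is one step longer
    """
    target_lens = [len(ids) for ids in token_ids]
    max_len = max(target_lens)
    targets = [ids + [pad_id] * (max_len - len(ids)) for ids in token_ids]
    pred_inputs = [[null_id] + row for row in targets]
    return targets, target_lens, pred_inputs


targets, target_lens, pred_inputs = encode_labels([[5, 7, 9], [4]])
print(targets)      # [[5, 7, 9], [4, 0, 0]]
print(target_lens)  # [3, 1]
print(pred_inputs)  # [[0, 5, 7, 9], [0, 4, 0, 0]]
```

The point of keeping both forms is that the null token only exists on the prediction-network side; counting it in `target_lens` makes the label axis look one step longer than it really is.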
I could share my Colab notebook if you want.
I'd like to note that my encoder network is a Conformer that applies convolutional subsampling before the attention layers, so it shortens the time axis (the audio feature length) during training to avoid out-of-memory crashes.
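Because the convolutional subsampling shortens the time axis, the per-utterance lengths passed to the loss have to be recomputed from the mel lengths rather than reused as-is. A minimal sketch, assuming (hypothetically) two convolution layers along time with kernel size 3, stride 2 and no padding, a common 4x subsampling front end; `layers`, `conv_out_length` and `subsampled_lengths` are illustrative names to be adjusted to the actual encoder. The length formula is the output-size formula documented for PyTorch's `Conv1d`/`Conv2d`.

```python
from typing import List, Sequence, Tuple

# Hypothetical front end: (kernel_size, stride, padding) per conv layer along time
DEFAULT_LAYERS: Tuple[Tuple[int, int, int], ...] = ((3, 2, 0), (3, 2, 0))


def conv_out_length(length: int, kernel_size: int, stride: int,
                    padding: int = 0, dilation: int = 1) -> int:
    """Output length of one convolution along a single axis."""
    return (length + 2 * padding - dilation * (kernel_size - 1) - 1) // stride + 1


def subsampled_lengths(feat_lens: Sequence[int],
                       layers: Sequence[Tuple[int, int, int]] = DEFAULT_LAYERS) -> List[int]:
    """Map mel-spectrogram lengths to encoder-output lengths."""
    out = []
    for n in feat_lens:
        for kernel_size, stride, padding in layers:
            n = conv_out_length(n, kernel_size, stride, padding)
        out.append(n)
    return out


print(subsampled_lengths([100, 80]))  # [24, 19]
```

If the batch is padded to its longest utterance, the largest value returned should match the time dimension of the encoder output, which is what the loss compares against.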
🐛 Describe the bug
def loss(self, audio_feat, feat_lens, target, target_lens):
    """
    audio_feat: mel spectrogram
    feat_lens: mel lengths before padding
    target: target sequence
    target_lens: target sequence lengths before padding
    """
Versions
I keep getting the errors "input length mismatch" and "output length mismatch".
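If the loss is computed with `torchaudio.functional.rnnt_loss` (an assumption; the thread does not name the implementation), these messages appear to come from its shape checks on the joint-network logits of shape (B, T, U+1, V): T must equal the largest logit length, U+1 must equal the largest target length plus one, and the padded targets must be exactly as wide as the largest target length. The hypothetical helper `check_rnnt_inputs` below mirrors those checks in plain Python so the batch lengths can be inspected before calling the loss; it is a debugging sketch, not a library function.

```python
from typing import Sequence, Tuple


def check_rnnt_inputs(
    logits_shape: Tuple[int, int, int, int],
    targets_shape: Tuple[int, int],
    logit_lengths: Sequence[int],
    target_lengths: Sequence[int],
) -> None:
    """Mirror the length checks assumed to be done by torchaudio's rnnt_loss.

    logits_shape: (B, T, U + 1, V) shape of the joint-network output
    targets_shape: (B, U) shape of the padded targets (no null token)
    logit_lengths: encoder output length per utterance (after subsampling)
    target_lengths: number of real tokens per label (null token not counted)
    """
    _batch, max_t, max_u_plus_1, _vocab = logits_shape
    # T must equal the longest encoder output, not the longest mel spectrogram
    if max_t != max(logit_lengths):
        raise ValueError("input length mismatch")
    # U + 1 must equal the longest label plus one slot for the null token
    if max_u_plus_1 != max(target_lengths) + 1:
        raise ValueError("output length mismatch")
    # The padded targets must be exactly as wide as the longest label
    if targets_shape[1] != max(target_lengths):
        raise ValueError("target length mismatch")


# Hypothetical batch: mel lengths [100, 80] subsampled 4x to [24, 19],
# labels of 3 and 1 tokens, null token prepended only for the prediction network.
check_rnnt_inputs((2, 24, 4, 32), (2, 3), [24, 19], [3, 1])  # passes
```

With the same shapes, passing the mel lengths [100, 80] as the logit lengths raises "input length mismatch", and counting the prepended null token in the label lengths ([4, 2]) raises "output length mismatch".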