microsoft / SpeechT5

Unified-Modal Speech-Text Pre-Training for Spoken Language Processing
MIT License

ASR SpeechT5 training - model predicts same output for different inputs #62

Open L7uan opened 9 months ago

L7uan commented 9 months ago

Hi! I am currently trying to train a SpeechT5ForSpeechToText model from scratch for an ASR task. Training goes quite well most of the time, but when I run inference with model.generate(**input), the model predicts the same output for different inputs. I'm using the Hugging Face implementation and followed every step of the training instructions, but I just can't find the error in my code that makes the model predict the same output for every input. Might this be a general problem with the SpeechT5ForSpeechToText implementation on Hugging Face, or am I doing something wrong? Any quick help would be really appreciated!
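To make the symptom measurable, a small check along the following lines may help. This is only a sketch and not code from the original report: `distinct_output_ratio` is a hypothetical helper name, and the token-ID lists are made-up stand-ins for what `model.generate(...)` returns once converted to plain Python lists (for example with `.tolist()`).

```python
from typing import Sequence


def distinct_output_ratio(generated: Sequence[Sequence[int]]) -> float:
    """Return the fraction of distinct sequences among the generated outputs.

    A value of 1.0 means every input produced a different output; a value of
    1 / len(generated) means all outputs are identical.
    """
    if not generated:
        raise ValueError("need at least one generated sequence")
    # Tuples are hashable, so a set removes exact duplicates.
    unique = {tuple(seq) for seq in generated}
    return len(unique) / len(generated)


# Hypothetical token IDs standing in for generate() results on three inputs.
collapsed = [[2, 15, 8, 2], [2, 15, 8, 2], [2, 15, 8, 2]]
healthy = [[2, 15, 8, 2], [2, 9, 4, 2], [2, 30, 2]]

print(distinct_output_ratio(collapsed))  # all three outputs are identical
print(distinct_output_ratio(healthy))    # every output is different
```

Running the same check on held-out audio clips with clearly different transcripts would show whether the outputs are truly identical or only look similar after decoding.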