microsoft / SpeechT5

Unified-Modal Speech-Text Pre-Training for Spoken Language Processing

Reproduce ASR experiment results in Hugging Face #59

Closed jjyaoao closed 11 months ago

jjyaoao commented 11 months ago

I am trying to fine-tune SpeechT5-base on the train-clean-100 subset using the Transformers library. The problem I am encountering is that my result on test-other is suspiciously good (WER = 2.76), which makes me suspect there may be a problem with the model or the method I am using, while my test-clean result is similar to the paper.
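For reference, here is a minimal sketch of the kind of WER evaluation I mean, assuming the `librispeech_asr` dataset on the Hub and the `transformers` / `datasets` / `evaluate` APIs; the `microsoft/speecht5_asr` checkpoint name is only a stand-in for my fine-tuned model, and this is not my exact script:

```python
# Minimal sketch: compute WER for a SpeechT5 ASR checkpoint on a
# LibriSpeech split with the Transformers / Datasets / Evaluate libraries.
import torch
import evaluate
from datasets import load_dataset
from transformers import SpeechT5ForSpeechToText, SpeechT5Processor

checkpoint = "microsoft/speecht5_asr"  # placeholder for the fine-tuned model
processor = SpeechT5Processor.from_pretrained(checkpoint)
model = SpeechT5ForSpeechToText.from_pretrained(checkpoint)
model.eval()

# Stream test-other so the whole corpus is not downloaded up front.
test_other = load_dataset("librispeech_asr", "other", split="test", streaming=True)

wer_metric = evaluate.load("wer")
predictions, references = [], []

for sample in test_other.take(100):  # small slice for a quick sanity check
    inputs = processor(
        audio=sample["audio"]["array"],
        sampling_rate=sample["audio"]["sampling_rate"],
        return_tensors="pt",
    )
    with torch.no_grad():
        predicted_ids = model.generate(**inputs, max_length=450)
    predictions.append(
        processor.batch_decode(predicted_ids, skip_special_tokens=True)[0].lower()
    )
    references.append(sample["text"].lower())

print("WER:", wer_metric.compute(predictions=predictions, references=references))
```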

Here is the model I fine-tuned. Could you help me check whether there is a serious problem with the experimental setup? Thanks ♪(・ω・)ノ