Closed yunigma closed 2 months ago
Hi, `--hubert-label-dir` should point to the transcriptions for ASR, so it should not be the HuBERT labels extracted by the k-means model. You can just follow issue_15 and libri-label to prepare them.
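To illustrate the advice above, here is a minimal sketch of preparing a word-level transcription file (one line per utterance, in manifest order) from LibriSpeech `*.trans.txt` files. The function names and the exact manifest/output file names are assumptions for illustration, not the project's official tooling; the only fixed points are that the manifest's first line is the audio root and each subsequent row starts with a relative audio path.

```python
# Hypothetical helper: build a .wrd label file (one transcription
# per line) from LibriSpeech *.trans.txt files, keeping line order
# aligned with the utterance ids in a fairseq-style .tsv manifest.
# Function and file names here are illustrative assumptions.
import os

def load_transcripts(libri_root):
    """Map utterance id -> transcription from all *.trans.txt files."""
    trans = {}
    for dirpath, _, filenames in os.walk(libri_root):
        for name in filenames:
            if name.endswith(".trans.txt"):
                with open(os.path.join(dirpath, name)) as f:
                    for line in f:
                        utt_id, text = line.strip().split(" ", 1)
                        trans[utt_id] = text
    return trans

def write_wrd(tsv_path, trans, wrd_path):
    """Write one transcription per manifest row, in manifest order."""
    with open(tsv_path) as tsv, open(wrd_path, "w") as out:
        next(tsv)  # first manifest line is the audio root directory
        for line in tsv:
            audio_rel = line.split("\t")[0]
            utt_id = os.path.splitext(os.path.basename(audio_rel))[0]
            out.write(trans[utt_id] + "\n")
```

The resulting `.wrd` file is plain text, which matches what the fine-tuning script expects for `--hubert-label-dir` in the ASR setting.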
Hi @Ajyy, thank you so much! I confirm that everything works fine now. I ran fine-tuning with the `data.wrd` files, the training loss looked good, and at inference I get results very close to those of the released model. Currently, I am running fine-tuning on the other data as well.
Hello, and thank you very much for your project! I want to fine-tune the pre-trained SpeechT5 model on the ASR task with LibriSpeech data (after that I plan to fine-tune it on some other data). Fine-tuning runs for the set number of epochs (42) without throwing any errors, but after the 18th epoch the loss becomes zero (see the training logs). The fine-tuned model does not produce any meaningful hypotheses.
Here is the command I run (the training logs are also attached here: fine-tune-log):
In the fine-tune command, I am mostly unsure about the `--hubert-label-dir` argument. For inference, it is just set to the transcriptions (as discussed in issue_15). For fine-tuning, I set it to the labels extracted from HuBERT, but I do not know whether this is correct. For the HuBERT labels, I got two types of files: `data.len`, with the length of each utterance, and `data.npy`, containing the features themselves. I set `--hubert-label-dir` to the `len` files, since the script expects some txt input. Could you please clarify these points and help me fix the ASR fine-tuning? Thanks in advance!