Open qlmbeck opened 1 year ago
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Hi! I am trying to fine-tune SpeechT5 for Hindi. From my understanding, SpeechT5 is pre-trained on the LibriSpeech English dataset, and going through the official Colab notebook for fine-tuning the TTS model for Dutch, I realised that they replace the characters that are not in their tokenizer's vocabulary. I have two questions: which tokenizer are they using, and since you are fine-tuning for Chinese, how did you deal with Chinese characters not being part of that vocabulary?
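For reference, the character-replacement approach in the Dutch fine-tuning notebook amounts to mapping characters the tokenizer does not know onto ones it does. A minimal sketch (the replacement pairs below are illustrative, not the notebook's actual list):

```python
# Map out-of-vocabulary characters to in-vocabulary substitutes
# before tokenization. Illustrative pairs only.
replacements = [
    ("à", "a"),
    ("ç", "c"),
    ("è", "e"),
    ("ë", "e"),
    ("í", "i"),
]

def cleanup_text(text: str) -> str:
    """Replace characters the tokenizer cannot handle."""
    for src, dst in replacements:
        text = text.replace(src, dst)
    return text

print(cleanup_text("caffè"))  # -> caffe
```

This works for languages close to the pre-training alphabet, but it cannot cover a different script such as Devanagari or Chinese, which is why vocabulary expansion comes up below.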
Fine-tuning on another language is discussed in https://github.com/microsoft/SpeechT5/issues/56#issuecomment-1624912143
You must expand the vocabulary of the processor's tokenizer first; otherwise it will tokenize the whole chunk of the sentence as a single input.
I used the Colab notebook to fine-tune this model. When I run trainer.train(), it raises an error.
I do use a GPU, so why did this error happen?
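The error excerpt at the top suggests setting CUDA_LAUNCH_BLOCKING=1. CUDA kernels launch asynchronously, so a failing kernel is often reported at a later, unrelated call; making launches synchronous points the traceback at the real culprit. A minimal sketch, noting that the variable must be set before CUDA is initialized:

```python
import os

# Must be set before torch initializes CUDA, i.e. before `import torch`
# runs in the notebook (or export it in the shell before launching).
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"

# import torch            # import only after the variable is set
# ... then rerun trainer.train(); the traceback will now stop at the
# kernel that actually failed (often an out-of-range embedding index).
```

With this set, training is slower but the first reported error is the real one, which makes issues like an unresized embedding table much easier to spot.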