microsoft / SpeechT5

Unified-Modal Speech-Text Pre-Training for Spoken Language Processing

SpeechT5-tts fine-tuned on Chinese #49

Open qlmbeck opened 1 year ago

qlmbeck commented 1 year ago

I used the Colab notebook to fine-tune this model. When I run trainer.train(), it fails with the following error.

in <cell line: 2>:2                                                                              │
│                                                                                                  │
│ /usr/local/lib/python3.9/dist-packages/transformers/trainer.py:1662 in train                     │
│                                                                                                  │
│   1659 │   │   inner_training_loop = find_executable_batch_size(                                 │
│   1660 │   │   │   self._inner_training_loop, self._train_batch_size, args.auto_find_batch_size  │
│   1661 │   │   )                                                                                 │
│ ❱ 1662 │   │   return inner_training_loop(                                                       │
│   1663 │   │   │   args=args,                                                                    │
│   1664 │   │   │   resume_from_checkpoint=resume_from_checkpoint,                                │
│   1665 │   │   │   trial=trial,                                                                  │
│                                                                                                  │
│ /usr/local/lib/python3.9/dist-packages/transformers/trainer.py:1839 in _inner_training_loop      │
│                                                                                                  │
│   1836 │   │   self.state.is_world_process_zero = self.is_world_process_zero()                   │
│   1837 │   │                                                                                     │
│   1838 │   │   # tr_loss is a tensor to avoid synchronization of TPUs through .item()            │
│ ❱ 1839 │   │   tr_loss = torch.tensor(0.0).to(args.device)                                       │
│   1840 │   │   # _total_loss_scalar is updated everytime .item() has to be called on tr_loss an  │
│   1841 │   │   self._total_loss_scalar = 0.0                                                     │
│   1842 │   │   self._globalstep_last_logged = self.state.global_step                             │
╰──────────────────────────────────────────────────────────────────────────────────────────────────╯
RuntimeError: CUDA error: device-side assert triggered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be 
incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.

I am using a GPU, so why does this error happen?

fkwlqm commented 1 year ago

For debugging, consider passing CUDA_LAUNCH_BLOCKING=1, as the error message suggests.
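
A minimal sketch of how that suggestion could be applied in the notebook (this is an illustration, not part of the original notebook): the environment variable has to be set before PyTorch initialises CUDA, so restart the runtime and put it in the first cell.

```python
import os

# Must run before any CUDA call (i.e. before torch touches the GPU), so that
# kernel launches become synchronous and the stack trace points at the
# operation that actually failed instead of a later, unrelated call.
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"
```

One common cause of a device-side assert during TTS fine-tuning is an out-of-range index, for example a token id larger than the text embedding table, which is also what the later comments about expanding the tokenizer vocabulary point at. Running a single training step on CPU usually surfaces the exact indexing error as well.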

Srija616 commented 1 year ago

Hi! I am trying to fine-tune SpeechT5 for Hindi. From my understanding, SpeechT5 is pre-trained on the LibriSpeech English dataset, and going through the official Colab notebook for fine-tuning the TTS model on Dutch, I noticed that they replace characters that are not in the tokenizer's vocabulary. I have two questions: which tokenizer are they using, and since you are fine-tuning for Chinese, how did you deal with Chinese characters that are not part of that vocabulary?
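
For reference, the replacement step in the Dutch fine-tuning example looks roughly like the sketch below. The replacement pairs and the "normalized_text" column name are illustrative assumptions rather than quotes from the notebook, and the tokenizer in question is the one bundled with SpeechT5Processor, which as far as I understand operates essentially at the character level.

```python
# Illustrative sketch: replace characters the tokenizer does not know with
# close equivalents that it does know.
from transformers import SpeechT5Processor

processor = SpeechT5Processor.from_pretrained("microsoft/speecht5_tts")
tokenizer = processor.tokenizer

# Hypothetical replacement pairs; build this list from your own dataset.
replacements = [("à", "a"), ("ç", "c"), ("ë", "e"), ("í", "i"), ("ö", "o"), ("ü", "u")]

def cleanup_text(example):
    # "normalized_text" is an assumed column name of a datasets.Dataset.
    for src, dst in replacements:
        example["normalized_text"] = example["normalized_text"].replace(src, dst)
    return example

# dataset = dataset.map(cleanup_text)
```

This works for Dutch because the missing characters are a handful of accented Latin letters; for Chinese or Hindi the character set is far larger, which is why the vocabulary-expansion approach in the comment below is the more practical route.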

mechanicalsea commented 1 year ago

For fine-tuning on other languages, see https://github.com/microsoft/SpeechT5/issues/56#issuecomment-1624912143.

indiejoseph commented 4 months ago

You must expand the vocabulary of the processor's tokenizer first; otherwise it will tokenize the whole chunk of the sentence as the input.

(Screenshot attachment: 2024-02-20, 11:32 AM)
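
A minimal sketch of what expanding the vocabulary could look like, assuming the standard transformers API (add_tokens and resize_token_embeddings) behaves for SpeechT5ForTextToSpeech as it does for other models; the Chinese characters below are placeholders, and in practice you would collect every character that appears in your training text.

```python
from transformers import SpeechT5Processor, SpeechT5ForTextToSpeech

processor = SpeechT5Processor.from_pretrained("microsoft/speecht5_tts")
model = SpeechT5ForTextToSpeech.from_pretrained("microsoft/speecht5_tts")

# Placeholder characters; replace with the character set of your corpus.
new_chars = sorted(set("你好世界"))
num_added = processor.tokenizer.add_tokens(new_chars)

# The text embedding table must grow to match the new vocabulary size,
# otherwise the new token ids index past the embedding matrix, which is one
# way to trigger the CUDA device-side assert from the original post.
if num_added > 0:
    model.resize_token_embeddings(len(processor.tokenizer))
```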