myshell-ai / MeloTTS

High-quality multi-lingual text-to-speech library by MyShell.ai. Supports English, Spanish, French, Chinese, Japanese and Korean.
MIT License

Error training new language: The expanded size of the tensor (768) must match the existing size (1024) at non-singleton dimension 0. Target sizes: [768, 19]. Tensor sizes: [1024, 19] #142

Open andretocalivros opened 1 month ago

andretocalivros commented 1 month ago

I am trying to train a new language model (Portuguese) but I am encountering the error "The expanded size of the tensor (768) must match the existing size (1024) at non-singleton dimension 0. Target sizes: [768, 19]. Tensor sizes: [1024, 19]" during the training phase.

Initially, I created a new language (based on Spanish) in the "text" directory and ran the preprocessing; the BERT and config files were generated successfully. However, when I attempt to train the model, the error above is raised.

For the tokenizer, I used the 'neuralmind/bert-large-portuguese-cased' model, and I am unsure if this might be the problem. The audio files are all in wav format and up to 10 seconds long. Could you please guide me on how to fix this error? I intend to contribute to the training code for Portuguese once I achieve success.

Thank you for your assistance.

jeremy110 commented 4 weeks ago

Did you modify data_utils.py to add your language labels? Also, 768 and 1024 are BERT embedding dimensions, and they depend on which BERT model you use.
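
To illustrate the point about embedding sizes, here is a minimal sketch of the mismatch. It assumes the training code expects 768-dimensional BERT features, which is inferred only from the target size [768, 19] in the error message. The names `BERT_HIDDEN_SIZES`, `EXPECTED_BERT_DIM`, and `check_bert_feature_shape` are hypothetical and not part of MeloTTS; shapes are plain tuples so the example runs without PyTorch.

```python
# Hidden (embedding) sizes of the two standard BERT variants.
BERT_HIDDEN_SIZES = {"bert-base": 768, "bert-large": 1024}

# Assumption: the training code expects 768-dimensional features,
# inferred only from the target size [768, 19] in the error message.
EXPECTED_BERT_DIM = 768


def check_bert_feature_shape(shape, expected_dim=EXPECTED_BERT_DIM):
    """Raise ValueError if a (dim, seq_len) feature shape has the wrong dim.

    Hypothetical helper for illustration; not part of MeloTTS.
    """
    dim, seq_len = shape
    if dim != expected_dim:
        raise ValueError(
            f"BERT feature dim {dim} does not match expected {expected_dim} "
            f"(shape {list(shape)}); check which BERT model produced it"
        )


# Shape from the error message: a BERT-large feature with sequence length 19.
try:
    check_bert_feature_shape((BERT_HIDDEN_SIZES["bert-large"], 19))
except ValueError as err:
    print(err)

# A BERT-base feature of the same length passes the check.
check_bert_feature_shape((BERT_HIDDEN_SIZES["bert-base"], 19))
```

If the model is loaded through Hugging Face `transformers`, its embedding size can usually be read from the `hidden_size` field of the config returned by `AutoConfig.from_pretrained(...)`, which makes it possible to run a check like this before generating the BERT files.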

andretocalivros commented 4 weeks ago

Hi Jeremy! Yes, I have modified data_utils; the only problem is this error when training starts. Thanks for the explanation about the BERT model. I will try to find one that can be used with MeloTTS training.