Closed lpierron closed 3 years ago
Hi! I trained a model with those French cleaners and my model doesn't struggle to say the sentence you wrote ("Un ongle de ma tante est incarné"). Here is the link to the audio sample: https://sndup.net/3rcw
Yes, it's correct!!! Which release of Mozilla TTS did you use? I had to add the missing phonemes, and now I have to retrain my model because it has 2 more phonemes. Is it possible to have more information about your model: your `config.json` file, which `bin/train_*` script you used, and your preprocessing config, if any? Could you try this sentence: "Les députés Républicains sont indépendants."?
My result: https://sndup.net/54gr
Thanks
This sentence sounds good to me as well: https://sndup.net/7ysm. Here is my config file: https://drive.google.com/file/d/1MOOdw1kpQerKU2NjUYijNHSktAH75IqE/view?usp=sharing and the vocoder is WaveGrad, shared on the wiki.
Oh yes, it's very good. The major bug in `french_cleaners` (replacing every 'm' inside words with 'monsieur') doesn't seem to affect training, and the two missing nasal vowels don't affect your French synthesizer either.
That's indeed a huge problem; I don't even know why I didn't notice it. I may have copied the English abbreviations without extensive testing afterward. At any rate, it proves that the model is resilient to perturbations ^^
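For illustration, anchoring each abbreviation pattern on a word boundary (the style the English cleaners use, `r'\b%s\.'`) avoids the "every 'm' becomes 'monsieur'" failure mode. This is a hedged sketch with a made-up abbreviation list, not the actual `french_cleaners` code:

```python
import re

# Hypothetical French abbreviation table; \b keeps the match from firing
# inside words like "pommes", and the trailing \. requires the period.
_abbreviations = [(re.compile(r'\b%s\.' % abbr, re.IGNORECASE), full)
                  for abbr, full in [('mme', 'madame'),
                                     ('m', 'monsieur'),
                                     ('dr', 'docteur')]]

def expand_abbreviations(text):
    """Expand dotted abbreviations to their full words."""
    for pattern, replacement in _abbreviations:
        text = pattern.sub(replacement, text)
    return text

print(expand_abbreviations("M. Dupont aime les pommes."))
```

Note that 'mme' is listed before 'm' so "Mme." is tried as "madame" first; the bare 'm' pattern only matches when the very next character is the period.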
I see you have set the sample rate to 22 kHz, but our M-AILABS corpus is 16 kHz; did you interpolate the data to upscale it to 22 kHz? I'm training with ezwa from M-AILABS.
I trained using the full M-AILABS dataset (all speakers), and yeah, I resampled the dataset to 22 kHz.
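For reference, resampling 16 kHz audio to 22.05 kHz just re-evaluates the waveform on a denser time grid. A minimal linear-interpolation sketch of the idea (a real pipeline would use sox or librosa's resampler, which also apply an anti-aliasing filter; `resample_linear` is a name made up for this example):

```python
import numpy as np

def resample_linear(wav, sr_in, sr_out):
    """Naive linear-interpolation resampler, for illustration only."""
    n_out = int(round(len(wav) * sr_out / sr_in))
    t_in = np.arange(len(wav)) / sr_in    # original sample times (s)
    t_out = np.arange(n_out) / sr_out     # target sample times (s)
    return np.interp(t_out, t_in, wav)

# 1 second of a 440 Hz tone at 16 kHz, upsampled to 22.05 kHz
wav = np.sin(2 * np.pi * 440 * np.arange(16000) / 16000)
up = resample_linear(wav, 16000, 22050)
print(len(up))  # 22050 samples, i.e. 1 second at 22.05 kHz
```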
I'm using your `config.json` file and I upscaled the wave files from M-AILABS to 22050 Hz, but I cannot train above 8k steps; when I was at 16k, the problem occurred at 15k steps.
You can see my tensorboard: http://tts.lpcomprise.eu/#scalars
And my job stops with an error:
```
Traceback (most recent call last):
  File "../TTS/TTS/bin/train_tacotron.py", line 721, in <module>
    main(args)
  File "../TTS/TTS/bin/train_tacotron.py", line 619, in main
    train_avg_loss_dict, global_step = train(train_loader, model,
  File "../TTS/TTS/bin/train_tacotron.py", line 180, in train
    loss_dict = criterion(postnet_output, decoder_output, mel_input,
  File "/home/lpierron/anaconda2/envs/tts/lib/python3.8/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/lpierron/Mozilla_TTS/TTS/TTS/tts/layers/losses.py", line 398, in forward
    raise RuntimeError(f" [!] NaN loss with {key}.")
RuntimeError: [!] NaN loss with decoder_loss.
```
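The check in `losses.py` aborts the run as soon as the loss goes non-finite. The usual first remedies are lowering the learning rate and tightening `grad_clip` in `config.json`; a training loop can also skip the offending batch instead of crashing. This is a hedged sketch of that guard idea, not the actual `train_tacotron.py` code (`safe_backward` is a name made up here):

```python
import torch

def safe_backward(loss, optimizer, model, grad_clip=5.0):
    """Skip the batch when the loss is non-finite instead of stepping on NaNs."""
    if not torch.isfinite(loss):
        optimizer.zero_grad()
        return False  # caller can skip this batch and keep training
    loss.backward()
    # Gradient clipping bounds the update size, the standard defence
    # against the loss exploding a few thousand steps into training.
    torch.nn.utils.clip_grad_norm_(model.parameters(), grad_clip)
    optimizer.step()
    optimizer.zero_grad()
    return True

model = torch.nn.Linear(2, 1)
opt = torch.optim.SGD(model.parameters(), lr=0.1)
print(safe_backward(model(torch.ones(1, 2)).sum(), opt, model))   # finite loss: step taken
print(safe_backward(torch.tensor(float('nan')), opt, model))      # NaN loss: batch skipped
```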
Is it possible to have your French models, or are they private? Just the text2feat part, because I verified that the vocoder part is perfect when you feed it 22 kHz mels and pretty good with 16 kHz mels.
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions. You might also look at our Discourse page for further help: https://discourse.mozilla.org/c/tts
I have been seeing the same problem constantly with my Hakka (a Chinese dialect) dataset. This is my first training with my own datasets. See the error message below:
```
Traceback (most recent call last):
  File "TTS/TTS/bin/train_tacotron.py", line 721, in <module>
  File "TTS/TTS/bin/train_tacotron.py", line 623, in main
    scaler_st)
  File "TTS/TTS/bin/train_tacotron.py", line 184, in train
    text_lengths)
  File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "/content/TTS/TTS/tts/layers/losses.py", line 398, in forward
    raise RuntimeError(f" [!] NaN loss with {key}.")
 ! Run is kept in /home/yuten/work/hakka-ddc-March-27-2022_09+31PM-0000000
```
Here is my config file.
I have tried several times in Colab and the training always hits the exception at around 2k steps. I am new to TTS. Please let me know if my config or datasets have problems. Thanks.
Here is my Colab link: https://colab.research.google.com/drive/1HSZKJkDwwuFv9oLq7Dvti_4QKTn-efpe#scrollTo=etkwAVQUQc2k
dataset (zip) + code (py) + work log: https://drive.google.com/drive/folders/1xfLEnriP_NtheaZ9N2Befn8drw4O92z2?usp=sharing
train + test + val (csv and text files): https://drive.google.com/drive/folders/15JCkCF5MwkU-KI613tNFoRDdg5DELikQ?usp=sharing
I'm trying to improve French Tacotron2 DDC, because there are some noises you don't get in an English synthesizer made with Tacotron 2. There are also some pronunciation defects on nasal vowels, probably because of missing phonemes (ɑ̃, ɛ̃), as in "œ̃n ɔ̃ɡl də ma tɑ̃t ɛt ɛ̃kaʁne" ("Un ongle de ma tante est incarné.").
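One way to spot such gaps before training is to split a phonemized sentence into phonemes (keeping combining marks like the nasal tilde attached to their base character) and diff against the model's symbol set. A hedged sketch with a shortened, made-up symbol set, not the real `symbols.py`:

```python
import unicodedata

def split_phonemes(s):
    """Group each base character with its combining marks ('ɛ' + '̃' -> 'ɛ̃')."""
    out = []
    for ch in s:
        if unicodedata.combining(ch) and out:
            out[-1] += ch
        else:
            out.append(ch)
    return out

# Shortened stand-in symbol set; 'ɛ̃' is deliberately absent to mimic the bug.
known_symbols = set("abdeklmnstɛɔɡʁə ") | {"ɑ̃", "ɔ̃", "œ̃"}
sentence = "œ̃n ɔ̃ɡl də ma tɑ̃t ɛt ɛ̃kaʁne"
missing = sorted({p for p in split_phonemes(sentence) if p not in known_symbols})
print(missing)  # flags the nasal vowel absent from the symbol set
```

Any phoneme flagged here would be silently dropped or mispronounced at synthesis time, which matches the defects heard on the nasal vowels.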
I started to train text2feat from scratch on a French corpus (MAI_ezwa), but after 10k to 15k steps the loss increases drastically. The result is not so bad with your vocoder, but you said you trained your model for 100k steps, 10 times as many as I did. How can I reach 100k steps and beyond?
I'm attaching the config file.
Thanks
There is a big bug in your `expand_abbreviations` for French. I'm sending you the new one, along with a new `symbols.py` containing the missing nasal vowels: abbreviations_symbols.zip

Originally posted by @lpierron in https://github.com/mozilla/TTS/issues/539#issuecomment-783408544