Closed by Pranjalya 8 months ago.
What do you mean by "unintelligible speech"? Could you share your training logs or TensorBoard curves?
We have used about 60 hours of Hindi data from LIMMITS and have experience using phonemizer. Have you checked if tokens were properly extracted during the training and inference stages?
Hi
I have attached TensorBoard loss curves for our TTV v1 model, which was trained on the LibriTTS-960 dataset. We used 4 GPUs with a batch size of 128 (32 per GPU).
What does the CTC loss curve look like for your training run? Our checkpoint is from 930k steps.
I don't actually know Hindi well, but I suspect Phonemizer may not work well for Hindi. If so, how about using a different tokenizer?
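One alternative to a phonemizer, sketched here as an assumption rather than anything HierSpeech++ provides, is a grapheme (character) tokenizer built directly from the training transcripts. Hindi spelling in Devanagari is fairly close to pronunciation (schwa deletion being a notable exception), so characters can be a workable input unit. The names `build_vocab` and `encode` are hypothetical; the only library used is Python's `unicodedata`.

```python
import unicodedata

PAD, UNK = "<pad>", "<unk>"


def normalize(text: str) -> str:
    # NFC normalization so that alternative encodings of the same letter
    # (e.g. nukta forms such as U+0958 vs U+0915 + U+093C) map to one sequence.
    return unicodedata.normalize("NFC", text.strip())


def build_vocab(transcripts) -> dict:
    """Build a character vocabulary from training transcripts (hypothetical helper)."""
    chars = sorted({ch for t in transcripts for ch in normalize(t)})
    vocab = [PAD, UNK] + chars
    return {s: i for i, s in enumerate(vocab)}


def encode(text: str, vocab: dict) -> list:
    """Map text to character IDs, using UNK for characters unseen in training."""
    unk_id = vocab[UNK]
    return [vocab.get(ch, unk_id) for ch in normalize(text)]
```

Building the vocabulary from the training transcripts keeps train and inference tokenization identical by construction, at the cost of the model having to learn pronunciation rules such as schwa deletion itself.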
We have used Phonemizer too, and in our experience it works decently for Hindi. Here are my logs:
By "unintelligible" I mean that the speech sounded clear, but it was unrelated to the text and not in the target language. Then again, this was only a 20k-step checkpoint.
@hayeong0 After roughly how many steps do you start getting audible speech when training TTV from scratch?
For reference, here is the audio from the 20k-step checkpoint:
https://github.com/sh-lee-prml/HierSpeechpp/assets/36627085/25cc5bc9-b262-4150-a673-39f6bc6ebca8
Here are our results at 10k, 20k, 50k, 100k, 200k, and 950k steps (with HierSpeech synthesizer v1).
I have attached audio for several texts and speakers from LibriTTS test-clean.
With the LibriTTS dataset, the 10k-step model can already synthesize audible speech.
Thanks!
Thank you very much, it helped.
Thanks for the repo. I have around 30 hours of custom Hindi data that I wanted to train and test the model on. I trained only the TTV part on 4x A6000 GPUs with a batch size of 64, then ran inference with the provided VC checkpoint, but the results were unintelligible. The audio was not distorted; the speech was simply unintelligible. What do you think could be causing this? Do I also need a VC model trained on Hindi, or did I just not train for enough steps? And after roughly how many steps would you expect a checkpoint to give decent preliminary results? Thanks again!