H4ppyB1rd opened this issue 2 years ago
You may try 44.1 kHz; it worked for me (set `sampling_rate = 44100` in config.json). Also make sure your audio is a 1-channel, 16-bit WAV.
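If it helps, here is a minimal sketch of that conversion with librosa and soundfile. The `data.sampling_rate` key path is an assumption based on VITS-style configs, so adjust it to your repo's layout:

```python
# Convert a clip to 44.1 kHz mono 16-bit PCM and point the training
# config at the new rate. The config key path is an assumption.
import json

import librosa
import soundfile as sf

TARGET_SR = 44100

# librosa.load resamples and downmixes to mono in one call.
audio, sr = librosa.load("clip.wav", sr=TARGET_SR, mono=True)
# PCM_16 writes a 1-channel, 16-bit WAV, matching the advice above.
sf.write("clip_44k.wav", audio, TARGET_SR, subtype="PCM_16")

with open("config.json") as f:
    config = json.load(f)
config["data"]["sampling_rate"] = TARGET_SR  # adjust key path to your repo
with open("config.json", "w") as f:
    json.dump(config, f, indent=2)
```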
Works for me. Thx!
@nikich340 does your speech synthesis give good results? Mine are OK, but the speech quality is not great: there is still noise and some mispronunciation. Do you get the same problem?
Rarely; I use a good dataset (16 hours). If you have less than 2 hours of speech lines, don't expect stable, good results.
Also, I edited the preprocessing scripts so they accept raw IPA phoneme input (I used espeak-ng IPA preprocessing), in case you want the model to generate some specific word. Make sure your input is unified (I used the punctuation signs .,?! and ..), and get rid of foreign-language words and quotes. Preprocessing should do it, but check manually anyway.
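The exact script edits aren't shown in this thread, so treat the following as an illustration only of the espeak-ng IPA step, calling the CLI directly:

```python
# Rough sketch: get IPA phonemes for a line of text via espeak-ng.
# -q suppresses audio output, --ipa prints IPA phonemes to stdout.
import subprocess

def to_ipa(text: str, voice: str = "en-us") -> str:
    out = subprocess.run(
        ["espeak-ng", "-q", "--ipa", "-v", voice, text],
        capture_output=True, text=True, check=True,
    )
    return out.stdout.strip()

print(to_ipa("Make sure you made unified input."))
```

espeak-ng may drop punctuation from the IPA output, so check that the .,?! marks you want to keep actually survive, and re-attach them if not.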
The 22050 Hz model produces lower-quality speech (no frequency content above 11 kHz, its Nyquist limit), which can be checked in Adobe Audition or on a mel spectrogram.
I wonder if the 44100 Hz model can produce a wider frequency range, up to about 22 kHz? Thanks in advance.
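For what it's worth, here is a quick way to check that on any clip (a hypothetical diagnostic, not part of this repo): a 22050 Hz model cannot contain energy above its 11025 Hz Nyquist limit, so the fraction of energy above that line should be near zero.

```python
# Measure how much spectral energy sits above 11025 Hz in a clip.
import librosa
import numpy as np

audio, sr = librosa.load("generated.wav", sr=None)  # keep native rate
spectrum = np.abs(librosa.stft(audio))              # (freq bins, frames)
freqs = librosa.fft_frequencies(sr=sr)              # Hz per freq bin

energy = (spectrum ** 2).sum(axis=1)
above = energy[freqs > 11025].sum() / energy.sum()
print(f"fraction of energy above 11.025 kHz: {above:.4f}")
```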
Hello @nikich340
I'm trying to train an 8000 Hz model with 2 hours of data and changed the rate in the config file before training, but my generated audio sounds like mumbling, not proper speech.
Here is a sample of the original recording:
Also, the generated audio sounds like this:
Can you suggest what is wrong with it?
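One thing worth ruling out first (an assumption about the cause, not a confirmed diagnosis): if any training WAV's actual sample rate doesn't match the `sampling_rate` in config.json, the model trains on mis-scaled spectrograms, and the output often sounds exactly like mumbling. A quick check, assuming a hypothetical `dataset/wavs` folder:

```python
# Flag any training file that is not 8000 Hz mono.
from pathlib import Path

import soundfile as sf

EXPECTED_SR = 8000  # must match sampling_rate in config.json

for wav in Path("dataset/wavs").glob("*.wav"):  # hypothetical path
    info = sf.info(wav)
    if info.samplerate != EXPECTED_SR or info.channels != 1:
        print(f"{wav}: {info.samplerate} Hz, {info.channels} ch")
```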
My terminal showed weird output when I started single-speaker training on audio with a 48000 Hz sampling rate, right after I finished a previous round of single-speaker training, with fine results, on the same audio resampled to the default 22050 Hz.
After I run train.py, the terminal throws this message:
(I guess this wasn't the crucial problem?)
Then this message:
... for dozens of rows.
Then this:
UserWarning: Detected call of `lr_scheduler.step()` before `optimizer.step()`. In PyTorch 1.1.0 and later, you should call them in the opposite order: `optimizer.step()` before `lr_scheduler.step()`. Failure to do this will result in PyTorch skipping the first value of the learning rate schedule. See more details at https://pytorch.org/docs/stable/optim.html#how-to-adjust-learning-rate

Then the training shows weird losses like:
each of the first four loss values is NaN. None of the above happened with my previous 22050 Hz training, so I'm wondering why this happens and what I can do. (I've already changed the JSON file in /configs to the 48 kHz sampling rate.) My apologies in advance if my questions are too basic.
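For the scheduler warning specifically, the fix the message (and the linked docs) describe is just to reorder the calls: since PyTorch 1.1.0, `optimizer.step()` must run before `lr_scheduler.step()`. A minimal sketch with placeholder model and data:

```python
# Correct ordering per the PyTorch warning: update weights first,
# then advance the learning-rate schedule.
import torch

model = torch.nn.Linear(10, 1)  # placeholder model
optimizer = torch.optim.Adam(model.parameters(), lr=2e-4)
scheduler = torch.optim.lr_scheduler.ExponentialLR(optimizer, gamma=0.999)

for epoch in range(3):
    for x, y in [(torch.randn(4, 10), torch.randn(4, 1))]:  # placeholder data
        optimizer.zero_grad()
        loss = torch.nn.functional.mse_loss(model(x), y)
        loss.backward()
        optimizer.step()   # first: update the weights
    scheduler.step()       # then: step the LR schedule
```

That warning on its own only makes PyTorch skip the first value of the LR schedule, though. The NaN losses may instead point to 22050 Hz-tuned values elsewhere in the config (e.g. filter/hop lengths or mel fmax) that no longer fit 48 kHz audio, but that is a guess.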