wqt2019 / tacotron-2_melgan

tacotron-2 (pytorch) + melgan (pytorch) Chinese TTS
MIT License
26 stars 6 forks

Do the taco2 and melgan parameters need to be aligned? #10

Open yannier912 opened 3 years ago

yannier912 commented 3 years ago

Hello, before training taco2 and melgan, I'd like to confirm a few parameters with you. In taco2's hparams:

silence_threshold=2, #silence threshold used for sound trimming for wavenet preprocessing

#Mel spectrogram
n_fft = 2048, #Extra window size is filled with 0 paddings to match this parameter
hop_size = 275, #For 22050Hz, 275 ~= 12.5 ms (0.0125 * sample_rate)
win_size = 1100, #For 22050Hz, 1100 ~= 50 ms (If None, win_size = n_fft) (0.05 * sample_rate)
sample_rate = 22050, #22050 Hz (corresponding to ljspeech dataset) (sox --i <filename>)
frame_shift_ms = None, #Can replace hop_size parameter. (Recommended: 12.5)
magnitude_power = 2., #The power of the spectrogram magnitude (1. for energy, 2. for power)

#Mel and Linear spectrograms normalization/scaling and clipping
signal_normalization = True, #Whether to normalize mel spectrograms to some predefined range (following below parameters)
allow_clipping_in_normalization = True, #Only relevant if mel_normalization = True
symmetric_mels = True, #Whether to scale the data to be symmetric around 0. (Also multiplies the output range by 2, faster and cleaner convergence)
max_abs_value = 4., #max absolute value of data. If symmetric, data will be [-max, max] else [0, max] (Must not be too big to avoid gradient explosion, 

#Limits
min_level_db = -100,
ref_level_db = 20,
fmin = 55, #Set this to 55 if your speaker is male! if female, 95 should help taking off noise. (To test depending on dataset. Pitch info: male~[65, 260], female~[100, 525])
fmax = 7600, #To be increased/reduced depending on data.

In melgan's config:

audio:
  n_mel_channels: 80
  segment_length: 16000
  pad_short: 2000
  filter_length: 1024
  hop_length: 256 # WARNING: this can't be changed.
  win_length: 1024
  sampling_rate: 22050
  mel_fmin: 0.0
  mel_fmax: 8000.0

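The two differ: the FFT length is 2048 in one and 1024 in the other, the hop size is 275 vs. 256, the window size is 1100 vs. 1024, and fmin and fmax don't correspond either. Did you train without making these parameters consistent? Will this affect the results? Thanks!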
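The comparison above can also be done mechanically. The following is a minimal sketch that uses only the values quoted in this comment; the `KEY_MAP` pairing of taco2 names to melgan names and the helper `find_mismatches` are assumptions made for illustration and are not part of either code base.

```python
# Values copied from the two configs quoted in this issue.
TACO2 = {"n_fft": 2048, "hop_size": 275, "win_size": 1100,
         "sample_rate": 22050, "fmin": 55, "fmax": 7600}
MELGAN = {"filter_length": 1024, "hop_length": 256, "win_length": 1024,
          "sampling_rate": 22050, "mel_fmin": 0.0, "mel_fmax": 8000.0}

# Assumed correspondence between the two naming schemes (hypothetical).
KEY_MAP = {"n_fft": "filter_length", "hop_size": "hop_length",
           "win_size": "win_length", "sample_rate": "sampling_rate",
           "fmin": "mel_fmin", "fmax": "mel_fmax"}


def find_mismatches(taco2, melgan, key_map):
    """Return {taco2_key: (taco2_value, melgan_value)} for values that differ."""
    mismatches = {}
    for taco2_key, melgan_key in key_map.items():
        # Compare as floats so that 55 and 55.0 count as equal.
        if float(taco2[taco2_key]) != float(melgan[melgan_key]):
            mismatches[taco2_key] = (taco2[taco2_key], melgan[melgan_key])
    return mismatches


if __name__ == "__main__":
    for name, (t, m) in find_mismatches(TACO2, MELGAN, KEY_MAP).items():
        print(f"{name}: taco2={t} melgan={m}")
```

With the quoted values, only the sample rate agrees; every other paired setting is reported as a mismatch.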

wqt2019 commented 3 years ago

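Just take the mel and audio produced by tacotron2's preprocessing and feed them to melgan for training. The parameters do need to match; because melgan's own preprocessing code isn't used, I forgot to update that part.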

yannier912 commented 3 years ago

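@wqt2019 Thanks for the reply! So you trained melgan in non-GTA mode, right? That is, the wav and mel from taco2's preprocessing are given to taco2 and melgan separately for training. I'll try training that way as well.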

wqt2019 commented 3 years ago

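Training them separately is generally better; just use the same preprocessing code. I recommend the torch version of tacotron2: it converges faster and runs inference faster than the tf version.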

yannier912 commented 3 years ago

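@wqt2019 I didn't quite understand what "training separately" means. The wav and mel generated by taco2's preprocessing serve as the training data, and taco2 and melgan both take that same training data as input: is that what you mean by training separately? Also, does "use the same preprocessing code" mean directly replacing melgan's process with taco2's process? Sorry, I'm a bit confused about this part.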

wqt2019 commented 3 years ago

Yes, that's the right understanding.

yannier912 commented 3 years ago

Has the code in this repo already replaced melgan's preprocessing so that it matches taco2's? Can I train directly with your code?

wqt2019 commented 3 years ago

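Just feed the output of taco2's preprocessing directly to melgan for training; you only need to change the path that melgan reads its training data from.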
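Before pointing melgan at the taco2 output, it may be worth checking that each wav/mel pair is consistent with the hop size you expect. This is a minimal sketch, assuming a centered STFT that yields roughly one frame per `hop_size` samples (possibly one extra frame); the helper name `frames_match_audio` and the one-hop tolerance are hypothetical and not taken from this repository.

```python
def frames_match_audio(num_samples, num_frames, hop_size, tolerance_hops=1):
    """Check that num_frames * hop_size is within a few hops of the audio length."""
    expected_samples = num_frames * hop_size
    return abs(expected_samples - num_samples) <= tolerance_hops * hop_size


if __name__ == "__main__":
    # Example: 2 s of audio at 22050 Hz, 161 mel frames (assumed numbers).
    num_samples, num_frames = 44100, 161
    print(frames_match_audio(num_samples, num_frames, hop_size=275))
    print(frames_match_audio(num_samples, num_frames, hop_size=256))
```

In this example the pair is consistent with a hop size of 275 but not with 256, which is the kind of mismatch the hparams and config above would produce if mixed.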

yannier912 commented 3 years ago

OK, I'll train and see how it turns out. Thank you!

yannier912 commented 3 years ago

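Hello, one more question. You mentioned using the same preprocessing code, i.e. replacing melgan's process with taco2's process. Does that refer to synthesis time? Is it needed at training time as well?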

yannier912 commented 3 years ago

@wqt2019 taco2 is still training, but melgan hit an error shortly after training started. Have you run into this problem?

Validation loop: 100%|██████████| 12053/12053 [05:41<00:00, 35.31it/s]
g 66.6089 d 19.8717 | step 286: 5%|▍ | 286/6026 [00:51<17:12, 5.56it/s]
2021-07-13 15:14:51,603 - INFO - Exiting due to exception: Caught ValueError in DataLoader worker process 0.
Original Traceback (most recent call last):
  File "/home/work/.local/lib/python3.6/site-packages/torch/utils/data/_utils/worker.py", line 185, in _worker_loop
    data = fetcher.fetch(index)
  File "/home/work/.local/lib/python3.6/site-packages/torch/utils/data/_utils/fetch.py", line 44, in fetch
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/home/work/.local/lib/python3.6/site-packages/torch/utils/data/_utils/fetch.py", line 44, in <listcomp>
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/data/app/qtt/tts/tacotron-2_melgan/melgan_vocoder/datasets/dataloader.py", line 40, in __getitem__
    return self.my_getitem(idx1), self.my_getitem(idx2)
  File "/data/app/qtt/tts/tacotron-2_melgan/melgan_vocoder/datasets/dataloader.py", line 76, in my_getitem
    mel_start = random.randint(0, max_mel_start)
  File "/usr/lib64/python3.6/random.py", line 221, in randint
    return self.randrange(a, b+1)
  File "/usr/lib64/python3.6/random.py", line 199, in randrange
    raise ValueError("empty range for randrange() (%d,%d, %d)" % (istart, istop, width))
ValueError: empty range for randrange() (0,-9, -9)

wqt2019 commented 3 years ago

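You can debug it; the problem is probably in how the random number is picked. If all else fails, wrap it in a try. I haven't touched this code in a long time.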
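The traceback shows `random.randint(0, max_mel_start)` being called with a negative upper bound, which likely means the mel for that utterance has fewer frames than the training segment. As an alternative to a bare try, the range can be guarded explicitly. This is a minimal sketch under that assumption; `pick_mel_start` and its return convention are hypothetical and not the repository's code.

```python
import random


def pick_mel_start(num_frames, segment_frames, rng=random):
    """Return (start, pad_frames) for cropping a fixed-length mel segment.

    If the utterance is shorter than the segment, start at frame 0 and report
    how many frames of padding the caller would need to add.
    """
    max_start = num_frames - segment_frames
    if max_start < 0:
        # Too short: no valid random start exists, so avoid calling randint.
        return 0, -max_start
    return rng.randint(0, max_start), 0


if __name__ == "__main__":
    print(pick_mel_start(num_frames=50, segment_frames=60))   # short utterance
    print(pick_mel_start(num_frames=200, segment_frames=60))  # normal utterance
```

The caller can then either pad the mel and audio by the reported amount or skip the sample; which is appropriate depends on how the dataset class crops the matching audio.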

yannier912 commented 3 years ago

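> You can debug it; the problem is probably in how the random number is picked. If all else fails, wrap it in a try. I haven't touched this code in a long time.

OK, thanks.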

yannier912 commented 3 years ago

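@wqt2019 Hello, sorry to bother you with another question. You trained on Biaobei (标贝), which is about 10 hours of data, right? How long did melgan training take you: a few hours, or a few days? Thanks! I trained overnight and the results are still poor; I'm not sure whether the training time is too short or I've done something wrong.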

wqt2019 commented 3 years ago

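I haven't worked on melgan for a long time; take a look at the original version. Under normal conditions, the results from training on Biaobei are quite good.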