jerryuhoo / VTuberTalk

Apache License 2.0

Speech synthesis fails #5

Open ainokiseki opened 2 years ago

ainokiseki commented 2 years ago

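After training finished, speech synthesis with gui/main2.py failed. My training data contains only one VUP, named speaker_id, with ID 0. In step 4 I only produced the static models for fastspeech + hifigan + single and for speedyspeech + pwg. The GUI loads normally, but after I load a reference audio file and enter the sentence to synthesize, the program crashes and exits. The output is as follows: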


gst-fastspeech2
multiple speaker
spk_num: 1
vocab_size: 193
fastspeech2
encoder_type is transformer
use gst
decoder_type is transformer
D:\tools\anaconda\envs\paddlespeech\lib\site-packages\paddle\framework\io.py:415: DeprecationWarning: Using or importing the ABCs from 'collections' instead of from 'collections.abc' is deprecated since Python 3.3, and in 3.10 it will stop working
  if isinstance(obj, collections.Iterable) and not isinstance(obj, (
D:\tools\anaconda\envs\paddlespeech\lib\site-packages\paddle\fluid\dygraph\layers.py:1436: UserWarning: Skip loading for gst.ref_enc.convs.0.weight. gst.ref_enc.convs.0.weight is not found in the provided dict.
  warnings.warn(("Skip loading for {}. ".format(key) + str(err)))
D:\tools\anaconda\envs\paddlespeech\lib\site-packages\paddle\fluid\dygraph\layers.py:1436: UserWarning: Skip loading for gst.ref_enc.convs.1.weight. gst.ref_enc.convs.1.weight is not found in the provided dict.
  warnings.warn(("Skip loading for {}. ".format(key) + str(err)))
D:\tools\anaconda\envs\paddlespeech\lib\site-packages\paddle\fluid\dygraph\layers.py:1436: UserWarning: Skip loading for gst.ref_enc.convs.1.bias. gst.ref_enc.convs.1.bias is not found in the provided dict.
  warnings.warn(("Skip loading for {}. ".format(key) + str(err)))
D:\tools\anaconda\envs\paddlespeech\lib\site-packages\paddle\fluid\dygraph\layers.py:1436: UserWarning: Skip loading for gst.ref_enc.convs.1._mean. gst.ref_enc.convs.1._mean is not found in the provided dict.
  warnings.warn(("Skip loading for {}. ".format(key) + str(err)))
D:\tools\anaconda\envs\paddlespeech\lib\site-packages\paddle\fluid\dygraph\layers.py:1436: UserWarning: Skip loading for gst.ref_enc.convs.1._variance. gst.ref_enc.convs.1._variance is not found in the provided dict.
  warnings.warn(("Skip loading for {}. ".format(key) + str(err)))
D:\tools\anaconda\envs\paddlespeech\lib\site-packages\paddle\fluid\dygraph\layers.py:1436: UserWarning: Skip loading for gst.ref_enc.convs.3.weight. gst.ref_enc.convs.3.weight is not found in the provided dict.
  warnings.warn(("Skip loading for {}. ".format(key) + str(err)))
D:\tools\anaconda\envs\paddlespeech\lib\site-packages\paddle\fluid\dygraph\layers.py:1436: UserWarning: Skip loading for gst.ref_enc.convs.4.weight. gst.ref_enc.convs.4.weight is not found in the provided dict.
  warnings.warn(("Skip loading for {}. ".format(key) + str(err)))
D:\tools\anaconda\envs\paddlespeech\lib\site-packages\paddle\fluid\dygraph\layers.py:1436: UserWarning: Skip loading for gst.ref_enc.convs.4.bias. gst.ref_enc.convs.4.bias is not found in the provided dict.
  warnings.warn(("Skip loading for {}. ".format(key) + str(err)))
D:\tools\anaconda\envs\paddlespeech\lib\site-packages\paddle\fluid\dygraph\layers.py:1436: UserWarning: Skip loading for gst.ref_enc.convs.4._mean. gst.ref_enc.convs.4._mean is not found in the provided dict.
  warnings.warn(("Skip loading for {}. ".format(key) + str(err)))
D:\tools\anaconda\envs\paddlespeech\lib\site-packages\paddle\fluid\dygraph\layers.py:1436: UserWarning: Skip loading for gst.ref_enc.convs.4._variance. gst.ref_enc.convs.4._variance is not found in the provided dict.
  warnings.warn(("Skip loading for {}. ".format(key) + str(err)))
D:\tools\anaconda\envs\paddlespeech\lib\site-packages\paddle\fluid\dygraph\layers.py:1436: UserWarning: Skip loading for gst.ref_enc.convs.6.weight. gst.ref_enc.convs.6.weight is not found in the provided dict.
  warnings.warn(("Skip loading for {}. ".format(key) + str(err)))
D:\tools\anaconda\envs\paddlespeech\lib\site-packages\paddle\fluid\dygraph\layers.py:1436: UserWarning: Skip loading for gst.ref_enc.convs.7.weight. gst.ref_enc.convs.7.weight is not found in the provided dict.
  warnings.warn(("Skip loading for {}. ".format(key) + str(err)))
D:\tools\anaconda\envs\paddlespeech\lib\site-packages\paddle\fluid\dygraph\layers.py:1436: UserWarning: Skip loading for gst.ref_enc.convs.7.bias. gst.ref_enc.convs.7.bias is not found in the provided dict.
  warnings.warn(("Skip loading for {}. ".format(key) + str(err)))
D:\tools\anaconda\envs\paddlespeech\lib\site-packages\paddle\fluid\dygraph\layers.py:1436: UserWarning: Skip loading for gst.ref_enc.convs.7._mean. gst.ref_enc.convs.7._mean is not found in the provided dict.
  warnings.warn(("Skip loading for {}. ".format(key) + str(err)))
D:\tools\anaconda\envs\paddlespeech\lib\site-packages\paddle\fluid\dygraph\layers.py:1436: UserWarning: Skip loading for gst.ref_enc.convs.7._variance. gst.ref_enc.convs.7._variance is not found in the provided dict.
  warnings.warn(("Skip loading for {}. ".format(key) + str(err)))
D:\tools\anaconda\envs\paddlespeech\lib\site-packages\paddle\fluid\dygraph\layers.py:1436: UserWarning: Skip loading for gst.ref_enc.convs.9.weight. gst.ref_enc.convs.9.weight is not found in the provided dict.
  warnings.warn(("Skip loading for {}. ".format(key) + str(err)))
D:\tools\anaconda\envs\paddlespeech\lib\site-packages\paddle\fluid\dygraph\layers.py:1436: UserWarning: Skip loading for gst.ref_enc.convs.10.weight. gst.ref_enc.convs.10.weight is not found in the provided dict.
  warnings.warn(("Skip loading for {}. ".format(key) + str(err)))
D:\tools\anaconda\envs\paddlespeech\lib\site-packages\paddle\fluid\dygraph\layers.py:1436: UserWarning: Skip loading for gst.ref_enc.convs.10.bias. gst.ref_enc.convs.10.bias is not found in the provided dict.
  warnings.warn(("Skip loading for {}. ".format(key) + str(err)))
D:\tools\anaconda\envs\paddlespeech\lib\site-packages\paddle\fluid\dygraph\layers.py:1436: UserWarning: Skip loading for gst.ref_enc.convs.10._mean. gst.ref_enc.convs.10._mean is not found in the provided dict.
  warnings.warn(("Skip loading for {}. ".format(key) + str(err)))
D:\tools\anaconda\envs\paddlespeech\lib\site-packages\paddle\fluid\dygraph\layers.py:1436: UserWarning: Skip loading for gst.ref_enc.convs.10._variance. gst.ref_enc.convs.10._variance is not found in the provided dict.
  warnings.warn(("Skip loading for {}. ".format(key) + str(err)))
D:\tools\anaconda\envs\paddlespeech\lib\site-packages\paddle\fluid\dygraph\layers.py:1436: UserWarning: Skip loading for gst.ref_enc.convs.12.weight. gst.ref_enc.convs.12.weight is not found in the provided dict.
  warnings.warn(("Skip loading for {}. ".format(key) + str(err)))
D:\tools\anaconda\envs\paddlespeech\lib\site-packages\paddle\fluid\dygraph\layers.py:1436: UserWarning: Skip loading for gst.ref_enc.convs.13.weight. gst.ref_enc.convs.13.weight is not found in the provided dict.
  warnings.warn(("Skip loading for {}. ".format(key) + str(err)))
D:\tools\anaconda\envs\paddlespeech\lib\site-packages\paddle\fluid\dygraph\layers.py:1436: UserWarning: Skip loading for gst.ref_enc.convs.13.bias. gst.ref_enc.convs.13.bias is not found in the provided dict.
  warnings.warn(("Skip loading for {}. ".format(key) + str(err)))
D:\tools\anaconda\envs\paddlespeech\lib\site-packages\paddle\fluid\dygraph\layers.py:1436: UserWarning: Skip loading for gst.ref_enc.convs.13._mean. gst.ref_enc.convs.13._mean is not found in the provided dict.
  warnings.warn(("Skip loading for {}. ".format(key) + str(err)))
D:\tools\anaconda\envs\paddlespeech\lib\site-packages\paddle\fluid\dygraph\layers.py:1436: UserWarning: Skip loading for gst.ref_enc.convs.13._variance. gst.ref_enc.convs.13._variance is not found in the provided dict.
  warnings.warn(("Skip loading for {}. ".format(key) + str(err)))
D:\tools\anaconda\envs\paddlespeech\lib\site-packages\paddle\fluid\dygraph\layers.py:1436: UserWarning: Skip loading for gst.ref_enc.convs.15.weight. gst.ref_enc.convs.15.weight is not found in the provided dict.
  warnings.warn(("Skip loading for {}. ".format(key) + str(err)))
D:\tools\anaconda\envs\paddlespeech\lib\site-packages\paddle\fluid\dygraph\layers.py:1436: UserWarning: Skip loading for gst.ref_enc.convs.16.weight. gst.ref_enc.convs.16.weight is not found in the provided dict.
  warnings.warn(("Skip loading for {}. ".format(key) + str(err)))
D:\tools\anaconda\envs\paddlespeech\lib\site-packages\paddle\fluid\dygraph\layers.py:1436: UserWarning: Skip loading for gst.ref_enc.convs.16.bias. gst.ref_enc.convs.16.bias is not found in the provided dict.
  warnings.warn(("Skip loading for {}. ".format(key) + str(err)))
D:\tools\anaconda\envs\paddlespeech\lib\site-packages\paddle\fluid\dygraph\layers.py:1436: UserWarning: Skip loading for gst.ref_enc.convs.16._mean. gst.ref_enc.convs.16._mean is not found in the provided dict.
  warnings.warn(("Skip loading for {}. ".format(key) + str(err)))
D:\tools\anaconda\envs\paddlespeech\lib\site-packages\paddle\fluid\dygraph\layers.py:1436: UserWarning: Skip loading for gst.ref_enc.convs.16._variance. gst.ref_enc.convs.16._variance is not found in the provided dict.
  warnings.warn(("Skip loading for {}. ".format(key) + str(err)))
D:\tools\anaconda\envs\paddlespeech\lib\site-packages\paddle\fluid\dygraph\layers.py:1436: UserWarning: Skip loading for gst.ref_enc.gru.weight_ih_l0. gst.ref_enc.gru.weight_ih_l0 is not found in the provided dict.
  warnings.warn(("Skip loading for {}. ".format(key) + str(err)))
D:\tools\anaconda\envs\paddlespeech\lib\site-packages\paddle\fluid\dygraph\layers.py:1436: UserWarning: Skip loading for gst.ref_enc.gru.weight_hh_l0. gst.ref_enc.gru.weight_hh_l0 is not found in the provided dict.
  warnings.warn(("Skip loading for {}. ".format(key) + str(err)))
D:\tools\anaconda\envs\paddlespeech\lib\site-packages\paddle\fluid\dygraph\layers.py:1436: UserWarning: Skip loading for gst.ref_enc.gru.bias_ih_l0. gst.ref_enc.gru.bias_ih_l0 is not found in the provided dict.
  warnings.warn(("Skip loading for {}. ".format(key) + str(err)))
D:\tools\anaconda\envs\paddlespeech\lib\site-packages\paddle\fluid\dygraph\layers.py:1436: UserWarning: Skip loading for gst.ref_enc.gru.bias_hh_l0. gst.ref_enc.gru.bias_hh_l0 is not found in the provided dict.
  warnings.warn(("Skip loading for {}. ".format(key) + str(err)))
D:\tools\anaconda\envs\paddlespeech\lib\site-packages\paddle\fluid\dygraph\layers.py:1436: UserWarning: Skip loading for gst.ref_enc.gru.0.cell.weight_ih. gst.ref_enc.gru.0.cell.weight_ih is not found in the provided dict.
  warnings.warn(("Skip loading for {}. ".format(key) + str(err)))
D:\tools\anaconda\envs\paddlespeech\lib\site-packages\paddle\fluid\dygraph\layers.py:1436: UserWarning: Skip loading for gst.ref_enc.gru.0.cell.weight_hh. gst.ref_enc.gru.0.cell.weight_hh is not found in the provided dict.
  warnings.warn(("Skip loading for {}. ".format(key) + str(err)))
D:\tools\anaconda\envs\paddlespeech\lib\site-packages\paddle\fluid\dygraph\layers.py:1436: UserWarning: Skip loading for gst.ref_enc.gru.0.cell.bias_ih. gst.ref_enc.gru.0.cell.bias_ih is not found in the provided dict.
  warnings.warn(("Skip loading for {}. ".format(key) + str(err)))
D:\tools\anaconda\envs\paddlespeech\lib\site-packages\paddle\fluid\dygraph\layers.py:1436: UserWarning: Skip loading for gst.ref_enc.gru.0.cell.bias_hh. gst.ref_enc.gru.0.cell.bias_hh is not found in the provided dict.
  warnings.warn(("Skip loading for {}. ".format(key) + str(err)))
D:\tools\anaconda\envs\paddlespeech\lib\site-packages\paddle\fluid\dygraph\layers.py:1436: UserWarning: Skip loading for gst.stl.gst_embs. gst.stl.gst_embs is not found in the provided dict.
  warnings.warn(("Skip loading for {}. ".format(key) + str(err)))
D:\tools\anaconda\envs\paddlespeech\lib\site-packages\paddle\fluid\dygraph\layers.py:1436: UserWarning: Skip loading for gst.stl.mha.linear_q.weight. gst.stl.mha.linear_q.weight is not found in the provided dict.
  warnings.warn(("Skip loading for {}. ".format(key) + str(err)))
D:\tools\anaconda\envs\paddlespeech\lib\site-packages\paddle\fluid\dygraph\layers.py:1436: UserWarning: Skip loading for gst.stl.mha.linear_q.bias. gst.stl.mha.linear_q.bias is not found in the provided dict.
  warnings.warn(("Skip loading for {}. ".format(key) + str(err)))
D:\tools\anaconda\envs\paddlespeech\lib\site-packages\paddle\fluid\dygraph\layers.py:1436: UserWarning: Skip loading for gst.stl.mha.linear_k.weight. gst.stl.mha.linear_k.weight is not found in the provided dict.
  warnings.warn(("Skip loading for {}. ".format(key) + str(err)))
D:\tools\anaconda\envs\paddlespeech\lib\site-packages\paddle\fluid\dygraph\layers.py:1436: UserWarning: Skip loading for gst.stl.mha.linear_k.bias. gst.stl.mha.linear_k.bias is not found in the provided dict.
  warnings.warn(("Skip loading for {}. ".format(key) + str(err)))
D:\tools\anaconda\envs\paddlespeech\lib\site-packages\paddle\fluid\dygraph\layers.py:1436: UserWarning: Skip loading for gst.stl.mha.linear_v.weight. gst.stl.mha.linear_v.weight is not found in the provided dict.
  warnings.warn(("Skip loading for {}. ".format(key) + str(err)))
D:\tools\anaconda\envs\paddlespeech\lib\site-packages\paddle\fluid\dygraph\layers.py:1436: UserWarning: Skip loading for gst.stl.mha.linear_v.bias. gst.stl.mha.linear_v.bias is not found in the provided dict.
  warnings.warn(("Skip loading for {}. ".format(key) + str(err)))
D:\tools\anaconda\envs\paddlespeech\lib\site-packages\paddle\fluid\dygraph\layers.py:1436: UserWarning: Skip loading for gst.stl.mha.linear_out.weight. gst.stl.mha.linear_out.weight is not found in the provided dict.
  warnings.warn(("Skip loading for {}. ".format(key) + str(err)))
D:\tools\anaconda\envs\paddlespeech\lib\site-packages\paddle\fluid\dygraph\layers.py:1436: UserWarning: Skip loading for gst.stl.mha.linear_out.bias. gst.stl.mha.linear_out.bias is not found in the provided dict.
  warnings.warn(("Skip loading for {}. ".format(key) + str(err)))
fastspeech2 model done!
frontend done!
vocoder model done!
gst, True
vae, False
vocoder model done!
D:\tools\anaconda\envs\paddlespeech\lib\site-packages\librosa\core\audio.py:165: UserWarning: PySoundFile failed. Trying audioread instead.
  warnings.warn("PySoundFile failed. Trying audioread instead.")
Building prefix dict from the default dictionary ...
[2022-01-20 16:38:29] [DEBUG] [__init__.py:113] Building prefix dict from the default dictionary ...
Loading model from cache C:\Users\wknd1\AppData\Local\Temp\jieba.cache
[2022-01-20 16:38:29] [DEBUG] [__init__.py:132] Loading model from cache C:\Users\wknd1\AppData\Local\Temp\jieba.cache
Loading model cost 0.707 seconds.
[2022-01-20 16:38:29] [DEBUG] [__init__.py:164] Loading model cost 0.707 seconds.
Prefix dict has been built successfully.
[2022-01-20 16:38:29] [DEBUG] [__init__.py:166] Prefix dict has been built successfully.
self.spk_id 0
D:\tools\anaconda\envs\paddlespeech\lib\site-packages\paddle\fluid\dygraph\math_op_patch.py:251: UserWarning: The dtype of left and right variables are not the same, left dtype is paddle.int64, but right dtype is paddle.int32, the right dtype will convert to paddle.int64
  warnings.warn(
Traceback (most recent call last):
  File "gui/main2.py", line 476, in onGenerateButtonClicked
    mel = fastspeech2_inference(
  File "D:\tools\anaconda\envs\paddlespeech\lib\site-packages\paddle\fluid\dygraph\layers.py", line 914, in __call__
    outputs = self.forward(*inputs, **kwargs)
  File "D:\code\asvocal\train/models\fastspeech2\fastspeech2.py", line 1031, in forward
    normalized_mel, d_outs, p_outs, e_outs, mu, logvar, z = self.acoustic_model.inference(
  File "D:\code\asvocal\train/models\fastspeech2\fastspeech2.py", line 837, in inference
    _, outs, d_outs, p_outs, e_outs, mu, logvar, z = self._forward(
  File "D:\code\asvocal\train/models\fastspeech2\fastspeech2.py", line 659, in _forward
    style_embs = self.gst(ys)
  File "D:\tools\anaconda\envs\paddlespeech\lib\site-packages\paddle\fluid\dygraph\layers.py", line 914, in __call__
    outputs = self.forward(*inputs, **kwargs)
  File "D:\code\asvocal\train/modules\style_encoder.py", line 147, in forward
    ref_embs = self.ref_enc(speech)
  File "D:\tools\anaconda\envs\paddlespeech\lib\site-packages\paddle\fluid\dygraph\layers.py", line 914, in __call__
    outputs = self.forward(*inputs, **kwargs)
  File "D:\code\asvocal\train/modules\style_encoder.py", line 246, in forward
    batch_size = speech.shape[0]
AttributeError: 'NoneType' object has no attribute 'shape'

I'm not sure whether I completed step 4 correctly, so could you explain how the parameters in synthesize_e2e.sh should be set? Thanks~

jerryuhoo commented 2 years ago

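Your error output shows that the model you are using is gst fastspeech2. This model requires a reference audio input, but the traceback shows that the reference audio is NoneType. Did the audio load correctly, and is it a wav file (the format needs to be wav)? Judging from the error, the failure happened at the audio loading step. It is also possible that the fastspeech2_config file (a yaml file) was not read correctly, so check whether that file exists in your model path. Also, does the speedyspeech model you trained work? Finally, main2.py probably does not support single-speaker synthesis, because single-speaker inference does not take spk_id as input.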
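A quick way to rule out the reference audio is to validate the file before it reaches the model. This is only a minimal sketch, not code from main2.py: `check_reference_wav` is a hypothetical helper, and it uses the standard-library `wave` module, which only reads PCM wav files (the GUI itself loads audio through librosa, according to the log).

```python
import wave
from pathlib import Path


def check_reference_wav(path):
    """Return (sample_rate, n_frames) for a readable PCM wav file.

    Raises ValueError with a descriptive message otherwise, so the problem
    shows up here instead of as a NoneType error inside the model.
    """
    p = Path(path)
    if not p.is_file():
        raise ValueError(f"reference audio not found: {p}")
    if p.suffix.lower() != ".wav":
        raise ValueError(f"reference audio must be a .wav file: {p}")
    try:
        with wave.open(str(p), "rb") as wf:
            sample_rate = wf.getframerate()
            n_frames = wf.getnframes()
    except (wave.Error, EOFError) as err:
        # The file has a .wav suffix but is not a PCM wav the module can parse.
        raise ValueError(f"not a valid PCM wav file: {p}") from err
    if n_frames == 0:
        raise ValueError(f"reference audio is empty: {p}")
    return sample_rate, n_frames
```

Calling this on the selected file before clicking generate should tell you whether the path, the extension, or the file contents are the problem.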

jerryuhoo commented 2 years ago

For a single speaker, try removing spk_id from these two lines, or changing spk_id to None: https://github.com/jerryuhoo/VTuberTalk/blob/4c954b033ba4811855d01074daa6a362c479f96d/gui/main2.py#L489 https://github.com/jerryuhoo/VTuberTalk/blob/4c954b033ba4811855d01074daa6a362c479f96d/gui/main2.py#L496
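To illustrate the idea, here is a rough sketch of passing spk_id only when one is set. The helper `inference_kwargs` and the keyword names `text`, `spk_id` and `speech` are assumptions for illustration; check the two linked lines for the actual call in main2.py.

```python
def inference_kwargs(text_ids, spk_id=None, ref_speech=None):
    """Collect keyword arguments for an inference call.

    All argument names here are hypothetical. spk_id is left out entirely
    when it is None, which is the single-speaker case.
    """
    kwargs = {"text": text_ids}
    if spk_id is not None:
        # Multi-speaker model: pass the ID taken from the speaker dict.
        kwargs["spk_id"] = spk_id
    if ref_speech is not None:
        # GST model: pass the reference audio features.
        kwargs["speech"] = ref_speech
    return kwargs
```

A model trained with the multi-speaker code still needs a spk_id even if the dataset has only one speaker, so in that case you would keep passing the ID from your speaker dict.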

ainokiseki commented 2 years ago

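The problem was caused by my loading an incorrect reference audio file. As you suggested, after selecting a correct reference audio I could hear the synthesized speech. As for single versus multi-speaker: although I only have one person's data, I used the multi-speaker code throughout training, because the single-speaker command does not pass --speaker-dict and raises an error, so I simply used the multi-speaker one. At synthesis time I changed the spk_id for 阿梓 in main2.py to the corresponding value in my speaker_dict (0), and synthesis then worked. The quality is not great; I suspect this is because I have too little data and trained for too short a time. One last thing to confirm with you: main2.py loads several .npy files such as exp/<model name>/speech_stats.npy, but I could not find these files at those locations. I only found similar files under the dump/train folder. After training, should I manually copy the npy files from dump/train into the corresponding exp/<model name>/ folder? That is what I did.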
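For reference, the manual copy I did could be scripted roughly like this. It is only a sketch: `copy_stats` is a hypothetical helper, and it assumes every `*.npy` file directly under dump/train should go into the model folder.

```python
import shutil
from pathlib import Path


def copy_stats(dump_train_dir, exp_model_dir):
    """Copy every .npy file from dump_train_dir into exp_model_dir.

    Returns the sorted list of copied file names. Existing files with the
    same name in the destination are overwritten.
    """
    src = Path(dump_train_dir)
    dst = Path(exp_model_dir)
    dst.mkdir(parents=True, exist_ok=True)
    copied = []
    for npy in sorted(src.glob("*.npy")):
        shutil.copy2(npy, dst / npy.name)
        copied.append(npy.name)
    return copied
```

It would be called as `copy_stats("dump/train", "exp/<model name>")`, with the placeholder replaced by the actual model folder.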

jerryuhoo commented 2 years ago

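Yes, that's right. For now they need to be copied manually; I will add this to the script files later. Currently, to get good results, a single speaker should ideally have around 10,000 utterances. You can also use a multi-speaker model instead: download the open-source aishell3 dataset from OpenSLR, then add the single-speaker dataset to it and train them together. This reduces the amount of data you need, and about 500 utterances is enough. However, in my experiments so far the gst model does not perform well, which means that adding reference audio actually gives worse results than plain fastspeech2. speedyspeech currently imitates the tone of voice best, but its audio quality is poor, and I am still looking for a fix. The vocoder also plays a large role. You can train a vocoder yourself or use pwg. The hifigan I provide is Baidu's pretrained model, but it was trained on only one speaker, so it performs worse than pwg.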