Quantum-Electrodynamics opened this issue 2 years ago (status: Open)
After training finishes, copy all the .npy files under exp into the model path.
Which files should be copied, and to where? I'm using the following command to synthesize; speedyspeech seems to need only one npy file, feats_stats.npy.
python train/exps/synthesize_e2e.py \
--am=speedyspeech_csmsc \
--am_config=train/conf/speedyspeech/default.yaml \
--am_ckpt=exp/fastspeech2_aishell3_english/checkpoints/snapshot_iter_2200.pdz \
--am_stat=dump/train/feats_stats.npy \
--voc=pwgan_csmsc \
--voc_config=pretrained_models/pwg_baker_ckpt_0.4/pwg_default.yaml \
--voc_ckpt=pretrained_models/pwg_baker_ckpt_0.4/pwg_snapshot_iter_400000.pdz \
--voc_stat=pretrained_models/pwg_baker_ckpt_0.4/pwg_stats.npy \
--lang=zh \
--text=sentences.txt \
--output_dir=train/test_e2e \
--inference_dir=train/inference \
--phones_dict=dump/phone_id_map.txt \
--tones_dict=dump/tone_id_map.txt
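If the instruction means putting the stats .npy files next to the checkpoint, my guess at the intended step is something like the sketch below; every path is a placeholder for illustration, not the project's real layout:

```shell
# Hedged sketch of "copy the .npy files from exp into the model path":
# all paths below are placeholders, not the real directory layout.
set -e
root=$(mktemp -d)
mkdir -p "$root/exp/demo/checkpoints" "$root/dump/train"
touch "$root/dump/train/feats_stats.npy"           # stats produced by preprocessing
cp "$root"/dump/train/*.npy "$root/exp/demo/checkpoints/"
ls "$root/exp/demo/checkpoints"
```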
Below is its output. There doesn't seem to be any error, but the synthesized speech is pure noise. Is my sample count too small? I only fed in a single video of nearly 2 hours, which produced 800 samples.
/usr/local/lib/python3.7/dist-packages/librosa/core/constantq.py:1059: DeprecationWarning: `np.complex` is a deprecated alias for the builtin `complex`. To silence this warning, use `complex` by itself. Doing this will not modify any behavior and is safe. If you specifically wanted the numpy scalar type, use `np.complex128` here.
Deprecated in NumPy 1.20; for more details and guidance: https://numpy.org/devdocs/release/1.20.0-notes.html#deprecations
dtype=np.complex,
/usr/local/lib/python3.7/dist-packages/IPython/utils/module_paths.py:29: DeprecationWarning: the imp module is deprecated in favour of importlib; see the module's documentation for alternative uses
import imp
========Args========
am: speedyspeech_csmsc
am_ckpt: exp/speedyspeech_bili3_aishell3/checkpoints/snapshot_iter_2200.pdz
am_config: train/conf/speedyspeech/default.yaml
am_stat: dump_speedyspeech/train/feats_stats.npy
energy_stat: null
inference_dir: train/inference
lang: zh
ngpu: 0
output_dir: train/test_e2e
phones_dict: dump_speedyspeech/phone_id_map.txt
pitch_stat: null
speaker_dict: null
spk_id: 0
text: sentences.txt
tones_dict: dump_speedyspeech/tone_id_map.txt
use_gst: false
use_style: false
use_vae: false
voc: pwgan_csmsc
voc_ckpt: pretrained_models/pwg_baker_ckpt_0.4/pwg_snapshot_iter_400000.pdz
voc_config: pretrained_models/pwg_baker_ckpt_0.4/pwg_default.yaml
voc_stat: pretrained_models/pwg_baker_ckpt_0.4/pwg_stats.npy
========Config========
batch_size: 32
fmax: 7600
fmin: 80
fs: 24000
max_epoch: 100
model:
decoder_dilations: [1, 3, 9, 27, 1, 3, 9, 27, 1, 3, 9, 27, 1, 3, 9, 27, 1, 1]
decoder_hidden_size: 128
decoder_kernel_size: 3
decoder_output_size: 80
duration_predictor_hidden_size: 128
encoder_dilations: [1, 3, 9, 27, 1, 3, 9, 27, 1, 1]
encoder_hidden_size: 128
encoder_kernel_size: 3
n_fft: 2048
n_mels: 80
n_shift: 300
num_snapshots: 5
num_workers: 4
optimizer:
learning_rate: 0.002
max_grad_norm: 1
optim: adam
seed: 10086
win_length: 1200
window: hann
allow_cache: True
batch_max_steps: 25500
batch_size: 6
discriminator_grad_norm: 1
discriminator_optimizer_params:
epsilon: 1e-06
weight_decay: 0.0
discriminator_params:
bias: True
conv_channels: 64
in_channels: 1
kernel_size: 3
layers: 10
nonlinear_activation: LeakyReLU
nonlinear_activation_params:
negative_slope: 0.2
out_channels: 1
use_weight_norm: True
discriminator_scheduler_params:
gamma: 0.5
learning_rate: 5e-05
step_size: 200000
discriminator_train_start_steps: 100000
eval_interval_steps: 1000
fmax: 7600
fmin: 80
fs: 24000
generator_grad_norm: 10
generator_optimizer_params:
epsilon: 1e-06
weight_decay: 0.0
generator_params:
aux_channels: 80
aux_context_window: 2
bias: True
dropout: 0.0
freq_axis_kernel_size: 1
gate_channels: 128
in_channels: 1
interpolate_mode: nearest
kernel_size: 3
layers: 30
nonlinear_activation: None
nonlinear_activation_params:
out_channels: 1
residual_channels: 64
skip_channels: 64
stacks: 3
upsample_scales: [4, 5, 3, 5]
use_causal_conv: False
use_weight_norm: True
generator_scheduler_params:
gamma: 0.5
learning_rate: 0.0001
step_size: 200000
lambda_adv: 4.0
n_fft: 2048
n_mels: 80
n_shift: 300
num_save_intermediate_results: 4
num_snapshots: 10
num_workers: 4
pin_memory: True
remove_short_samples: True
save_interval_steps: 5000
seed: 42
stft_loss_params:
fft_sizes: [1024, 2048, 512]
hop_sizes: [120, 240, 50]
win_lengths: [600, 1200, 240]
window: hann
top_db: 60
train_max_steps: 400000
trim_frame_length: 2048
trim_hop_length: 512
trim_silence: False
win_length: 1200
window: hann
dump_speedyspeech/phone_id_map.txt
frontend done!
vocab_size: 69
tone_size: 6
/usr/local/lib/python3.7/dist-packages/paddle/fluid/dygraph/layers.py:1492: UserWarning: Skip loading for position_enc.alpha. position_enc.alpha is not found in the provided dict.
warnings.warn(("Skip loading for {}. ".format(key) + str(err)))
/usr/local/lib/python3.7/dist-packages/paddle/fluid/dygraph/layers.py:1492: UserWarning: Skip loading for encoder.linear.weight. encoder.linear.weight is not found in the provided dict.
warnings.warn(("Skip loading for {}. ".format(key) + str(err)))
/usr/local/lib/python3.7/dist-packages/paddle/fluid/dygraph/layers.py:1492: UserWarning: Skip loading for encoder.linear.bias. encoder.linear.bias is not found in the provided dict.
warnings.warn(("Skip loading for {}. ".format(key) + str(err)))
/usr/local/lib/python3.7/dist-packages/paddle/fluid/dygraph/layers.py:1492: UserWarning: Skip loading for duration_predictor.linear.weight. duration_predictor.linear.weight is not found in the provided dict.
warnings.warn(("Skip loading for {}. ".format(key) + str(err)))
/usr/local/lib/python3.7/dist-packages/paddle/fluid/dygraph/layers.py:1492: UserWarning: Skip loading for duration_predictor.linear.bias. duration_predictor.linear.bias is not found in the provided dict.
warnings.warn(("Skip loading for {}. ".format(key) + str(err)))
/usr/local/lib/python3.7/dist-packages/paddle/fluid/dygraph/layers.py:1492: UserWarning: Skip loading for decoder.postnet2.blocks.0.0.weight. decoder.postnet2.blocks.0.0.weight is not found in the provided dict.
warnings.warn(("Skip loading for {}. ".format(key) + str(err)))
/usr/local/lib/python3.7/dist-packages/paddle/fluid/dygraph/layers.py:1492: UserWarning: Skip loading for decoder.postnet2.blocks.0.0.bias. decoder.postnet2.blocks.0.0.bias is not found in the provided dict.
warnings.warn(("Skip loading for {}. ".format(key) + str(err)))
/usr/local/lib/python3.7/dist-packages/paddle/fluid/dygraph/layers.py:1492: UserWarning: Skip loading for decoder.postnet2.blocks.0.2.weight. decoder.postnet2.blocks.0.2.weight is not found in the provided dict.
warnings.warn(("Skip loading for {}. ".format(key) + str(err)))
/usr/local/lib/python3.7/dist-packages/paddle/fluid/dygraph/layers.py:1492: UserWarning: Skip loading for decoder.postnet2.blocks.0.2.bias. decoder.postnet2.blocks.0.2.bias is not found in the provided dict.
warnings.warn(("Skip loading for {}. ".format(key) + str(err)))
/usr/local/lib/python3.7/dist-packages/paddle/fluid/dygraph/layers.py:1492: UserWarning: Skip loading for decoder.postnet2.blocks.0.2._mean. decoder.postnet2.blocks.0.2._mean is not found in the provided dict.
warnings.warn(("Skip loading for {}. ".format(key) + str(err)))
/usr/local/lib/python3.7/dist-packages/paddle/fluid/dygraph/layers.py:1492: UserWarning: Skip loading for decoder.postnet2.blocks.0.2._variance. decoder.postnet2.blocks.0.2._variance is not found in the provided dict.
warnings.warn(("Skip loading for {}. ".format(key) + str(err)))
/usr/local/lib/python3.7/dist-packages/paddle/fluid/dygraph/layers.py:1492: UserWarning: Skip loading for decoder.postnet2.blocks.1.0.weight. decoder.postnet2.blocks.1.0.weight is not found in the provided dict.
warnings.warn(("Skip loading for {}. ".format(key) + str(err)))
/usr/local/lib/python3.7/dist-packages/paddle/fluid/dygraph/layers.py:1492: UserWarning: Skip loading for decoder.postnet2.blocks.1.0.bias. decoder.postnet2.blocks.1.0.bias is not found in the provided dict.
warnings.warn(("Skip loading for {}. ".format(key) + str(err)))
/usr/local/lib/python3.7/dist-packages/paddle/fluid/dygraph/layers.py:1492: UserWarning: Skip loading for decoder.postnet2.blocks.1.2.weight. decoder.postnet2.blocks.1.2.weight is not found in the provided dict.
warnings.warn(("Skip loading for {}. ".format(key) + str(err)))
/usr/local/lib/python3.7/dist-packages/paddle/fluid/dygraph/layers.py:1492: UserWarning: Skip loading for decoder.postnet2.blocks.1.2.bias. decoder.postnet2.blocks.1.2.bias is not found in the provided dict.
warnings.warn(("Skip loading for {}. ".format(key) + str(err)))
/usr/local/lib/python3.7/dist-packages/paddle/fluid/dygraph/layers.py:1492: UserWarning: Skip loading for decoder.postnet2.blocks.1.2._mean. decoder.postnet2.blocks.1.2._mean is not found in the provided dict.
warnings.warn(("Skip loading for {}. ".format(key) + str(err)))
/usr/local/lib/python3.7/dist-packages/paddle/fluid/dygraph/layers.py:1492: UserWarning: Skip loading for decoder.postnet2.blocks.1.2._variance. decoder.postnet2.blocks.1.2._variance is not found in the provided dict.
warnings.warn(("Skip loading for {}. ".format(key) + str(err)))
/usr/local/lib/python3.7/dist-packages/paddle/fluid/dygraph/layers.py:1492: UserWarning: Skip loading for decoder.linear.weight. decoder.linear.weight is not found in the provided dict.
warnings.warn(("Skip loading for {}. ".format(key) + str(err)))
/usr/local/lib/python3.7/dist-packages/paddle/fluid/dygraph/layers.py:1492: UserWarning: Skip loading for decoder.linear.bias. decoder.linear.bias is not found in the provided dict.
warnings.warn(("Skip loading for {}. ".format(key) + str(err)))
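My reading of the "Skip loading" warnings above (an assumption, not confirmed): the checkpoint's parameter names don't match the SpeedySpeech model's, so every listed parameter keeps its random initialization, which would be enough to turn the output into noise. A plain-Python sketch of the key-matching logic, using a few names from the warnings and a hypothetical mismatched checkpoint:

```python
# Illustrative sketch (no Paddle required): "Skip loading for X" means the
# checkpoint dict has no entry named X, so that parameter stays randomly
# initialized instead of being loaded.
model_params = {
    "encoder.linear.weight",
    "decoder.linear.weight",
    "position_enc.alpha",            # names taken from the warnings above
}
checkpoint = {"encoder.blocks.0.weight": None}   # hypothetical mismatched ckpt
missing = sorted(model_params - checkpoint.keys())
print(missing)   # every name printed here is left as random init
```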
acoustic model done!
voc done!
Building prefix dict from the default dictionary ...
[2022-07-27 01:53:22] [DEBUG] [__init__.py:113] Building prefix dict from the default dictionary ...
Loading model from cache /tmp/jieba.cache
[2022-07-27 01:53:22] [DEBUG] [__init__.py:133] Loading model from cache /tmp/jieba.cache
Loading model cost 1.196 seconds.
[2022-07-27 01:53:23] [DEBUG] [__init__.py:165] Loading model cost 1.196 seconds.
Prefix dict has been built successfully.
[2022-07-27 01:53:23] [DEBUG] [__init__.py:166] Prefix dict has been built successfully.
001, mel: [21, 80], wave: (6300, 1), time: 2.3417811670001356s, Hz: 2690.2599135983382, RTF: 8.921071112381469.
001 done!
002, mel: [22, 80], wave: (6600, 1), time: 0.7707957879997593s, Hz: 8562.57922364524, RTF: 2.802893774544579.
002 done!
generation speed: 4144.4758431684895Hz, RTF: 5.790840846511433
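For reference, the Hz and RTF figures logged for utterance 001 can be reproduced from the wave shape and elapsed time in the log, together with fs: 24000 from the printed config:

```python
# Reproduce the Hz / RTF numbers logged for utterance 001.
fs = 24000                        # sample rate from the printed config
n_samples = 6300                  # from "wave: (6300, 1)"
elapsed = 2.3417811670001356      # wall-clock synthesis time in seconds

audio_seconds = n_samples / fs    # 0.2625 s of generated audio
hz = n_samples / elapsed          # samples produced per wall-clock second
rtf = elapsed / audio_seconds     # real-time factor; > 1 means slower than real time

print(round(hz, 1), round(rtf, 2))   # 2690.3 8.92
```

An RTF near 9 on this run just reflects CPU inference (ngpu: 0 in the args); it is unrelated to the audio being noise.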
Separately, I found the reason fastspeech could not start training: it was most likely caused by my not deleting the temporary files left over from training speedyspeech. After the prompt was raised during the "train" step, it jumped straight into the "copy files to model" step, meaning no training actually took place. Also, speedyspeech can finish training, but a similar prompt appears during synthesize_e2e as well; although no error is reported, the synthesized audio is unusable.