FarisHijazi closed this issue 3 years ago.
Hi FarisHijazi,
Thanks for your interest in this work! If I understand your question correctly, this work does not actually need the alignments during training or inference. Please let me know if you were referring to anything else.
I found that the VCTK dataset does not match the procedure in synthesizer_preprocess_audio.py. The processing in synthesizer_preprocess_audio.py looks like it was written for the LibriTTS dataset.
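For context, the standard VCTK-Corpus layout I have looks quite different from LibriTTS. A minimal sketch of walking it (assuming the usual wav48/ and txt/ folders; paths are illustrative):

```python
from pathlib import Path

# Assumed standard VCTK-Corpus layout:
#   VCTK-Corpus/wav48/p225/p225_001.wav
#   VCTK-Corpus/txt/p225/p225_001.txt
dataset_root = Path("VCTK-Corpus")
for wav_path in sorted((dataset_root / "wav48").glob("p*/*.wav")):
    speaker = wav_path.parent.name
    txt_path = dataset_root / "txt" / speaker / (wav_path.stem + ".txt")
    text = txt_path.read_text().strip() if txt_path.exists() else None
    # feed (wav_path, text) into the preprocessing pipeline here
```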
Oh, I see. I pushed the wrong version to the repo; this has been fixed in https://github.com/shaojinding/Adversarial-Many-to-Many-VC/commit/37da0fc7b9ce585cd578bcde0cc61567a104a67d.
Let me know if it works. Thanks!
Thanks for the reply. There is still a problem: in synthesizer/preprocess.py, line 159, the call to _process_utterance(wav, out_dir, wav_cat_fname, skip_existing, hparams) is missing the text: str parameter.
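For illustration, something like this wrapper (meant to live in synthesizer/preprocess.py) would complete the call; the argument order and helper name here are my assumption, not the repo's confirmed API, so check the actual definition of _process_utterance first:

```python
from pathlib import Path

def process_vctk_utterance(wav, wav_cat_fname: str, txt_dir: Path,
                           out_dir: Path, skip_existing: bool, hparams):
    """Hypothetical wrapper: look up the VCTK transcript for this utterance
    and forward it so _process_utterance receives its text: str argument."""
    txt_path = txt_dir / (wav_cat_fname + ".txt")
    text = txt_path.read_text().strip()
    # Argument order below is assumed from the usual SV2TTS-style signature.
    return _process_utterance(wav, text, out_dir, wav_cat_fname,
                              skip_existing, hparams)
```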
I see that you don't need the alignment times, but the VCTK preprocessing code does look for alignments. I'll fix that with a try/except and submit a PR (FYI, there were many bugs in the LibriSpeech preprocessing). It seems you preferred VCTK; I've fixed most of them, so expect a PR from me soon. I'll close the issue once I verify that it works. A sketch of the guard I have in mind follows.
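Something like this minimal sketch (the alignment file format here is an assumption based on LibriSpeech-style alignments; VCTK ships none, so the fallback lets the caller keep the utterance unsplit):

```python
def load_alignment(alignment_path):
    """Try to read a per-utterance alignment file; return None when the
    dataset (e.g. VCTK) has no alignments, so the caller can keep the
    utterance as a single segment instead of crashing."""
    try:
        with open(alignment_path) as f:
            # Assumed LibriSpeech-style line: <id> "w1,w2,..." "t1,t2,..."
            _, words, end_times = f.readline().strip().split(" ")
            words = words.strip('"').split(",")
            end_times = [float(t) for t in end_times.strip('"').split(",")]
        return words, end_times
    except (FileNotFoundError, ValueError):
        return None
```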
Actually, I have already set None for the VCTK text alignments, and the data preprocessing completed, but when I run synthesizer_train.py the following ValueError appears:
Traceback (most recent call last):
File "synthesizer_train.py", line 56, in <module>
tacotron_train(args, log_dir, hparams)
File "adversarial-many-to-many-vc/synthesizer/train.py", line 408, in tacotron_train
return train(log_dir, args, hparams)
File "adversarial-many-to-many-vc/synthesizer/train.py", line 159, in train
model, stats = model_train_mode(args, feeder, hparams, global_step)
File "adversarial-many-to-many-vc/synthesizer/train.py", line 98, in model_train_mode
model.add_optimizer(global_step)
File "adversarial-many-to-many-vc/synthesizer/models/tacotron.py", line 529, in add_optimizer
expanded_g = tf.expand_dims(g, 0)
File "/anaconda3/envs/ppg-vc/lib/python3.6/site-packages/tensorflow/python/util/dispatch.py", line 180, in wrapper
return target(*args, **kwargs)
File "/anaconda3/envs/ppg-vc/lib/python3.6/site-packages/tensorflow/python/util/deprecation.py", line 507, in new_func
return func(*args, **kwargs)
File "/anaconda3/envs/ppg-vc/lib/python3.6/site-packages/tensorflow/python/ops/array_ops.py", line 148, in expand_dims
return expand_dims_v2(input, axis, name)
File "/anaconda3/envs/ppg-vc/lib/python3.6/site-packages/tensorflow/python/util/dispatch.py", line 180, in wrapper
return target(*args, **kwargs)
File "/anaconda3/envs/ppg-vc/lib/python3.6/site-packages/tensorflow/python/ops/array_ops.py", line 197, in expand_dims_v2
return gen_array_ops.expand_dims(input, axis, name)
File "/anaconda3/envs/ppg-vc/lib/python3.6/site-packages/tensorflow/python/ops/gen_array_ops.py", line 2459, in expand_dims
"ExpandDims", input=input, dim=axis, name=name)
File "/anaconda3/envs/ppg-vc/lib/python3.6/site-packages/tensorflow/python/framework/op_def_library.py", line 545, in _apply_op_helper
(input_name, err))
ValueError: Tried to convert 'input' to a tensor and failed. Error: None values not supported.
And here is my config:
-----------------------------------------------------------------
Starting new vc_adversarial training run
-----------------------------------------------------------------
[2020-12-04 09:48:35.874] Checkpoint path: synthesizer/saved_models/logs-vc_adversarial/taco_pretrained/tacotron_model.ckpt
[2020-12-04 09:48:35.874] Loading training data from: data/SV2TTS/synthesizer_train/train.txt
[2020-12-04 09:48:35.874] Using model: Tacotron
[2020-12-04 09:48:35.875] Hyperparameters:
allow_clipping_in_normalization: True
attention_dim: 128
attention_filters: 32
attention_kernel: (31,)
cbhg_conv_channels: 128
cbhg_highway_units: 128
cbhg_highwaynet_layers: 4
cbhg_kernels: 8
cbhg_pool_size: 2
cbhg_projection: 256
cbhg_projection_kernel_size: 3
cbhg_rnn_units: 128
cleaners: english_cleaners
clip_for_wavenet: True
clip_mels_length: True
cross_entropy_pos_weight: 20
cumulative_weights: True
decoder_layers: 2
decoder_lstm_units: 1024
embedding_dim: 512
enc_conv_channels: 512
enc_conv_kernel_size: (5,)
enc_conv_num_layers: 3
enc_prenet_layers: [128, 256]
encoder_lstm_units: 256
fmax: 7600
fmin: 55
frame_shift_ms: None
griffin_lim_iters: 60
hop_size: 200
if_use_speaker_classifier: False
is_encoder_lstm_2layers: False
is_encoder_lstm_pyramid: True
mask_decoder: False
mask_encoder: True
max_abs_value: 4.0
max_iters: 2000
max_mel_frames: 900
min_level_db: -100
n_fft: 800
n_speakers: 105
natural_eval: False
normalize_for_wavenet: True
num_mels: 80
num_ppgs: 40
outputs_per_step: 1
postnet_channels: 512
postnet_kernel_size: (5,)
postnet_num_layers: 5
power: 1.5
predict_linear: False
preemphasis: 0.97
preemphasize: True
prenet_layers: [256, 256]
ref_level_db: 20
rescale: False
rescaling_max: 0.9
sample_rate: 16000
signal_normalization: True
silence_min_duration_split: 0.4
silence_threshold: 2
smoothing: False
speaker_embedding_size: 256
split_on_cpu: True
stop_at_any: True
symmetric_mels: True
tacotron_adam_beta1: 0.9
tacotron_adam_beta2: 0.999
tacotron_adam_epsilon: 1e-06
tacotron_batch_size: 36
tacotron_clip_gradients: True
tacotron_data_random_state: 1234
tacotron_decay_learning_rate: True
tacotron_decay_rate: 0.5
tacotron_decay_steps: 50000
tacotron_dropout_rate: 0.5
tacotron_final_learning_rate: 1e-05
tacotron_gpu_start_idx: 3
tacotron_initial_learning_rate: 0.001
tacotron_num_gpus: 1
tacotron_random_seed: 5339
tacotron_reg_weight: 1e-07
tacotron_scale_regularization: False
tacotron_start_decay: 50000
tacotron_swap_with_cpu: False
tacotron_synthesis_batch_size: 128
tacotron_teacher_forcing_decay_alpha: 0.0
tacotron_teacher_forcing_decay_steps: 280000
tacotron_teacher_forcing_final_ratio: 0.0
tacotron_teacher_forcing_init_ratio: 1.0
tacotron_teacher_forcing_mode: constant
tacotron_teacher_forcing_ratio: 1.0
tacotron_teacher_forcing_start_decay: 10000
tacotron_test_batches: None
tacotron_test_size: 0.05
tacotron_zoneout_rate: 0.1
train_with_GTA: False
trim_fft_size: 512
trim_hop_size: 128
trim_top_db: 23
use_full_ppg: False
use_lws: False
utterance_min_duration: 1.6
win_size: 800
[2020-12-04 09:48:36.039] Loaded metadata for 32881 examples (25.81 hours)
[2020-12-04 09:48:46.382] initialisation done /gpu:3
[2020-12-04 09:48:46.382] Initialized Tacotron model. Dimensions (? = dynamic shape):
[2020-12-04 09:48:46.382] Train mode: True
[2020-12-04 09:48:46.382] Eval mode: False
[2020-12-04 09:48:46.382] GTA mode: False
[2020-12-04 09:48:46.382] Synthesis mode: False
[2020-12-04 09:48:46.382] Input: (?, ?, 40)
[2020-12-04 09:48:46.382] device: 3
[2020-12-04 09:48:46.382] embedding: (?, ?, 40)
[2020-12-04 09:48:46.382] enc conv out: (?, ?, 512)
[2020-12-04 09:48:46.382] adversial classifier out: ?
[2020-12-04 09:48:46.382] encoder out (cond): (?, ?, 768)
[2020-12-04 09:48:46.382] decoder out: (?, ?, 80)
[2020-12-04 09:48:46.382] residual out: (?, ?, 512)
[2020-12-04 09:48:46.382] projected residual out: (?, ?, 80)
[2020-12-04 09:48:46.382] mel out: (?, ?, 80)
[2020-12-04 09:48:46.382] <stop_token> out: (?, ?)
[2020-12-04 09:48:46.384] Tacotron Parameters 29.271 Million.
So I guess something must be wrong in the code, maybe in the data preprocessing; I'm still debugging it. Thanks for your help.
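Judging from the traceback, this ValueError usually means tf.gradients returned None for at least one trainable variable, i.e. some variables are not connected to the loss being differentiated (one plausible cause here: the adversarial classifier branch is built, but if_use_speaker_classifier is False in the config above, so its variables never touch the loss). A hedged sketch of a guard for add_optimizer, assuming TF1-style parallel lists of gradients and variables as in synthesizer/models/tacotron.py:

```python
import tensorflow as tf

def clip_non_none_gradients(gradients, variables, clip_norm=1.0):
    """Drop (grad, var) pairs whose gradient is None (variables not
    connected to the loss) before clipping, instead of letting
    tf.expand_dims(None, 0) raise "None values not supported"."""
    pairs = [(g, v) for g, v in zip(gradients, variables) if g is not None]
    grads, kept_vars = zip(*pairs)
    clipped, _ = tf.clip_by_global_norm(list(grads), clip_norm)
    return list(zip(clipped, kept_vars))
```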
Hello, I can't find the VCTK dataset alignments anywhere, and I did find this method from deepvoice3, but I'm not even sure it's compatible.
Could you please upload the VCTK alignment files, or share a way to generate them?
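For reference, if there is no official alignment file, a forced aligner such as the Montreal Forced Aligner might be able to generate them. A hedged sketch driving its CLI from Python (every path and model name below is a placeholder, and the TextGrid output would still need converting to whatever format this repo expects):

```python
import subprocess

# Hypothetical run of the Montreal Forced Aligner (MFA) CLI to produce
# word-level alignments for VCTK. Requires MFA installed and its English
# dictionary/acoustic model downloaded beforehand.
subprocess.run([
    "mfa", "align",
    "VCTK-Corpus/aligned_input",  # corpus dir: wavs + matching transcripts
    "english_us_arpa",            # pronunciation dictionary
    "english_us_arpa",            # pretrained acoustic model
    "vctk_alignments",            # output directory (TextGrid files)
], check=True)
```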