keithito / tacotron

A TensorFlow implementation of Google's Tacotron speech synthesis with pre-trained model (unofficial)
MIT License

(-4,4) norm for mel spec looks better than (0,1) norm for quick alignment #237

Open begeekmyfriend opened 5 years ago

begeekmyfriend commented 5 years ago

I have run some experiments on my own fork of this repo, and I found that when I expanded the range of mel spectrogram normalization in audio.py from (0, 1) to (-4, 4), the alignment converged more quickly. step-22000-align step-6000-align The modification is derived from Rayhane Mamah's Tacotron-2 project. It not only widens the maximum value range but also makes it symmetric around zero. An explanation is given at https://github.com/Rayhane-mamah/Tacotron-2/issues/18#issuecomment-382637788. I think the results can be verified. What do you think? What is more, in my view, the clipping in normalization is unnecessary because it might lose some edge information. When I remove the clipping, it still works well for both alignment and evaluation.
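For readers unfamiliar with the change being discussed, the symmetric normalization can be sketched as follows. This is an illustrative reimplementation based on Rayhane-Mamah's Tacotron-2 conventions, not the exact code in either fork; the constant names (`MAX_ABS_VALUE`, `MIN_LEVEL_DB`) are assumptions:

```python
import numpy as np

# Assumed hyperparameters, following Rayhane-Mamah's Tacotron-2 defaults.
MAX_ABS_VALUE = 4.0    # normalize into (-4, 4) instead of (0, 1)
MIN_LEVEL_DB = -100.0  # dB floor of the spectrogram

def normalize_symmetric(S_db, clip=True):
    """Map a dB-scale spectrogram into (-MAX_ABS_VALUE, MAX_ABS_VALUE)."""
    x = (2 * MAX_ABS_VALUE) * ((S_db - MIN_LEVEL_DB) / (-MIN_LEVEL_DB)) - MAX_ABS_VALUE
    if clip:
        # The optional clipping the comment above argues against.
        x = np.clip(x, -MAX_ABS_VALUE, MAX_ABS_VALUE)
    return x

def denormalize_symmetric(x):
    """Inverse of normalize_symmetric (assuming no clipping occurred)."""
    return ((x + MAX_ABS_VALUE) * -MIN_LEVEL_DB / (2 * MAX_ABS_VALUE)) + MIN_LEVEL_DB
```

With these constants, a -50 dB value maps to 0.0, so the normalized features are centered around zero rather than squeezed into (0, 1).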

sjtilney commented 5 years ago

How does the audio sound after 6000 steps?

begeekmyfriend commented 5 years ago

Here are all my evaluation samples, for Mandarin Chinese only: https://github.com/NVIDIA/tacotron2/issues/74#issuecomment-434996929

begeekmyfriend commented 5 years ago

Some advertisements in Mandarin Chinese. Do you think it sounds like WaveNet, even though it still has some noise? ad_48000.zip By the way, I have not merged my Mandarin branch modifications into the master branch of my fork, so you may not want to use my fork directly.

sjtilney commented 5 years ago

On the @begeekmyfriend fork, training fails while saving the checkpoint. This happens on the master branch, which used to work fine before the merge. This is the output I get during training:

```
Saving checkpoint to: ./logs-tacotron/model.ckpt-1010
Saving audio and alignment...
/home/ubuntu/TextToSpeech/tacotron-bgmf/tacotron/env/lib/python3.6/site-packages/librosa/util/utils.py:1725: FutureWarning: Conversion of the second argument of issubdtype from 'float' to 'np.floating' is deprecated. In future, it will be treated as 'np.float64 == np.dtype(float).type'.
  if np.issubdtype(x.dtype, float) or np.issubdtype(x.dtype, complex):
Exiting due to exception: firwin() got an unexpected keyword argument 'fs'
Traceback (most recent call last):
  File "train.py", line 119, in train
    audio.save_wav(waveform, os.path.join(log_dir, 'step-%d-audio.wav' % step))
  File "/home/ubuntu/TextToSpeech/tacotron-bgmf/tacotron/util/audio.py", line 24, in save_wav
    firwin = signal.firwin(hparams.num_freq, [hparams.fmin, hparams.fmax], pass_zero=False, fs=hparams.sample_rate)
TypeError: firwin() got an unexpected keyword argument 'fs'
2019-01-08 21:26:14.630853: W tensorflow/core/kernels/queue_base.cc:277] _0_datafeeder/input_queue: Skipping cancelled enqueue attempt with queue not closed
Traceback (most recent call last):
  File "/home/ubuntu/TextToSpeech/tacotron-bgmf/tacotron/env/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1292, in _do_call
    return fn(*args)
  File "/home/ubuntu/TextToSpeech/tacotron-bgmf/tacotron/env/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1277, in _run_fn
    options, feed_dict, fetch_list, target_list, run_metadata)
  File "/home/ubuntu/TextToSpeech/tacotron-bgmf/tacotron/env/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1367, in _call_tf_sessionrun
    run_metadata)
tensorflow.python.framework.errors_impl.CancelledError: Enqueue operation was cancelled
  [[{{node datafeeder/input_queue_enqueue}} = QueueEnqueueV2[Tcomponents=[DT_INT32, DT_INT32, DT_FLOAT, DT_FLOAT, DT_FLOAT], timeout_ms=-1, _device="/job:localhost/replica:0/task:0/device:CPU:0"](datafeeder/input_queue, _arg_datafeeder/inputs_0_1, _arg_datafeeder/input_lengths_0_0, _arg_datafeeder/mel_targets_0_3, _arg_datafeeder/linear_targets_0_2, _arg_datafeeder/stop_token_targets_0_4)]]
```
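For reference, the `TypeError` in the traceback above points at a SciPy version mismatch rather than a disk issue: the `fs` keyword argument of `scipy.signal.firwin` was only added in SciPy 1.2.0, while older versions expect the cutoff frequencies to be scaled by the Nyquist frequency via the `nyq` argument. A possible compatibility shim (the function name and parameter values here are illustrative, not the repo's actual code):

```python
from scipy import signal

def bandpass_firwin(numtaps, fmin, fmax, sample_rate):
    """Design a bandpass FIR filter, working across SciPy versions."""
    try:
        # SciPy >= 1.2.0 accepts cutoff frequencies in Hz with fs=.
        return signal.firwin(numtaps, [fmin, fmax], pass_zero=False,
                             fs=sample_rate)
    except TypeError:
        # Older SciPy: give cutoffs relative to the Nyquist frequency.
        return signal.firwin(numtaps, [fmin, fmax], pass_zero=False,
                             nyq=sample_rate / 2.0)
```

Alternatively, simply upgrading SciPy to 1.2.0 or newer should make the unmodified code run.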

begeekmyfriend commented 5 years ago

It runs fine for me on the master branch. Please check whether you have write permission on the disk. You can set the checkpoint_interval option to 1 in train.py to test this.