syang1993 / gst-tacotron

A TensorFlow implementation of "Style Tokens: Unsupervised Style Modeling, Control and Transfer in End-to-End Speech Synthesis"

Throws "data must be floating-point" exception after 1k steps #25

ishandutta2007 closed this issue 5 years ago

ishandutta2007 commented 5 years ago

Running on the LJ Speech dataset. This is the line where it breaks: https://github.com/syang1993/gst-tacotron/blob/b455ed21bf0c08e557dde0aaafaf40a1b4df5265/train.py#L115

Starting new training run at commit: None
Generated 32 batches of size 32 in 39.301 sec
Step 1 [43.557 sec/step, loss=0.84572, avg_loss=0.84572]
Step 2 [23.415 sec/step, loss=0.85437, avg_loss=0.85004]
........
........
Step 998 [2.387 sec/step, loss=0.14099, avg_loss=0.14424]
Step 999 [2.387 sec/step, loss=0.14100, avg_loss=0.14422]
Step 1000 [2.380 sec/step, loss=0.14311, avg_loss=0.14418]
Writing summary at step: 1000
Saving checkpoint to: /media/iedc-beast/Disk 1/test/gst-tacotron-master/logs-tacotron/model.ckpt-1000
Saving audio and alignment...
Exiting due to exception: data must be floating-point
Traceback (most recent call last):
  File "train.py", line 115, in train
    audio.save_wav(waveform, os.path.join(log_dir, 'step-%d-audio.wav' % step))
  File "/media/iedc-beast/Disk 1/test/gst-tacotron-master/util/audio.py", line 16, in save_wav
    librosa.output.write_wav(path, wav.astype(np.int16), hparams.sample_rate)
  File "/usr/local/lib/python3.5/dist-packages/librosa/output.py", line 223, in write_wav
    util.valid_audio(y, mono=False)
  File "/usr/local/lib/python3.5/dist-packages/librosa/util/utils.py", line 159, in valid_audio
    raise ParameterError('data must be floating-point')
librosa.util.exceptions.ParameterError: data must be floating-point
2018-11-24 16:41:57.082342: W tensorflow/core/kernels/queue_base.cc:277] _0_datafeeder/input_queue: Skipping cancelled enqueue attempt with queue not closed
Traceback (most recent call last):
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/client/session.py", line 1292, in _do_call
    return fn(*args)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/client/session.py", line 1277, in _run_fn
    options, feed_dict, fetch_list, target_list, run_metadata)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/client/session.py", line 1367, in _call_tf_sessionrun
    run_metadata)
tensorflow.python.framework.errors_impl.CancelledError: Enqueue operation was cancelled
  [[{{node datafeeder/input_queue_enqueue}} = QueueEnqueueV2[Tcomponents=[DT_INT32, DT_INT32, DT_FLOAT, DT_FLOAT], timeout_ms=-1, _device="/job:localhost/replica:0/task:0/device:CPU:0"](datafeeder/input_queue, _arg_datafeeder/inputs_0_1, _arg_datafeeder/input_lengths_0_0, _arg_datafeeder/mel_targets_0_3, _arg_datafeeder/linear_targets_0_2)]]

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/media/iedc-beast/Disk 1/test/gst-tacotron-master/datasets/datafeeder.py", line 75, in run
    self._enqueue_next_group()
  File "/media/iedc-beast/Disk 1/test/gst-tacotron-master/datasets/datafeeder.py", line 97, in _enqueue_next_group
    self._session.run(self._enqueue_op, feed_dict=feed_dict)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/client/session.py", line 887, in run
    run_metadata_ptr)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/client/session.py", line 1110, in _run
    feed_dict_tensor, options, run_metadata)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/client/session.py", line 1286, in _do_run
    run_metadata)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/client/session.py", line 1308, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.CancelledError: Enqueue operation was cancelled
  [[{{node datafeeder/input_queue_enqueue}} = QueueEnqueueV2[Tcomponents=[DT_INT32, DT_INT32, DT_FLOAT, DT_FLOAT], timeout_ms=-1, _device="/job:localhost/replica:0/task:0/device:CPU:0"](datafeeder/input_queue, _arg_datafeeder/inputs_0_1, _arg_datafeeder/input_lengths_0_0, _arg_datafeeder/mel_targets_0_3, _arg_datafeeder/linear_targets_0_2)]]

Caused by op 'datafeeder/input_queue_enqueue', defined at:
  File "train.py", line 153, in <module>
    main()
  File "train.py", line 149, in main
    train(log_dir, args)
  File "train.py", line 58, in train
    feeder = DataFeeder(coord, input_path, hparams)
  File "/media/iedc-beast/Disk 1/test/gst-tacotron-master/datasets/datafeeder.py", line 46, in __init__
    self._enqueue_op = queue.enqueue(self._placeholders)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/ops/data_flow_ops.py", line 339, in enqueue
    self._queue_ref, vals, name=scope)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/ops/gen_data_flow_ops.py", line 3978, in queue_enqueue_v2
    timeout_ms=timeout_ms, name=name)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/framework/op_def_library.py", line 787, in _apply_op_helper
    op_def=op_def)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/util/deprecation.py", line 488, in new_func
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/framework/ops.py", line 3259, in create_op
    op_def=op_def)
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/framework/ops.py", line 1747, in __init__
    self._traceback = tf_stack.extract_stack()

CancelledError (see above for traceback): Enqueue operation was cancelled
  [[{{node datafeeder/input_queue_enqueue}} = QueueEnqueueV2[Tcomponents=[DT_INT32, DT_INT32, DT_FLOAT, DT_FLOAT], timeout_ms=-1, _device="/job:localhost/replica:0/task:0/device:CPU:0"](datafeeder/input_queue, _arg_datafeeder/inputs_0_1, _arg_datafeeder/input_lengths_0_0, _arg_datafeeder/mel_targets_0_3, _arg_datafeeder/linear_targets_0_2)]]
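
For context, the first error happens because librosa.output.write_wav calls librosa.util.valid_audio, which in librosa 0.6.x only accepts floating-point arrays, while save_wav passes the waveform cast to np.int16. A minimal repro sketch (assuming librosa 0.6.x, as the paths in the traceback suggest):

import numpy as np
import librosa

# Mimic what save_wav passes to write_wav: the waveform cast to int16.
wav_int16 = np.zeros(16000, dtype=np.int16)

try:
    # librosa 0.6.x: valid_audio() rejects any non-floating-point dtype.
    librosa.util.valid_audio(wav_int16, mono=False)
except librosa.util.exceptions.ParameterError as e:
    print(e)  # "data must be floating-point"

# Casting back to float satisfies the check.
librosa.util.valid_audio(wav_int16.astype(np.float32), mono=False)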

syang1993 commented 5 years ago

@ishandutta2007 Hi, I guess it is caused by the librosa version. You can modify how the wav file is written to match your environment.
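
One way to do that while keeping the int16 output is to write the file with scipy instead of librosa, since scipy.io.wavfile.write accepts integer PCM data directly. A minimal sketch of the relevant part of util/audio.py (the hparams import is an assumption based on how hparams is referenced in the traceback):

import numpy as np
from scipy.io import wavfile

from hparams import hparams  # assumed import, matching how hparams is used in util/audio.py


def save_wav(wav, path):
  # Write 16-bit PCM directly; scipy does not require floating-point input.
  wavfile.write(path, hparams.sample_rate, wav.astype(np.int16))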

ishandutta2007 commented 5 years ago

Thanks a lot @syang1993 for answering. I have been trying to reach you on multiple platforms for help on this thread about models that people have already built; I'm not sure people look at older threads. It would be great if you could share at least the 200k-step model (the one you shared outputs of) so we can continue training more iterations on top of it.

syang1993 commented 5 years ago

Hi, I'm sorry, but I'm now doing an internship at a company and cannot get to the pre-trained model (I trained it several months ago while doing visiting research in Singapore). You can train it yourself; it may take about 3 days to reach 200K steps.

ishandutta2007 commented 5 years ago

Well, on our GTX 1080 my estimate is that it will take longer (maybe twice that). And it's not only about time: nowadays people in the ML world burn huge amounts of compute hours and money unnecessarily when sharing could solve a lot of it. Can you share your email/LinkedIn/Twitter, etc.? You seem to be really deep into speech synthesis, and keeping in touch may be useful for both of us.

syang1993 commented 5 years ago

I trained it on a P40, which may be faster. Yes, you are right, sharing can solve a lot; maybe that is the purpose of GitHub :)

I'm not very familiar with LinkedIn, so I don't know how to share my ID, but this is the link: https://www.linkedin.com/in/yang-shan-182987119

ishandutta2007 commented 5 years ago

So in China, do you use Ushi or Mamai? Let's see if I can connect via those too. :)

ishandutta2007 commented 5 years ago

Thanks @syang1993 for connecting. I have started the run on our GTX 1080; it will take a month or so to get 500-600 iterations. We need to get it close to Google's performance, or else it is unusable for real-life scenarios. If you have access to more powerful GPUs, it would be a great favour if you could train for more iterations and share the model with the community. So far there is no properly trained Tacotron with style transfer on the internet; this would be the first one.

syang1993 commented 5 years ago

@ishandutta2007 Usually, Google uses a lot of GPUs to train such a model, and they use about 200 hours of data to get their performance, so I think it's hard to reproduce. By the way, one of my friends has started training a model using this repo; I can share it when it's finished.

ishandutta2007 commented 5 years ago

No wonder Elon Musk fears Google colonising the world :D

@syang1993 what's the best messenger to keep in touch with you? We shouldn't be discussing things unrelated to this thread, so I will switch over to the models thread for further updates on this.

Do let me know what's the best way to reach you. Don't hesitate even if I need to install WeChat or something. In India we use:

syang1993 commented 5 years ago

@ishandutta2007 We mostly use WeChat in China, and my WeChat ID is ys_think. I also use LinkedIn (not often) and Gmail: syang.mix@gmail.com

eyaler commented 5 years ago

Just hit the same error. @ishandutta2007, how did you get around this?

eyaler commented 5 years ago

I solved it by changing save_wav() in util/audio.py:

librosa.output.write_wav(path, wav.astype(np.int16), hparams.sample_rate)

to

librosa.output.write_wav(path, wav, hparams.sample_rate)
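
Put together, the patched function looks roughly like this (a sketch; the explicit float32 cast and the hparams import are assumptions about the surrounding code, added so the dtype passed to librosa is unambiguous):

import librosa
import numpy as np

from hparams import hparams  # assumed import, per the repo layout


def save_wav(wav, path):
  # librosa.output.write_wav validates that the data is floating-point,
  # so keep the waveform as floats instead of casting to np.int16.
  librosa.output.write_wav(path, wav.astype(np.float32), hparams.sample_rate)

Note that librosa.output.write_wav was removed in librosa 0.8.0, so on newer librosa versions you would write the file with soundfile instead, e.g. soundfile.write(path, wav, hparams.sample_rate).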