Closed: fazlekarim closed this issue 6 years ago
I think I am running out of memory. What type of GPU, and how many GPUs, are you using? Is there a memory leak somewhere? It makes no sense that I run out of memory after around 200 steps.
Fixed it. The code is fine; I just overreacted.
@fazlekarim I am having a similar issue. How did you solve this?
@fazlekarim @lapwing It is an OOM error. Since some sentences are too long, it may throw an OOM error at some step. You can fix it by filtering out the overly long sentences during preprocessing.
Do you have a script to remove sentences longer than 1200 frames?
@fazlekarim A simple way is to modify the data preprocessing script, as attached: blizzard2013.zip
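The attached script is not reproduced here, but the idea of filtering over-long utterances can be sketched as below. This is a minimal, hypothetical example: the 1200-frame threshold comes from the question above, while the metadata layout (a tuple of audio path, mel frame count, text) is an assumption and may differ from the actual preprocessing code.

```python
# Hypothetical sketch of dropping over-long utterances during preprocessing.
# The metadata tuple layout (path, n_mel_frames, text) is an assumption,
# not necessarily what the attached blizzard2013.zip script uses.

MAX_FRAMES = 1200  # threshold suggested in the thread above


def filter_metadata(metadata):
    """Keep only utterances whose mel spectrogram has at most MAX_FRAMES frames."""
    kept = [m for m in metadata if m[1] <= MAX_FRAMES]
    dropped = len(metadata) - len(kept)
    print('Dropped %d of %d utterances longer than %d frames'
          % (dropped, len(metadata), MAX_FRAMES))
    return kept


# Example usage with made-up entries:
sample = [('a.wav', 800, 'short one'),
          ('b.wav', 1500, 'too long'),
          ('c.wav', 1200, 'right at the limit')]
filtered = filter_metadata(sample)
```

Dropping these utterances (rather than truncating them) keeps the target alignments intact; the cost is a slightly smaller training set.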
After running the code for about 200 steps, I run into the following error. I can't figure out why; I feel like it should be an easy fix.
self._session.run(self._enqueue_op, feed_dict=feed_dict)
  File "/home/fakarim/anaconda3/envs/gst-tacotron/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 889, in run
    run_metadata_ptr)
  File "/home/fakarim/anaconda3/envs/gst-tacotron/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1120, in _run
    feed_dict_tensor, options, run_metadata)
  File "/home/fakarim/anaconda3/envs/gst-tacotron/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1317, in _do_run
    options, run_metadata)
  File "/home/fakarim/anaconda3/envs/gst-tacotron/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1336, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.CancelledError: Enqueue operation was cancelled
  [[Node: datafeeder/input_queue_enqueue = QueueEnqueueV2[Tcomponents=[DT_INT32, DT_INT32, DT_FLOAT, DT_FLOAT], timeout_ms=-1, _device="/job:localhost/replica:0/task:0/device:CPU:0"](datafeeder/input_queue, _arg_datafeeder/inputs_0_1, _arg_datafeeder/input_lengths_0_0, _arg_datafeeder/mel_targets_0_3, _arg_datafeeder/linear_targets_0_2)]]

Caused by op 'datafeeder/input_queue_enqueue', defined at:
  File "train.py", line 153, in <module>
    main()
  File "train.py", line 149, in main
    train(log_dir, args)
  File "train.py", line 58, in train
    feeder = DataFeeder(coord, input_path, hparams)
  File "/home/fakarim/projects/gst-tacotron/datasets/datafeeder.py", line 46, in __init__
    self._enqueue_op = queue.enqueue(self._placeholders)
  File "/home/fakarim/anaconda3/envs/gst-tacotron/lib/python3.6/site-packages/tensorflow/python/ops/data_flow_ops.py", line 327, in enqueue
    self._queue_ref, vals, name=scope)
  File "/home/fakarim/anaconda3/envs/gst-tacotron/lib/python3.6/site-packages/tensorflow/python/ops/gen_data_flow_ops.py", line 2777, in _queue_enqueue_v2
    timeout_ms=timeout_ms, name=name)
  File "/home/fakarim/anaconda3/envs/gst-tacotron/lib/python3.6/site-packages/tensorflow/python/framework/op_def_library.py", line 787, in _apply_op_helper
    op_def=op_def)
  File "/home/fakarim/anaconda3/envs/gst-tacotron/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 2956, in create_op
    op_def=op_def)
  File "/home/fakarim/anaconda3/envs/gst-tacotron/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 1470, in __init__
    self._traceback = self._graph._extract_stack()  # pylint: disable=protected-access

CancelledError (see above for traceback): Enqueue operation was cancelled
  [[Node: datafeeder/input_queue_enqueue = QueueEnqueueV2[Tcomponents=[DT_INT32, DT_INT32, DT_FLOAT, DT_FLOAT], timeout_ms=-1, _device="/job:localhost/replica:0/task:0/device:CPU:0"](datafeeder/input_queue, _arg_datafeeder/inputs_0_1, _arg_datafeeder/input_lengths_0_0, _arg_datafeeder/mel_targets_0_3, _arg_datafeeder/linear_targets_0_2)]]