google / seq2seq

A general-purpose encoder-decoder framework for Tensorflow
https://google.github.io/seq2seq/
Apache License 2.0

Issue of CUDA_ERROR_OUT_OF_MEMORY while running the toy example #270

Open Sandhya2207 opened 7 years ago

Sandhya2207 commented 7 years ago

Hi, while testing the seq2seq model on the toy data set, I am getting the following error:

E tensorflow/core/common_runtime/direct_session.cc:137] Internal: failed initializing StreamExecutor for CUDA device ordinal 0: Internal: failed call to cuDevicePrimaryCtxRetain: CUDA_ERROR_OUT_OF_MEMORY; total memory reported: 18446744073709551615
Traceback (most recent call last):
  File "/usr/lib/python2.7/runpy.py", line 174, in _run_module_as_main
    "__main__", fname, loader, pkg_name)
  File "/usr/lib/python2.7/runpy.py", line 72, in _run_code
    exec code in run_globals
  File "/home/development/sandhya/installs/tf-seq2seq-google/seq2seq-master/bin/train.py", line 277, in <module>
    tf.app.run()
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/platform/app.py", line 44, in run
    _sys.exit(main(_sys.argv[:1] + flags_passthrough))
  File "/home/development/sandhya/installs/tf-seq2seq-google/seq2seq-master/bin/train.py", line 272, in main
    schedule=FLAGS.schedule)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/contrib/learn/python/learn/learn_runner.py", line 106, in run
    return task()
  File "seq2seq/contrib/experiment.py", line 104, in continuous_train_and_eval
    monitors=self._train_monitors)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/util/deprecation.py", line 280, in new_func
    return func(*args, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/contrib/learn/python/learn/estimators/estimator.py", line 426, in fit
    loss = self._train_model(input_fn=input_fn, hooks=hooks)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/contrib/learn/python/learn/estimators/estimator.py", line 981, in _train_model
    config=self.config.tf_config) as mon_sess:
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/monitored_session.py", line 315, in MonitoredTrainingSession
    return MonitoredSession(session_creator=session_creator, hooks=all_hooks)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/monitored_session.py", line 601, in __init__
    session_creator, hooks, should_recover=True)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/monitored_session.py", line 434, in __init__
    self._sess = _RecoverableSession(self._coordinated_creator)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/monitored_session.py", line 767, in __init__
    _WrappedSession.__init__(self, self._create_session())
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/monitored_session.py", line 772, in _create_session
    return self._sess_creator.create_session()
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/monitored_session.py", line 494, in create_session
    self.tf_sess = self._session_creator.create_session()
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/monitored_session.py", line 375, in create_session
    init_fn=self._scaffold.init_fn)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/session_manager.py", line 256, in prepare_session
    config=config)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/session_manager.py", line 161, in _restore_checkpoint
    sess = session.Session(self._target, graph=self._graph, config=config)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 1176, in __init__
    super(Session, self).__init__(target, graph, config=config)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 552, in __init__
    self._session = tf_session.TF_NewDeprecatedSession(opts, status)
  File "/usr/lib/python2.7/contextlib.py", line 24, in __exit__
    self.gen.next()
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/errors_impl.py", line 466, in raise_exception_on_not_ok_status
    pywrap_tensorflow.TF_GetCode(status))
tensorflow.python.framework.errors_impl.InternalError: Failed to create session.

I have already tried the previously suggested fix of setting the "gpu_allow_growth" flag to True. Please suggest what else I can try.

Thanks
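A side note on the "total memory reported" figure in the first log line: it is exactly the largest unsigned 64-bit value, which is what -1 looks like when stored in such a field. A possible reading, and this is only an assumption rather than something the log confirms, is that the memory query itself failed, so the number should not be taken as a real memory size. The small check below only verifies the arithmetic:

```python
# Value copied from the "total memory reported" field in the log above.
reported = 18446744073709551615

# Largest value an unsigned 64-bit integer can hold.
UINT64_MAX = 2**64 - 1

# Reinterpret the same 64 bits as a signed integer.
as_signed = int.from_bytes(reported.to_bytes(8, "little"), "little", signed=True)

print(reported == UINT64_MAX)  # True
print(as_signed)               # -1
```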

yanghoonkim commented 7 years ago

Which GPU are you using? Did you run exactly the code provided in the tutorial?

Sandhya2207 commented 7 years ago

The GPU is a GTX Titan.

chiphuyen commented 7 years ago

I get a similar problem when multiple processes are running and hogging the GPU memory. Killing the other processes fixes it.
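For anyone who wants to check which processes are holding GPU memory before killing anything, here is a minimal sketch. It assumes the `nvidia-smi` tool is installed and on the PATH; the helper names `parse_compute_apps` and `list_gpu_processes` are made up for this example and are not part of seq2seq or TensorFlow.

```python
import subprocess

# nvidia-smi query that prints one CSV row per compute process:
# pid, process name, GPU memory used (MiB, no unit suffix).
QUERY = [
    "nvidia-smi",
    "--query-compute-apps=pid,process_name,used_memory",
    "--format=csv,noheader,nounits",
]


def parse_compute_apps(csv_text):
    """Turn nvidia-smi CSV rows into (pid, name, used_mib) tuples.

    used_mib is None when the driver does not report a number
    (for example "[N/A]"). Rows are sorted by memory use, largest first.
    """
    rows = []
    for line in csv_text.splitlines():
        line = line.strip()
        if not line:
            continue
        # The pid is the first field and memory is the last; splitting from
        # both ends keeps a process name that contains commas in one piece.
        pid_str, rest = line.split(",", 1)
        name, used = rest.rsplit(",", 1)
        used = used.strip()
        used_mib = int(used) if used.isdigit() else None
        rows.append((int(pid_str), name.strip(), used_mib))
    # Processes with unknown memory use sort to the end.
    rows.sort(key=lambda r: r[2] if r[2] is not None else -1, reverse=True)
    return rows


def list_gpu_processes():
    """Run nvidia-smi and return the parsed process list."""
    output = subprocess.check_output(QUERY, universal_newlines=True)
    return parse_compute_apps(output)
```

Calling `list_gpu_processes()` on the training machine returns the processes currently using the GPU, largest first, so stale training jobs or notebook kernels are easy to spot before starting train.py again.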

joshisagar92 commented 5 years ago

Thanks @chiphuyen for the hint. I encountered the same error with a tf estimator and fixed it by closing the other Jupyter notebook instances running on the same device.