drive.py crashes when running both keras model and simulator on local GPU

Mithrillion commented 6 years ago

It appears that TensorFlow's default behaviour (as of version 1.5) is to allocate nearly all remaining VRAM to a running session. This conflicts with the simulator, which also needs VRAM, and leads to a memory allocation error when the computation graph runs. When this happens, the following error message appears:

E tensorflow/stream_executor/cuda/cuda_dnn.cc:385] could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR
E tensorflow/stream_executor/cuda/cuda_dnn.cc:352] could not destroy cudnn handle: CUDNN_STATUS_BAD_PARAM
F tensorflow/core/kernels/conv_ops.cc:605] Check failed: stream->parent()->GetConvolveAlgorithms(&algorithms)

The same error is reported in tensorflow/tensorflow#6698 and keras-team/keras#8353. One workaround is to add the following to drive.py, before the model is loaded:

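import tensorflow as tf
from keras import backend as K

# Let TensorFlow grow its GPU memory use on demand instead of claiming it all up front
config = tf.ConfigProto()
config.gpu_options.allow_growth = True
# Make Keras use a session created with this configuration
sess = tf.Session(config=config)
K.set_session(sess)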
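
If growing memory on demand still leaves too little VRAM for the simulator, another option that may help is to cap TensorFlow at a fixed share of GPU memory with `per_process_gpu_memory_fraction`, which is part of the same `gpu_options` in TensorFlow 1.x. The following is only a sketch: the helper names `gpu_memory_fraction` and `make_capped_session` are hypothetical and not part of drive.py, and the memory figures are illustrative assumptions, not measurements.

```python
def gpu_memory_fraction(total_mb, reserved_mb):
    """Return the share of VRAM left for TensorFlow after reserving some for the simulator.

    Both arguments are assumptions the user supplies; nothing here queries the GPU.
    """
    if total_mb <= 0 or not 0 <= reserved_mb < total_mb:
        raise ValueError("need total_mb > 0 and 0 <= reserved_mb < total_mb")
    return (total_mb - reserved_mb) / float(total_mb)


def make_capped_session(fraction):
    """Create a TF 1.x session limited to `fraction` of GPU memory and register it with Keras."""
    # Imported lazily so gpu_memory_fraction() can be used without TensorFlow installed.
    import tensorflow as tf
    from keras import backend as K

    config = tf.ConfigProto()
    # Upper bound on the share of GPU memory this process may allocate.
    config.gpu_options.per_process_gpu_memory_fraction = fraction
    sess = tf.Session(config=config)
    K.set_session(sess)
    return sess
```

For example, on a hypothetical 8 GB card with 2 GB set aside for the simulator, `make_capped_session(gpu_memory_fraction(8192, 2048))` would limit TensorFlow to 75% of VRAM. As with the workaround above, this would need to run before the model is loaded.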

This is also reported in #29, but that thread does not mention the root cause of the problem.

mvirgo commented 6 years ago

Hello - this project only supports TensorFlow v0.12.1, as included in the starter kit, at this time.

mvirgo commented 6 years ago

Closing out since no further comments.

udacity / CarND-Behavioral-Cloning-P3

drive.py crashes when running both keras model and simulator on local GPU #31