pierluigiferrari / ssd_keras

A Keras port of Single Shot MultiBox Detector
Apache License 2.0

InternalError #308

Closed Baominchao closed 4 years ago

Baominchao commented 4 years ago

Please help me. When I run ssd300_training.ipynb, the "5. Train" cell fails with this error:

InternalError                             Traceback (most recent call last)

in ()
     10                               validation_data=val_generator,
     11                               validation_steps=ceil(val_dataset_size/batch_size),
---> 12                               initial_epoch=initial_epoch)

~\Anaconda3\envs\ssd\lib\site-packages\keras\legacy\interfaces.py in wrapper(*args, **kwargs)
     89                 warnings.warn('Update your `' + object_name +
     90                               '` call to the Keras 2 API: ' + signature, stacklevel=2)
---> 91             return func(*args, **kwargs)
     92         wrapper._original_function = func
     93         return wrapper

~\Anaconda3\envs\ssd\lib\site-packages\keras\engine\training.py in fit_generator(self, generator, steps_per_epoch, epochs, verbose, callbacks, validation_data, validation_steps, class_weight, max_queue_size, workers, use_multiprocessing, shuffle, initial_epoch)
   1413             use_multiprocessing=use_multiprocessing,
   1414             shuffle=shuffle,
-> 1415             initial_epoch=initial_epoch)
   1416
   1417     @interfaces.legacy_generator_methods_support

~\Anaconda3\envs\ssd\lib\site-packages\keras\engine\training_generator.py in fit_generator(model, generator, steps_per_epoch, epochs, verbose, callbacks, validation_data, validation_steps, class_weight, max_queue_size, workers, use_multiprocessing, shuffle, initial_epoch)
    211                 outs = model.train_on_batch(x, y,
    212                                             sample_weight=sample_weight,
--> 213                                             class_weight=class_weight)
    214
    215                 outs = to_list(outs)

~\Anaconda3\envs\ssd\lib\site-packages\keras\engine\training.py in train_on_batch(self, x, y, sample_weight, class_weight)
   1213             ins = x + y + sample_weights
   1214         self._make_train_function()
-> 1215         outputs = self.train_function(ins)
   1216         return unpack_singleton(outputs)
   1217

~\Anaconda3\envs\ssd\lib\site-packages\keras\backend\tensorflow_backend.py in __call__(self, inputs)
   2664                 return self._legacy_call(inputs)
   2665
-> 2666             return self._call(inputs)
   2667         else:
   2668             if py_any(is_tensor(x) for x in inputs):

~\Anaconda3\envs\ssd\lib\site-packages\keras\backend\tensorflow_backend.py in _call(self, inputs)
   2634                                 symbol_vals,
   2635                                 session)
-> 2636         fetched = self._callable_fn(*array_vals)
   2637         return fetched[:len(self.outputs)]
   2638

~\Anaconda3\envs\ssd\lib\site-packages\tensorflow\python\client\session.py in __call__(self, *args, **kwargs)
   1380           ret = tf_session.TF_SessionRunCallable(
   1381               self._session._session, self._handle, args, status,
-> 1382               run_metadata_ptr)
   1383         if run_metadata:
   1384           proto_data = tf_session.TF_GetBuffer(run_metadata_ptr)

~\Anaconda3\envs\ssd\lib\site-packages\tensorflow\python\framework\errors_impl.py in __exit__(self, type_arg, value_arg, traceback_arg)
    517             None, None,
    518             compat.as_text(c_api.TF_Message(self.status.status)),
--> 519             c_api.TF_GetCode(self.status.status))
    520     # Delete the underlying status object from memory otherwise it stays alive
    521     # as there is a reference to status from this from the traceback due to

InternalError: cuDNN Backward Filter function launch failure : input shape([32,128,150,150]) filter shape([3,3,128,128])
     [[Node: training/SGD/gradients/conv2_2/convolution_grad/Conv2DBackpropFilter = Conv2DBackpropFilter[T=DT_FLOAT, _class=["loc:@training/SGD/gradients/conv2_2/convolution_grad/Conv2DBackpropInput"], data_format="NCHW", dilations=[1, 1, 1, 1], padding="SAME", strides=[1, 1, 1, 1], use_cudnn_on_gpu=true, _device="/job:localhost/replica:0/task:0/device:GPU:0"](conv2_1/Relu, ConstantFolding/training/SGD/gradients/conv2_2/convolution_grad/ShapeN-matshapes-1, training/SGD/gradients/conv2_2/Relu_grad/ReluGrad)]]

My environment: tensorflow=1.10.0, python=3.5, keras=2.2.2. I only filled in the dataset paths in the notebook and did not change anything else.

Baominchao commented 4 years ago

I changed the versions to tensorflow=1.13.1, keras=2.2.4, python=3.6. After that, the error was gone.
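
For anyone comparing their own setup against the combination above, here is a minimal sketch that prints the versions visible in the active environment. The helper name `module_versions` and the `REPORTED_WORKING` dictionary are made up for this illustration; the version numbers in it are only the ones reported in this thread, not an official compatibility table.

```python
import importlib
import sys

# Versions reported in this thread as training without the cuDNN error.
# This is one user's report, not a verified compatibility matrix.
REPORTED_WORKING = {"tensorflow": "1.13.1", "keras": "2.2.4"}


def module_versions(names=("tensorflow", "keras")):
    """Return {module name: version string, or None if it cannot be imported}."""
    versions = {}
    for name in names:
        try:
            module = importlib.import_module(name)
        except ImportError:
            versions[name] = None
            continue
        # Most packages expose __version__; fall back to "unknown" if absent.
        versions[name] = getattr(module, "__version__", "unknown")
    return versions


if __name__ == "__main__":
    print("python", sys.version.split()[0])
    for name, version in module_versions().items():
        reported = REPORTED_WORKING.get(name)
        print(name, version or "not installed", "(reported working: %s)" % reported)
```

Running this inside the same conda environment as the notebook should show whether the installed versions differ from the ones that worked here.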