sampepose / flownet2-tf

FlowNet 2.0: Evolution of Optical Flow Estimation with Deep Networks
MIT License

Could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR #110

Closed FatihDemirtas closed 4 years ago

FatihDemirtas commented 4 years ago

Ubuntu: 18.04 image

When I run the test code:

python -m src.flownet2.test --input_a data/samples/0img0.ppm --input_b data/samples/0img1.ppm --out ./

I get the error below:

/home/fatih/.local/lib/python3.7/site-packages/tensorflow/python/framework/dtypes.py:526: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_qint8 = np.dtype([("qint8", np.int8, 1)])
/home/fatih/.local/lib/python3.7/site-packages/tensorflow/python/framework/dtypes.py:527: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_quint8 = np.dtype([("quint8", np.uint8, 1)])
/home/fatih/.local/lib/python3.7/site-packages/tensorflow/python/framework/dtypes.py:528: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_qint16 = np.dtype([("qint16", np.int16, 1)])
/home/fatih/.local/lib/python3.7/site-packages/tensorflow/python/framework/dtypes.py:529: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_quint16 = np.dtype([("quint16", np.uint16, 1)])
/home/fatih/.local/lib/python3.7/site-packages/tensorflow/python/framework/dtypes.py:530: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_qint32 = np.dtype([("qint32", np.int32, 1)])
/home/fatih/.local/lib/python3.7/site-packages/tensorflow/python/framework/dtypes.py:535: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  np_resource = np.dtype([("resource", np.ubyte, 1)])

WARNING: The TensorFlow contrib module will not be included in TensorFlow 2.0. For more information, please see:

WARNING:tensorflow:From /home/fatih/Projects/flownet2-tf/src/net.py:22: get_or_create_global_step (from tensorflow.contrib.framework.python.ops.variables) is deprecated and will be removed in a future version.
Instructions for updating:
Please switch to tf.train.get_or_create_global_step
WARNING:tensorflow:From /home/fatih/.local/lib/python3.7/site-packages/tensorflow/python/framework/op_def_library.py:263: colocate_with (from tensorflow.python.framework.ops) is deprecated and will be removed in a future version.
Instructions for updating:
Colocations handled automatically by placer.
WARNING:tensorflow:From /home/fatih/Projects/flownet2-tf/src/flownet_cs/flownet_cs.py:26: calling reduce_sum_v1 (from tensorflow.python.ops.math_ops) with keep_dims is deprecated and will be removed in a future version.
Instructions for updating:
keep_dims is deprecated, use keepdims instead
2020-04-25 13:06:36.054001: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2020-04-25 13:06:36.076213: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 3600000000 Hz
2020-04-25 13:06:36.076595: I tensorflow/compiler/xla/service/service.cc:150] XLA service 0x5607ef354d10 executing computations on platform Host. Devices:
2020-04-25 13:06:36.076609: I tensorflow/compiler/xla/service/service.cc:158] StreamExecutor device (0): ,
2020-04-25 13:06:36.151681: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:998] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-04-25 13:06:36.151948: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1433] Found device 0 with properties:
name: Quadro RTX 4000 major: 7 minor: 5 memoryClockRate(GHz): 1.545
pciBusID: 0000:01:00.0
totalMemory: 7.79GiB freeMemory: 7.00GiB
2020-04-25 13:06:36.151962: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1512] Adding visible gpu devices: 0
2020-04-25 13:06:36.152938: I tensorflow/core/common_runtime/gpu/gpu_device.cc:984] Device interconnect StreamExecutor with strength 1 edge matrix:
2020-04-25 13:06:36.152949: I tensorflow/core/common_runtime/gpu/gpu_device.cc:990] 0
2020-04-25 13:06:36.152954: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1003] 0: N
2020-04-25 13:06:36.153006: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 6812 MB memory) -> physical GPU (device: 0, name: Quadro RTX 4000, pci bus id: 0000:01:00.0, compute capability: 7.5)
2020-04-25 13:06:36.154077: I tensorflow/compiler/xla/service/service.cc:150] XLA service 0x5607eca67810 executing computations on platform CUDA. Devices:
2020-04-25 13:06:36.154089: I tensorflow/compiler/xla/service/service.cc:158] StreamExecutor device (0): Quadro RTX 4000, Compute Capability 7.5
WARNING:tensorflow:From /home/fatih/.local/lib/python3.7/site-packages/tensorflow/python/training/saver.py:1266: checkpoint_exists (from tensorflow.python.training.checkpoint_management) is deprecated and will be removed in a future version.
Instructions for updating:
Use standard file APIs to check for files with this prefix.
2020-04-25 13:06:37.946160: E tensorflow/stream_executor/cuda/cuda_dnn.cc:334] Could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR
2020-04-25 13:06:37.947466: E tensorflow/stream_executor/cuda/cuda_dnn.cc:334] Could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR
2020-04-25 13:06:37.948473: E tensorflow/stream_executor/cuda/cuda_dnn.cc:334] Could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR
2020-04-25 13:06:37.949028: E tensorflow/stream_executor/cuda/cuda_dnn.cc:334] Could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR
2020-04-25 13:06:37.949606: E tensorflow/stream_executor/cuda/cuda_dnn.cc:334] Could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR
2020-04-25 13:06:37.950156: E tensorflow/stream_executor/cuda/cuda_dnn.cc:334] Could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR
Traceback (most recent call last):
  File "/home/fatih/.local/lib/python3.7/site-packages/tensorflow/python/client/session.py", line 1334, in _do_call
    return fn(*args)
  File "/home/fatih/.local/lib/python3.7/site-packages/tensorflow/python/client/session.py", line 1319, in _run_fn
    options, feed_dict, fetch_list, target_list, run_metadata)
  File "/home/fatih/.local/lib/python3.7/site-packages/tensorflow/python/client/session.py", line 1407, in _call_tf_sessionrun
    run_metadata)
tensorflow.python.framework.errors_impl.UnknownError: Failed to get convolution algorithm. This is probably because cuDNN failed to initialize, so try looking to see if a warning log message was printed above.
  [[{{node FlowNet2/FlowNetSD/conv0/Conv2D}}]]
  [[{{node FlowNet2/ResizeBilinear}}]]

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/fatih/anaconda3/lib/python3.7/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/home/fatih/anaconda3/lib/python3.7/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/home/fatih/Projects/flownet2-tf/src/flownet2/test.py", line 51, in <module>
    main()
  File "/home/fatih/Projects/flownet2-tf/src/flownet2/test.py", line 18, in main
    out_path=FLAGS.out,
  File "/home/fatih/Projects/flownet2-tf/src/net.py", line 69, in test
    pred_flow = sess.run(pred_flow)[0, :, :, :]
  File "/home/fatih/.local/lib/python3.7/site-packages/tensorflow/python/client/session.py", line 929, in run
    run_metadata_ptr)
  File "/home/fatih/.local/lib/python3.7/site-packages/tensorflow/python/client/session.py", line 1152, in _run
    feed_dict_tensor, options, run_metadata)
  File "/home/fatih/.local/lib/python3.7/site-packages/tensorflow/python/client/session.py", line 1328, in _do_run
    run_metadata)
  File "/home/fatih/.local/lib/python3.7/site-packages/tensorflow/python/client/session.py", line 1348, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.UnknownError: Failed to get convolution algorithm. This is probably because cuDNN failed to initialize, so try looking to see if a warning log message was printed above.
  [[node FlowNet2/FlowNetSD/conv0/Conv2D (defined at /home/fatih/Projects/flownet2-tf/src/flownet_sd/flownet_sd.py:29) ]]
  [[node FlowNet2/ResizeBilinear (defined at /home/fatih/Projects/flownet2-tf/src/flownet2/flownet2.py:101) ]]

Caused by op 'FlowNet2/FlowNetSD/conv0/Conv2D', defined at:
  File "/home/fatih/anaconda3/lib/python3.7/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/home/fatih/anaconda3/lib/python3.7/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/home/fatih/Projects/flownet2-tf/src/flownet2/test.py", line 51, in <module>
    main()
  File "/home/fatih/Projects/flownet2-tf/src/flownet2/test.py", line 18, in main
    out_path=FLAGS.out,
  File "/home/fatih/Projects/flownet2-tf/src/net.py", line 62, in test
    predictions = self.model(inputs, training_schedule)
  File "/home/fatih/Projects/flownet2-tf/src/flownet2/flownet2.py", line 23, in model
    net_sd_predictions = self.net_sd.model(inputs, training_schedule, trainable=False)
  File "/home/fatih/Projects/flownet2-tf/src/flownet_sd/flownet_sd.py", line 29, in model
    conv0 = slim.conv2d(pad(concat_inputs), 64, 3, scope='conv0')
  File "/home/fatih/.local/lib/python3.7/site-packages/tensorflow/contrib/framework/python/ops/arg_scope.py", line 182, in func_with_args
    return func(*args, **current_args)
  File "/home/fatih/.local/lib/python3.7/site-packages/tensorflow/contrib/layers/python/layers/layers.py", line 1155, in convolution2d
    conv_dims=2)
  File "/home/fatih/.local/lib/python3.7/site-packages/tensorflow/contrib/framework/python/ops/arg_scope.py", line 182, in func_with_args
    return func(*args, **current_args)
  File "/home/fatih/.local/lib/python3.7/site-packages/tensorflow/contrib/layers/python/layers/layers.py", line 1058, in convolution
    outputs = layer.apply(inputs)
  File "/home/fatih/.local/lib/python3.7/site-packages/tensorflow/python/keras/engine/base_layer.py", line 1227, in apply
    return self.__call__(inputs, *args, **kwargs)
  File "/home/fatih/.local/lib/python3.7/site-packages/tensorflow/python/layers/base.py", line 530, in __call__
    outputs = super(Layer, self).__call__(inputs, *args, **kwargs)
  File "/home/fatih/.local/lib/python3.7/site-packages/tensorflow/python/keras/engine/base_layer.py", line 554, in __call__
    outputs = self.call(inputs, *args, **kwargs)
  File "/home/fatih/.local/lib/python3.7/site-packages/tensorflow/python/keras/layers/convolutional.py", line 194, in call
    outputs = self._convolution_op(inputs, self.kernel)
  File "/home/fatih/.local/lib/python3.7/site-packages/tensorflow/python/ops/nn_ops.py", line 966, in __call__
    return self.conv_op(inp, filter)
  File "/home/fatih/.local/lib/python3.7/site-packages/tensorflow/python/ops/nn_ops.py", line 591, in __call__
    return self.call(inp, filter)
  File "/home/fatih/.local/lib/python3.7/site-packages/tensorflow/python/ops/nn_ops.py", line 208, in __call__
    name=self.name)
  File "/home/fatih/.local/lib/python3.7/site-packages/tensorflow/python/ops/gen_nn_ops.py", line 1026, in conv2d
    data_format=data_format, dilations=dilations, name=name)
  File "/home/fatih/.local/lib/python3.7/site-packages/tensorflow/python/framework/op_def_library.py", line 788, in _apply_op_helper
    op_def=op_def)
  File "/home/fatih/.local/lib/python3.7/site-packages/tensorflow/python/util/deprecation.py", line 507, in new_func
    return func(*args, **kwargs)
  File "/home/fatih/.local/lib/python3.7/site-packages/tensorflow/python/framework/ops.py", line 3300, in create_op
    op_def=op_def)
  File "/home/fatih/.local/lib/python3.7/site-packages/tensorflow/python/framework/ops.py", line 1801, in __init__
    self._traceback = tf_stack.extract_stack()

UnknownError (see above for traceback): Failed to get convolution algorithm. This is probably because cuDNN failed to initialize, so try looking to see if a warning log message was printed above.
  [[node FlowNet2/FlowNetSD/conv0/Conv2D (defined at /home/fatih/Projects/flownet2-tf/src/flownet_sd/flownet_sd.py:29) ]]
  [[node FlowNet2/ResizeBilinear (defined at /home/fatih/Projects/flownet2-tf/src/flownet2/flownet2.py:101)

Please help!

FatihDemirtas commented 4 years ago

If you get an error like "Could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR" while running flownet2-tf, replace your CUDA installation with "cuda_10.1.105_418.39_linux.run" and your cuDNN installation with "cudnn-10.1-linux-x64-v7.5.1.10.tgz".
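Before and after swapping the packages, it may help to confirm which cuDNN version is actually installed. The snippet below is only a rough sketch and is not part of flownet2-tf. It assumes a cuDNN 7.x install, where the version is defined in cudnn.h as CUDNN_MAJOR, CUDNN_MINOR and CUDNN_PATCHLEVEL, and it assumes the header sits at /usr/local/cuda/include/cudnn.h, which may differ on your machine. The helper name parse_cudnn_version is made up for this example.

```python
import re
from pathlib import Path

# Assumed default header location; change this if cuDNN was unpacked elsewhere.
CUDNN_HEADER = Path("/usr/local/cuda/include/cudnn.h")


def parse_cudnn_version(header_text):
    """Return (major, minor, patch) read from the text of cudnn.h, or None."""
    parts = []
    for name in ("CUDNN_MAJOR", "CUDNN_MINOR", "CUDNN_PATCHLEVEL"):
        # Match lines such as "#define CUDNN_MAJOR 7".
        pattern = r"^\s*#define\s+" + name + r"\s+(\d+)"
        match = re.search(pattern, header_text, re.MULTILINE)
        if match is None:
            return None
        parts.append(int(match.group(1)))
    return tuple(parts)


if __name__ == "__main__":
    if CUDNN_HEADER.exists():
        text = CUDNN_HEADER.read_text(errors="replace")
        print("cuDNN version:", parse_cudnn_version(text))
    else:
        print("cudnn.h not found at", CUDNN_HEADER)
```

If the packages named above were installed to the assumed location, this should report version 7.5.1. A result of None means the header does not contain the three defines in that form.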