I have compiled successfully. When I evaluate on the Campus dataset, it fails with the errors `cudnn PoolForward launch failed` and `failed to enqueue forward pooling on stream: CUDNN_STATUS_EXECUTION_FAILED`. I have checked all existing issues and none describes this. I also googled the error and switched cuDNN from 7.6.5.32 to 7.0.0.5. My GPU has 10 GB of memory and the machine has 15 GB of RAM. Up to the point where the error occurs, GPU memory usage reaches at most 2 GB and RAM usage at most 6 GB, so I'm sure this is not an out-of-memory problem.
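A possible cause worth ruling out (an assumption, not confirmed in this report): the log below reports `compute capability: 8.6` for the RTX 3080, an Ampere GPU. Native code for compute capability 8.6 was first supported in CUDA 11.1, and the cuDNN 7.x releases predate Ampere. The traceback goes through `tensorflow.contrib`, so this is a TensorFlow 1.x build, and official 1.x wheels were built against CUDA 10.0 or older. If that mismatch is the cause, moving to an older cuDNN would not help. The sketch below parses the capability from a TensorFlow device line and compares it with a CUDA version you supply. `MIN_CUDA_FOR_SM`, `parse_compute_capability` and `check_device_line` are hypothetical helpers, not part of TensorFlow or mvpose, and the lookup table is deliberately partial.

```python
import re
from typing import Optional, Tuple

# Hypothetical, partial lookup: first CUDA toolkit release that can compile
# native code for each compute capability. Only a few generations are listed.
MIN_CUDA_FOR_SM = {
    (6, 1): (8, 0),   # Pascal (e.g. GTX 10xx)
    (7, 0): (9, 0),   # Volta
    (7, 5): (10, 0),  # Turing (e.g. RTX 20xx)
    (8, 0): (11, 0),  # Ampere A100
    (8, 6): (11, 1),  # Ampere GA10x (e.g. RTX 30xx)
}

_CC_PATTERN = re.compile(r"compute capability:\s*(\d+)\.(\d+)")


def parse_compute_capability(log_line: str) -> Optional[Tuple[int, int]]:
    """Extract (major, minor) from a TensorFlow 'Created TensorFlow device' line."""
    match = _CC_PATTERN.search(log_line)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def check_device_line(log_line: str, cuda_version: Tuple[int, int]) -> str:
    """Compare the GPU's compute capability with the CUDA version of the build."""
    cc = parse_compute_capability(log_line)
    if cc is None:
        return "no compute capability found"
    required = MIN_CUDA_FOR_SM.get(cc)
    if required is None:
        return f"sm_{cc[0]}{cc[1]} not in lookup table"
    if cuda_version < required:  # tuples compare (major, minor) in order
        return (f"sm_{cc[0]}{cc[1]} needs CUDA >= {required[0]}.{required[1]}, "
                f"but the build uses {cuda_version[0]}.{cuda_version[1]}")
    return f"sm_{cc[0]}{cc[1]} is supported by CUDA {cuda_version[0]}.{cuda_version[1]}"


if __name__ == "__main__":
    line = ("Created TensorFlow device (...) -> physical GPU (device: 0, "
            "name: NVIDIA GeForce RTX 3080, pci bus id: 0000:01:00.0, "
            "compute capability: 8.6)")
    # (10, 0) is only an assumed example value; substitute the CUDA version
    # your TensorFlow binary was actually built against.
    print(check_device_line(line, (10, 0)))
```

With `(10, 0)` the demo prints `sm_86 needs CUDA >= 11.1, but the build uses 10.0`. For official wheels, TensorFlow's tested build configurations table lists the CUDA version each release was built against.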
Errors
To make it easier to read, I have bolded the key information. The details are below.
python ./src/m_utils/demo.py -d Campus
/home/dreamdeck/anaconda3/envs/mvpose/lib/python3.6/site-packages/sklearn/utils/deprecation.py:144: FutureWarning: The sklearn.metrics.base module is deprecated in version 0.22 and will be removed in version 0.24. The corresponding classes / functions should instead be imported from sklearn.metrics. Anything that cannot be imported from sklearn.metrics is now part of the private API.
warnings.warn(message, FutureWarning)
2022-04-01 16:59:23.375152: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2022-04-01 16:59:23.431099: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:897] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-04-01 16:59:23.431203: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1392] Found device 0 with properties:
name: NVIDIA GeForce RTX 3080 major: 8 minor: 6 memoryClockRate(GHz): 1.71
pciBusID: 0000:01:00.0
totalMemory: 9.78GiB freeMemory: 8.79GiB
2022-04-01 16:59:23.431216: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1471] Adding visible gpu devices: 0
2022-04-01 17:08:18.636708: I tensorflow/core/common_runtime/gpu/gpu_device.cc:952] Device interconnect StreamExecutor with strength 1 edge matrix:
2022-04-01 17:08:18.636729: I tensorflow/core/common_runtime/gpu/gpu_device.cc:958] 0
2022-04-01 17:08:18.636734: I tensorflow/core/common_runtime/gpu/gpu_device.cc:971] 0: N
2022-04-01 17:08:18.636808: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1084] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 8331 MB memory) -> physical GPU (device: 0, name: NVIDIA GeForce RTX 3080, pci bus id: 0000:01:00.0, compute capability: 8.6)
2022-04-01 17:08:24.022598: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1471] Adding visible gpu devices: 0
2022-04-01 17:08:24.022628: I tensorflow/core/common_runtime/gpu/gpu_device.cc:952] Device interconnect StreamExecutor with strength 1 edge matrix:
2022-04-01 17:08:24.022635: I tensorflow/core/common_runtime/gpu/gpu_device.cc:958] 0
2022-04-01 17:08:24.022639: I tensorflow/core/common_runtime/gpu/gpu_device.cc:971] 0: N
2022-04-01 17:08:24.022683: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1084] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 8331 MB memory) -> physical GPU (device: 0, name: NVIDIA GeForce RTX 3080, pci bus id: 0000:01:00.0, compute capability: 8.6)
04-01 17:08:24 Generating testing graph on 1 GPUs ...
04-01 17:08:26 Initialized model weights from /home/dreamdeck/Documents/MJJ/code/PoseEstimation/mvpose/backend/tf_cpn/log/model_dump/snapshot_350.ckpt ...
04-01 17:08:29 Current epoch is 350.
/home/dreamdeck/Documents/MJJ/code/PoseEstimation/mvpose/backend/CamStyle/reid/models/resnet.py:49: UserWarning: nn.init.kaiming_normal is now deprecated in favor of nn.init.kaiming_normal_.
init.kaiming_normal(self.feat.weight, mode='fan_out')
/home/dreamdeck/Documents/MJJ/code/PoseEstimation/mvpose/backend/CamStyle/reid/models/resnet.py:50: UserWarning: nn.init.constant is now deprecated in favor of nn.init.constant_.
init.constant(self.feat.bias, 0)
/home/dreamdeck/Documents/MJJ/code/PoseEstimation/mvpose/backend/CamStyle/reid/models/resnet.py:51: UserWarning: nn.init.constant is now deprecated in favor of nn.init.constant_.
init.constant(self.feat_bn.weight, 1)
/home/dreamdeck/Documents/MJJ/code/PoseEstimation/mvpose/backend/CamStyle/reid/models/resnet.py:52: UserWarning: nn.init.constant is now deprecated in favor of nn.init.constant_.
init.constant(self.feat_bn.bias, 0)
/home/dreamdeck/Documents/MJJ/code/PoseEstimation/mvpose/backend/CamStyle/reid/models/resnet.py:60: UserWarning: nn.init.normal is now deprecated in favor of nn.init.normal_.
init.normal(self.classifier.weight, std=0.001)
/home/dreamdeck/Documents/MJJ/code/PoseEstimation/mvpose/backend/CamStyle/reid/models/resnet.py:61: UserWarning: nn.init.constant is now deprecated in favor of nn.init.constant_.
init.constant(self.classifier.bias, 0)
=> Loaded checkpoint '/home/dreamdeck/Documents/MJJ/code/PoseEstimation/mvpose/backend/CamStyle/logs/market-ide-camstyle-re/checkpoint.pth.tar'
=> Start epoch 50
0%| | 0/79 [00:00<?, ?it/s]2022-04-01 17:09:31.471629: E tensorflow/stream_executor/cuda/cuda_dnn.cc:3900] failed to enqueue forward pooling on stream: CUDNN_STATUS_EXECUTION_FAILED
Traceback (most recent call last):
File "/home/dreamdeck/anaconda3/envs/mvpose/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1322, in _do_call
return fn(*args)
File "/home/dreamdeck/anaconda3/envs/mvpose/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1307, in _run_fn
options, feed_dict, fetch_list, target_list, run_metadata)
File "/home/dreamdeck/anaconda3/envs/mvpose/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1409, in _call_tf_sessionrun
run_metadata)
tensorflow.python.framework.errors_impl.InternalError: cudnn PoolForward launch failed
[[Node: light_resnet_v1_101/pool1/MaxPool = MaxPool[T=DT_FLOAT, data_format="NCHW", ksize=[1, 1, 3, 3], padding="SAME", strides=[1, 1, 2, 2], _device="/job:localhost/replica:0/task:0/device:GPU:0"]]]
[[Node: add/_1107 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge_2576_add", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:CPU:0"]()]]
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "./src/m_utils/demo.py", line 92, in <module>
pose_in_range = export ( test_model, test_loader, is_info_dicts=bool ( args.dumped_dir ), show=True )
File "./src/m_utils/demo.py", line 44, in export
show=show, plt_id=img_id )
File "/home/dreamdeck/Documents/MJJ/code/PoseEstimation/mvpose/src/models/estimate3d.py", line 40, in predict
info_dict = self._infer_single2d ( imgs )
File "/home/dreamdeck/Documents/MJJ/code/PoseEstimation/mvpose/src/models/estimate3d.py", line 48, in _infer_single2d
results = self.est2d.estimate_2d ( img, img_id )
File "/home/dreamdeck/Documents/MJJ/code/PoseEstimation/mvpose/backend/estimator_2d.py", line 23, in estimate_2d
bbox_result = self.bbox_detector.detect ( img, img_id )
File "/home/dreamdeck/Documents/MJJ/code/PoseEstimation/mvpose/backend/light_head_rcnn/person_detector.py", line 60, in detect
result_dict = self.inference ( self.func, self.inputs, data_dict )
File "/home/dreamdeck/Documents/MJJ/code/PoseEstimation/mvpose/backend/light_head_rcnn/person_detector.py", line 152, in inference
_, scores, pred_boxes, rois = val_func ( feed_dict=feed_dict )
File "/home/dreamdeck/anaconda3/envs/mvpose/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 900, in run
run_metadata_ptr)
File "/home/dreamdeck/anaconda3/envs/mvpose/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1135, in _run
feed_dict_tensor, options, run_metadata)
File "/home/dreamdeck/anaconda3/envs/mvpose/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1316, in _do_run
run_metadata)
File "/home/dreamdeck/anaconda3/envs/mvpose/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1335, in _do_call
raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InternalError: cudnn PoolForward launch failed
[[Node: light_resnet_v1_101/pool1/MaxPool = MaxPool[T=DT_FLOAT, data_format="NCHW", ksize=[1, 1, 3, 3], padding="SAME", strides=[1, 1, 2, 2], _device="/job:localhost/replica:0/task:0/device:GPU:0"]]]
[[Node: add/_1107 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge_2576_add", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:CPU:0"]()]]
Caused by op 'light_resnet_v1_101/pool1/MaxPool', defined at:
File "./src/m_utils/demo.py", line 59, in <module>
test_model = MultiEstimator ( cfg=model_cfg )
File "/home/dreamdeck/Documents/MJJ/code/PoseEstimation/mvpose/src/models/estimate3d.py", line 34, in __init__
self.est2d = Estimator_2d ( DEBUGGING=debug )
File "/home/dreamdeck/Documents/MJJ/code/PoseEstimation/mvpose/backend/estimator_2d.py", line 19, in __init__
self.bbox_detector = PersonDetector ( show_image=DEBUGGING )
File "/home/dreamdeck/Documents/MJJ/code/PoseEstimation/mvpose/backend/light_head_rcnn/person_detector.py", line 50, in __init__
self.func, self.inputs = self._load_model ( self.model_file )
File "/home/dreamdeck/Documents/MJJ/code/PoseEstimation/mvpose/backend/light_head_rcnn/person_detector.py", line 122, in _load_model
net.inference ( 'TEST', inputs )
File "/home/dreamdeck/Documents/MJJ/code/PoseEstimation/mvpose/backend/light_head_rcnn/network_desp.py", line 109, in inference
net, [3, 3], stride=2, padding='SAME', scope='pool1')
File "/home/dreamdeck/anaconda3/envs/mvpose/lib/python3.6/site-packages/tensorflow/contrib/framework/python/ops/arg_scope.py", line 183, in func_with_args
return func(*args, **current_args)
File "/home/dreamdeck/anaconda3/envs/mvpose/lib/python3.6/site-packages/tensorflow/contrib/layers/python/layers/layers.py", line 2404, in max_pool2d
outputs = layer.apply(inputs)
File "/home/dreamdeck/anaconda3/envs/mvpose/lib/python3.6/site-packages/tensorflow/python/keras/engine/base_layer.py", line 774, in apply
return self.__call__(inputs, *args, **kwargs)
File "/home/dreamdeck/anaconda3/envs/mvpose/lib/python3.6/site-packages/tensorflow/python/layers/base.py", line 329, in __call__
outputs = super(Layer, self).__call__(inputs, *args, **kwargs)
File "/home/dreamdeck/anaconda3/envs/mvpose/lib/python3.6/site-packages/tensorflow/python/keras/engine/base_layer.py", line 703, in __call__
outputs = self.call(inputs, *args, **kwargs)
File "/home/dreamdeck/anaconda3/envs/mvpose/lib/python3.6/site-packages/tensorflow/python/keras/layers/pooling.py", line 223, in call
data_format=conv_utils.convert_data_format(self.data_format, 4))
File "/home/dreamdeck/anaconda3/envs/mvpose/lib/python3.6/site-packages/tensorflow/python/ops/nn_ops.py", line 2153, in max_pool
name=name)
File "/home/dreamdeck/anaconda3/envs/mvpose/lib/python3.6/site-packages/tensorflow/python/ops/gen_nn_ops.py", line 4640, in max_pool
data_format=data_format, name=name)
File "/home/dreamdeck/anaconda3/envs/mvpose/lib/python3.6/site-packages/tensorflow/python/framework/op_def_library.py", line 787, in _apply_op_helper
op_def=op_def)
File "/home/dreamdeck/anaconda3/envs/mvpose/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 3414, in create_op
op_def=op_def)
File "/home/dreamdeck/anaconda3/envs/mvpose/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 1740, in __init__
self._traceback = self._graph._extract_stack() # pylint: disable=protected-access
InternalError (see above for traceback): **cudnn PoolForward launch failed**
[[Node: light_resnet_v1_101/pool1/MaxPool = MaxPool[T=DT_FLOAT, data_format="NCHW", ksize=[1, 1, 3, 3], padding="SAME", strides=[1, 1, 2, 2], _device="/job:localhost/replica:0/task:0/device:GPU:0"]]]
[[Node: add/_1107 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device_incarnation=1, tensor_name="edge_2576_add", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:CPU:0"]()]]
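The node that fails, `light_resnet_v1_101/pool1/MaxPool`, is a plain 3x3 max pool with stride 2 and `SAME` padding on an `NCHW` tensor, so the graph itself looks unremarkable. One reading (an assumption, not verified here) is that the problem lies in the GPU/CUDA/cuDNN stack rather than in the Campus data. For comparing against a CPU run on a tiny tensor, here is a pure-Python sketch of what that node computes. `same_padding` and `max_pool_nchw_same` are hypothetical helpers written for illustration; the padding rule follows TensorFlow's `SAME` convention (output size `ceil(in / stride)`, any odd extra padding placed at the end).

```python
import math
from typing import List, Tuple

Tensor4D = List[List[List[List[float]]]]  # nested lists indexed [n][c][h][w]


def same_padding(in_size: int, kernel: int, stride: int) -> Tuple[int, int]:
    """Return (output size, leading pad) for TensorFlow-style 'SAME' padding."""
    out_size = math.ceil(in_size / stride)
    pad_total = max((out_size - 1) * stride + kernel - in_size, 0)
    return out_size, pad_total // 2  # the remainder goes after the data


def max_pool_nchw_same(x: Tensor4D, kernel: int = 3, stride: int = 2) -> Tensor4D:
    """Reference max pool over H and W of an NCHW tensor with 'SAME' padding.

    Padded positions are skipped, which is equivalent to max pooling
    treating padding as -infinity.
    """
    height, width = len(x[0][0]), len(x[0][0][0])
    out_h, pad_top = same_padding(height, kernel, stride)
    out_w, pad_left = same_padding(width, kernel, stride)
    result = []
    for sample in x:
        channels = []
        for plane in sample:
            rows = []
            for oh in range(out_h):
                row = []
                for ow in range(out_w):
                    # Top-left corner of the window in unpadded coordinates.
                    h0 = oh * stride - pad_top
                    w0 = ow * stride - pad_left
                    window = [plane[h][w]
                              for h in range(max(h0, 0), min(h0 + kernel, height))
                              for w in range(max(w0, 0), min(w0 + kernel, width))]
                    row.append(max(window))
                rows.append(row)
            channels.append(rows)
        result.append(channels)
    return result
```

On a 1x1x4x4 input holding 0 to 15 row by row, the sketch returns `[[[[10.0, 11.0], [14.0, 15.0]]]]`. It is far too slow for real images; it is only meant for checking shapes and values on small tensors.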