RuntimeError: cuDNN error: CUDNN_STATUS_EXECUTION_FAILED

Captain-bing commented 5 years ago

When I train on my own dataset, I get this error: RuntimeError: cuDNN error: CUDNN_STATUS_EXECUTION_FAILED. I'm not sure whether it is due to my cuDNN version or to a problem with my dataset (it was converted from YOLO format).

Captain-bing commented 5 years ago

This is my config:

2019-08-13 15:44:54,205 fcos_core INFO: Using 4 GPUs
2019-08-13 15:44:54,205 fcos_core INFO: Namespace(config_file='configs/fcos/fcos_R_50_FPN_1x.yaml', distributed=True, local_rank=0, opts=['DATALOADER.NUM_WORKERS', '0', 'OUTPUT_DIR', 'training_dir/fcos_R_50_FPN_1x'], skip_test=True)
2019-08-13 15:44:54,205 fcos_core INFO: Collecting env info (might take some time)
2019-08-13 15:44:55,406 fcos_core INFO:
PyTorch version: 1.1.0
Is debug build: No
CUDA used to build PyTorch: 9.0.176

OS: Ubuntu 16.04.6 LTS
GCC version: (Ubuntu 5.4.0-6ubuntu1~16.04.11) 5.4.0 20160609
CMake version: version 3.5.1

Python version: 3.6
Is CUDA available: Yes
CUDA runtime version: 9.0.176
GPU models and configuration:
GPU 0: GeForce RTX 2080 Ti
GPU 1: GeForce RTX 2080 Ti
GPU 2: GeForce RTX 2080 Ti
GPU 3: GeForce RTX 2080 Ti

Nvidia driver version: 418.67
cuDNN version: /usr/lib/x86_64-linux-gnu/libcudnn.so.7.6.2

Versions of relevant libraries:
[pip] Could not collect
[conda] blas 1.0 mkl https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/free
[conda] mkl 2019.4 243 defaults
[conda] mkl-service 2.0.2 py36h7b6447c_0 defaults
[conda] mkl_fft 1.0.12 py36ha843d7b_0 defaults
[conda] mkl_random 1.0.2 py36hd81dba3_0 defaults
[conda] pytorch 1.1.0 cuda90py36h8b0c50b_0 defaults
[conda] torchvision 0.2.1 py36_0 defaults
Pillow (4.2.1)
2019-08-13 15:44:55,406 fcos_core INFO: Loaded configuration file configs/fcos/fcos_R_50_FPN_1x.yaml
2019-08-13 15:44:55,406 fcos_core INFO:
MODEL:
  META_ARCHITECTURE: "GeneralizedRCNN"
  WEIGHT: "catalog://ImageNetPretrained/MSRA/R-50"
  RPN_ONLY: True
  FCOS_ON: True
  BACKBONE:
    CONV_BODY: "R-50-FPN-RETINANET"
  RESNETS:
    BACKBONE_OUT_CHANNELS: 256
  RETINANET:
    USE_C5: False # FCOS uses P5 instead of C5

tianzhi0549 commented 5 years ago

@Captain-bing Please try training on the original COCO dataset to make sure the error is not caused by your data.
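
In addition to training on the original COCO dataset, a converted annotation file can be checked directly. The following is a minimal sketch, not part of FCOS, that scans a COCO-style annotation dictionary for problems a YOLO-to-COCO conversion commonly introduces. COCO boxes are `[x, y, width, height]` in absolute pixels, whereas YOLO boxes are normalized center coordinates, so boxes whose values all lie in [0, 1] are flagged as suspicious. The function name `find_bad_annotations` and the command-line usage are assumptions made for illustration.

```python
import json
import sys


def find_bad_annotations(coco):
    """Return (annotation id, reason) pairs for suspicious COCO-style annotations."""
    images = {img["id"]: img for img in coco.get("images", [])}
    category_ids = {cat["id"] for cat in coco.get("categories", [])}
    problems = []
    for ann in coco.get("annotations", []):
        ann_id = ann.get("id")
        img = images.get(ann.get("image_id"))
        if img is None:
            problems.append((ann_id, "unknown image_id"))
            continue
        if ann.get("category_id") not in category_ids:
            problems.append((ann_id, "unknown category_id"))
        bbox = ann.get("bbox", [])
        if len(bbox) != 4:
            problems.append((ann_id, "bbox is not [x, y, width, height]"))
            continue
        x, y, w, h = bbox
        if w <= 0 or h <= 0:
            problems.append((ann_id, "non-positive width or height"))
        elif x < 0 or y < 0 or x + w > img["width"] or y + h > img["height"]:
            problems.append((ann_id, "bbox extends outside the image"))
        elif max(x, y, w, h) <= 1.0:
            # Heuristic: all values in [0, 1] suggests leftover YOLO normalization.
            problems.append((ann_id, "bbox values look normalized to [0, 1]"))
    return problems


if __name__ == "__main__":
    # Usage (hypothetical path): python check_coco.py path/to/annotations.json
    with open(sys.argv[1]) as f:
        for ann_id, reason in find_bad_annotations(json.load(f)):
            print(f"annotation {ann_id}: {reason}")
```

An empty result does not prove the dataset is correct; it only rules out these specific problems.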

Captain-bing commented 5 years ago

> @Captain-bing Please try to use the original coco dataset to make sure it is not due to your data.

Okay, thank you! I solved this problem by updating CUDA from 9.0 to 10.0 and reinstalling the matching cuDNN. In fact, my own dataset was already in correct COCO format.
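
One plausible explanation for why the upgrade helped, not confirmed in this thread, is that the RTX 2080 Ti is a Turing card (compute capability 7.5), and Turing support was first added in CUDA 10.0, while the PyTorch build above targets CUDA 9.0. The sketch below compares the CUDA version PyTorch was built with against the minimum CUDA release for each visible GPU. The helper names (`supports_gpu`, `report`) and the small lookup table are assumptions for illustration; the table covers only a few architectures.

```python
# Minimum CUDA toolkit release with native support for a compute capability.
# Deliberately incomplete; extend as needed.
MIN_CUDA = {
    (6, 0): (8, 0),   # Pascal
    (6, 1): (8, 0),   # Pascal
    (7, 0): (9, 0),   # Volta
    (7, 5): (10, 0),  # Turing, e.g. RTX 2080 Ti
    (8, 0): (11, 0),  # Ampere
    (8, 6): (11, 1),  # Ampere
}


def supports_gpu(capability, cuda_version):
    """Return True/False if known, or None when the capability is not in the table."""
    required = MIN_CUDA.get(tuple(capability))
    if required is None:
        return None
    built = tuple(int(part) for part in cuda_version.split(".")[:2])
    return built >= required


def report():
    """Print the PyTorch/CUDA/cuDNN versions and a per-GPU compatibility hint."""
    try:
        import torch
    except ImportError:
        print("PyTorch is not installed")
        return
    print("PyTorch:", torch.__version__)
    print("CUDA used to build PyTorch:", torch.version.cuda)
    print("cuDNN:", torch.backends.cudnn.version())
    if torch.version.cuda is None or not torch.cuda.is_available():
        return
    for i in range(torch.cuda.device_count()):
        cap = torch.cuda.get_device_capability(i)
        ok = supports_gpu(cap, torch.version.cuda)
        print(f"GPU {i}: {torch.cuda.get_device_name(i)}, capability {cap}, supported: {ok}")


if __name__ == "__main__":
    report()
```

With the environment reported above (CUDA 9.0.176, capability 7.5), `supports_gpu` returns `False`, which is consistent with the upgrade to CUDA 10.0 resolving the error.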

tianzhi0549 commented 5 years ago

@Captain-bing Happy to know this:-).

tianzhi0549 / FCOS

RuntimeError: cuDNN error: CUDNN_STATUS_EXECUTION_FAILED #117