mrlooi / rotated_maskrcnn

Rotated Mask R-CNN: From Bounding Boxes to Rotated Bounding Boxes
MIT License

Please help me.....THCudaCheck FAIL file=/pytorch/aten/src/THC/THCGeneral.cpp line=405 error=8 : invalid device function #44

Closed smilerichpse closed 3 years ago

smilerichpse commented 3 years ago

❓ Questions and Help

I get an error when I start training. Could anyone help me solve this problem?

My laptop is a Lenovo Legion 7 with an RTX 3070. I run rotated_maskrcnn in a Docker image (Ubuntu 16.04, CUDA 9.0, Python 3.6.5):

nvidia-docker run -it --shm-size 2G -v /home/anthony/dev/r_maskrcnn:/home smilepse/cuda9.0-ubuntu16.04-py3-torch1.0 bash

(Image: docker pull smilepse/cuda9.0-ubuntu16.04-py3-torch1.0:latest)

My pip list and development environment are at the bottom.

Training command:

python tools/train_net.py --config-file "configs/rotated/e2e_ms_rcnn_R_50_FPN_1x.yaml"

Error message:

2021-09-03 03:19:09,408 maskrcnn_benchmark.trainer INFO: Start training
THCudaCheck FAIL file=/pytorch/aten/src/THC/THCGeneral.cpp line=405 error=8 : invalid device function

Traceback (most recent call last):
  File "tools/train_net.py", line 196, in <module>
    main()
  File "tools/train_net.py", line 189, in main
    model = train(cfg, args.local_rank, args.distributed)
  File "tools/train_net.py", line 89, in train
    arguments,
  File "/home/rotated_maskrcnn/maskrcnn_benchmark/engine/trainer.py", line 71, in do_train
    loss_dict = model(images, targets)
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/apex-0.1-py3.6-linux-x86_64.egg/apex/amp/_initialize.py", line 197, in new_fwd
    **applier(kwargs, input_caster))
  File "/home/rotated_maskrcnn/maskrcnn_benchmark/modeling/detector/generalized_rcnn.py", line 65, in forward
    features = self.backbone(images.tensors)
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/container.py", line 92, in forward
    input = module(input)
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/rotated_maskrcnn/maskrcnn_benchmark/modeling/backbone/resnet.py", line 149, in forward
    x = getattr(self, stage_name)(x)
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/container.py", line 92, in forward
    input = module(input)
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/rotated_maskrcnn/maskrcnn_benchmark/modeling/backbone/resnet.py", line 331, in forward
    out = self.conv2(out)
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/rotated_maskrcnn/maskrcnn_benchmark/layers/misc.py", line 33, in forward
    return super(Conv2d, self).forward(x)
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/conv.py", line 320, in forward
    self.padding, self.dilation, self.groups)
RuntimeError: cuDNN error: CUDNN_STATUS_EXECUTION_FAILED

pip list

Package              Version                  Location
absl-py              0.3.0
apex                 0.1
astor                0.7.1
atari-py             0.1.1
baselines            0.1.5
Box2D                2.3.2
Box2D-kengz          2.3.3
box2d-py             2.3.4
certifi              2018.4.16
cffi                 1.11.5
chardet              3.0.4
click                6.7
cloudpickle          0.5.3
cycler               0.10.0
Cython               0.28.4
dill                 0.2.8.2
future               0.16.0
gast                 0.2.0
glfw                 1.7.0
grpcio               1.14.1
gym                  0.10.5
idna                 2.7
imageio              2.3.0
joblib               0.12.2
kiwisolver           1.0.1
Markdown             2.6.11
maskrcnn-benchmark   0.1                      /home/rotated_maskrcnn
matplotlib           2.2.2
mpi4py               3.0.0
mujoco-py            1.50.1.56
numpy                1.16.0
opencv-python        3.4.2.17
pandas               0.23.3
Pillow               5.2.0
pip                  18.0
progressbar2         3.38.0
protobuf             3.6.0
pycocotools          2.0
pycparser            2.18
pycurl               7.43.0
pyglet               1.3.2
pygobject            3.20.0
PyOpenGL             3.1.0
pyparsing            2.2.0
python-apt           1.1.0b1+ubuntu0.16.4.1
python-dateutil      2.7.3
python-utils         2.3.0
pytz                 2018.5
PyYAML               5.4.1
pyzmq                17.1.0
requests             2.19.1
scikit-learn         0.19.2
scipy                1.1.0
setproctitle         1.1.10
setuptools           39.1.0
six                  1.11.0
tensorboard          1.10.0
tensorflow           1.10.0
termcolor            1.1.0
tk                   0.1.0
torch                1.0.0
torchvision          0.2.1
tqdm                 4.24.0
urllib3              1.23
Werkzeug             0.14.1
wheel                0.31.1
yacs                 0.1.8
zmq                  0.0.0

Host nvidia-smi

NVIDIA-SMI 460.91.03    Driver Version: 460.91.03    CUDA Version: 11.2

GPU 0: GeForce RTX 307...
  Persistence-M: Off    Bus-Id: 00000000:01:00.0    Disp.A: On    Volatile Uncorr. ECC: N/A
  Fan: N/A    Temp: 45C    Perf: P8    Pwr:Usage/Cap: 18W / N/A
  Memory-Usage: 846MiB / 7982MiB    GPU-Util: 1%    Compute M.: Default    MIG M.: N/A

Processes:
GPU   GI ID   CI ID   PID    Type   Process name            GPU Memory Usage
0     N/A     N/A     1241   G      /usr/lib/xorg/Xorg      26MiB
0     N/A     N/A     2693   G      /usr/bin/gnome-shell    88MiB
0     N/A     N/A     3281   G      /usr/lib/xorg/Xorg      356MiB
0     N/A     N/A     3422   G      /usr/bin/gnome-shell    219MiB

nvcc

nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2017 NVIDIA Corporation
Built on Fri_Sep__1_21:08:03_CDT_2017
Cuda compilation tools, release 9.0, V9.0.176

smilerichpse commented 3 years ago

I solved this problem.
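
The thread does not record what the fix was. One plausible cause, given the environment listed in the question, is an architecture mismatch: the RTX 3070 is an Ampere GPU with compute capability 8.6, and CUDA 9.0 predates that architecture, so binaries built against it contain no kernels the card can run. "invalid device function" is the error CUDA typically reports in that situation. The sketch below is only an illustration of that check, not the author's solution; the table `MIN_CUDA_FOR_ARCH` and the helper `toolkit_supports` are hypothetical names introduced here, and the table covers only a few common architectures.

```python
# Hypothetical helper: checks whether a CUDA toolkit release is new enough
# to generate code for a given GPU compute capability.
# The table is an illustrative subset, not a complete list.
MIN_CUDA_FOR_ARCH = {
    (6, 1): (8, 0),   # Pascal (e.g. GTX 10xx)
    (7, 0): (9, 0),   # Volta
    (7, 5): (10, 0),  # Turing (e.g. RTX 20xx)
    (8, 0): (11, 0),  # Ampere (A100)
    (8, 6): (11, 1),  # Ampere (e.g. RTX 30xx)
}


def toolkit_supports(arch, cuda_version):
    """Return True if `cuda_version` (major, minor) can target `arch` (major, minor)."""
    if arch not in MIN_CUDA_FOR_ARCH:
        raise KeyError("compute capability %r is not in the illustrative table" % (arch,))
    # Tuples compare element by element, so (9, 0) < (11, 1).
    return cuda_version >= MIN_CUDA_FOR_ARCH[arch]


# Environment from the question: RTX 3070 (8.6) with CUDA 9.0 in the container.
print(toolkit_supports((8, 6), (9, 0)))   # False
# The same GPU with a CUDA 11.1 toolkit.
print(toolkit_supports((8, 6), (11, 1)))  # True
```

On a machine with PyTorch installed, `torch.cuda.get_device_capability(0)` and `torch.version.cuda` give the two inputs. If the check fails, the usual remedy is an image whose CUDA toolkit and PyTorch build support the GPU, followed by rebuilding the project's CUDA extensions against it.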