open-mmlab / mmrotate

OpenMMLab Rotated Object Detection Toolbox and Benchmark
https://mmrotate.readthedocs.io/en/latest/
Apache License 2.0

[Bug] RuntimeError: CUDA error: out of memory Exception raised from ROIAlignRotatedForwardCUDAKernelLauncher #957

Open 2597883929 opened 10 months ago

2597883929 commented 10 months ago

Prerequisite

Task

I'm using the official example scripts/configs for the officially supported tasks/models/datasets.

Branch

1.x branch https://github.com/open-mmlab/mmrotate/tree/1.x

Environment

sys.platform: linux
Python: 3.8.18 (default, Sep 11 2023, 13:40:15) [GCC 11.2.0]
CUDA available: True
numpy_random_seed: 2147483648
GPU 0,1,2,3,4,5,6,7: GeForce RTX 3090
CUDA_HOME: /usr/local/cuda
NVCC: Cuda compilation tools, release 11.0, V11.0.194
GCC: gcc (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0
PyTorch: 1.12.1
PyTorch compiling details: PyTorch built with:

TorchVision: 0.13.1
OpenCV: 4.8.1
MMEngine: 0.9.0
MMRotate: 1.0.0rc1+fd60bef

Reproduces the problem - code sample

I just ran the demo from the tutorial.

# Copyright (c) OpenMMLab. All rights reserved.
from argparse import ArgumentParser

import mmcv
from mmdet.apis import inference_detector, init_detector
import torch
from mmrotate.registry import VISUALIZERS
from mmrotate.utils import register_all_modules
import os


def parse_args():
    parser = ArgumentParser()
    parser.add_argument('img', help='Image file')
    parser.add_argument('config', help='Config file')
    parser.add_argument('checkpoint', help='Checkpoint file')
    parser.add_argument('--out-file', default=None, help='Path to output file')
    parser.add_argument(
        '--device', default='cuda:6', help='Device used for inference')
    parser.add_argument(
        '--palette',
        default='dota',
        choices=['dota', 'sar', 'hrsc', 'random'],
        help='Color palette used for visualization')
    parser.add_argument(
        '--score-thr', type=float, default=0.3, help='bbox score threshold')
    args = parser.parse_args()
    return args


def main(args):
    # register all modules in mmrotate into the registries
    register_all_modules()

    # build the model from a config file and a checkpoint file
    model = init_detector(
        args.config, args.checkpoint, palette=args.palette, device=args.device)

    # init visualizer
    visualizer = VISUALIZERS.build(model.cfg.visualizer)
    # the dataset_meta is loaded from the checkpoint and
    # then pass to the model in init_detector
    visualizer.dataset_meta = model.dataset_meta

    # test a single image
    result = inference_detector(model, args.img)

    # show the results
    img = mmcv.imread(args.img)
    img = mmcv.imconvert(img, 'bgr', 'rgb')
    visualizer.add_datasample(
        'result',
        img,
        data_sample=result,
        draw_gt=False,
        show=args.out_file is None,
        wait_time=0,
        out_file=args.out_file,
        pred_score_thr=args.score_thr)


if __name__ == '__main__':
    os.environ['CUDA_LAUNCH_BLOCKING'] = '1'
    torch.cuda._initialized = True
    args = parse_args()
    main(args)

Reproduces the problem - command or script

python demo/image_demo.py demo/demo.jpg oriented-rcnn-le90_r50_fpn_1x_dota.py oriented_rcnn_r50_fpn_1x_dota_le90-6d2b2ce0.pth --out-file result.jpg

Reproduces the problem - error message

Loads checkpoint by local backend from path: oriented_rcnn_r50_fpn_1x_dota_le90-6d2b2ce0.pth
/data/xzf/env/mmrotate_1.x/lib/python3.8/site-packages/mmengine/visualization/visualizer.py:196: UserWarning: Failed to add <class 'mmengine.visualization.vis_backend.LocalVisBackend'>, please provide the save_dir argument.
  warnings.warn(f'Failed to add {vis_backend.__class__}, '
Traceback (most recent call last):
  File "demo/image_demo.py", line 66, in <module>
    main(args)
  File "demo/image_demo.py", line 46, in main
    result = inference_detector(model, args.img)
  File "/data/xzf/env/mmrotate_1.x/lib/python3.8/site-packages/mmdet/apis/inference.py", line 189, in inference_detector
    results = model.test_step(data)[0]
  File "/data/xzf/env/mmrotate_1.x/lib/python3.8/site-packages/mmengine/model/base_model/base_model.py", line 145, in test_step
    return self._run_forward(data, mode='predict')  # type: ignore
  File "/data/xzf/env/mmrotate_1.x/lib/python3.8/site-packages/mmengine/model/base_model/base_model.py", line 346, in _run_forward
    results = self(**data, mode=mode)
  File "/data/xzf/env/mmrotate_1.x/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/data/xzf/env/mmrotate_1.x/lib/python3.8/site-packages/mmdet/models/detectors/base.py", line 94, in forward
    return self.predict(inputs, data_samples)
  File "/data/xzf/env/mmrotate_1.x/lib/python3.8/site-packages/mmdet/models/detectors/two_stage.py", line 238, in predict
    results_list = self.roi_head.predict(
  File "/data/xzf/env/mmrotate_1.x/lib/python3.8/site-packages/mmdet/models/roi_heads/base_roi_head.py", line 118, in predict
    results_list = self.predict_bbox(
  File "/data/xzf/env/mmrotate_1.x/lib/python3.8/site-packages/mmdet/models/roi_heads/standard_roi_head.py", line 335, in predict_bbox
    bbox_results = self._bbox_forward(x, rois)
  File "/data/xzf/env/mmrotate_1.x/lib/python3.8/site-packages/mmdet/models/roi_heads/standard_roi_head.py", line 163, in _bbox_forward
    bbox_feats = self.bbox_roi_extractor(
  File "/data/xzf/env/mmrotate_1.x/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/data/xzf/model/mmrotate/mmrotate/models/roi_heads/roi_extractors/rotate_single_level_roi_extractor.py", line 128, in forward
    roi_feats_t = self.roi_layers[i](feats[i], rois)
  File "/data/xzf/env/mmrotate_1.x/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/data/xzf/env/mmrotate_1.x/lib/python3.8/site-packages/mmcv/ops/roi_align_rotated.py", line 175, in forward
    return RoIAlignRotatedFunction.apply(input, rois, self.output_size,
  File "/data/xzf/env/mmrotate_1.x/lib/python3.8/site-packages/mmcv/ops/roi_align_rotated.py", line 65, in forward
    ext_module.roi_align_rotated_forward(
RuntimeError: CUDA error: out of memory
Exception raised from ROIAlignRotatedForwardCUDAKernelLauncher at /tmp/mmcv/mmcv/ops/csrc/pytorch/cuda/roi_align_rotated_cuda.cu:24 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::string) + 0x57 (0x7ff7ee2db497 in /data/xzf/env/mmrotate_1.x/lib/python3.8/site-packages/torch/lib/libc10.so)
frame #1: c10::CUDAError::Error(c10::SourceLocation, std::string) + 0x30 (0x7ff7ad253dac in /data/xzf/env/mmrotate_1.x/lib/python3.8/site-packages/mmcv/_ext.cpython-38-x86_64-linux-gnu.so)
frame #2: ROIAlignRotatedForwardCUDAKernelLauncher(at::Tensor, at::Tensor, float, int, bool, bool, int, int, int, int, int, int, at::Tensor) + 0x1a8 (0x7ff7ad34065e in /data/xzf/env/mmrotate_1.x/lib/python3.8/site-packages/mmcv/_ext.cpython-38-x86_64-linux-gnu.so)
frame #3: roi_align_rotated_forward_cuda(at::Tensor, at::Tensor, at::Tensor, int, int, float, int, bool, bool) + 0x228 (0x7ff7ad296798 in /data/xzf/env/mmrotate_1.x/lib/python3.8/site-packages/mmcv/_ext.cpython-38-x86_64-linux-gnu.so)
frame #4: auto Dispatch<DeviceRegistry<void (*)(at::Tensor, at::Tensor, at::Tensor, int, int, float, int, bool, bool), &(roi_align_rotated_forward_impl(at::Tensor, at::Tensor, at::Tensor, int, int, float, int, bool, bool))>, at::Tensor&, at::Tensor&, at::Tensor&, int&, int&, float&, int&, bool&, bool&>(DeviceRegistry<void (*)(at::Tensor, at::Tensor, at::Tensor, int, int, float, int, bool, bool), &(roi_align_rotated_forward_impl(at::Tensor, at::Tensor, at::Tensor, int, int, float, int, bool, bool))> const&, char const*, at::Tensor&, at::Tensor&, at::Tensor&, int&, int&, float&, int&, bool&, bool&) + 0x11e (0x7ff7ad4a933e in /data/xzf/env/mmrotate_1.x/lib/python3.8/site-packages/mmcv/_ext.cpython-38-x86_64-linux-gnu.so)
frame #5: roi_align_rotated_forward_impl(at::Tensor, at::Tensor, at::Tensor, int, int, float, int, bool, bool) + 0x8e (0x7ff7ad4a8dee in /data/xzf/env/mmrotate_1.x/lib/python3.8/site-packages/mmcv/_ext.cpython-38-x86_64-linux-gnu.so)
frame #6: roi_align_rotated_forward(at::Tensor, at::Tensor, at::Tensor, int, int, float, int, bool, bool) + 0x7a (0x7ff7ad4a8eaa in /data/xzf/env/mmrotate_1.x/lib/python3.8/site-packages/mmcv/_ext.cpython-38-x86_64-linux-gnu.so)
frame #7: <unknown function> + 0x378af5 (0x7ff7ad4a1af5 in /data/xzf/env/mmrotate_1.x/lib/python3.8/site-packages/mmcv/_ext.cpython-38-x86_64-linux-gnu.so)
frame #8: <unknown function> + 0x35a7f1 (0x7ff7ad4837f1 in /data/xzf/env/mmrotate_1.x/lib/python3.8/site-packages/mmcv/_ext.cpython-38-x86_64-linux-gnu.so)
frame #15: THPFunction_apply(_object*, _object*) + 0x5d6 (0x7ff82e206a06 in /data/xzf/env/mmrotate_1.x/lib/python3.8/site-packages/torch/lib/libtorch_python.so)
frame #20: python() [0x4f5154]
frame #26: python() [0x5ab487]
frame #31: python() [0x4f5154]
frame #37: python() [0x5ab487]
frame #40: python() [0x4f4ff6]
frame #43: python() [0x4f50db]
frame #46: python() [0x4f50db]
frame #49: python() [0x4f50db]
frame #53: python() [0x4f5154]
frame #60: python() [0x5ab487]
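The failure is reported as an out-of-memory error, and the script defaults to `cuda:6` on an eight-GPU machine. One quick check, which nobody in this thread reports having done, is how much memory is actually free on each GPU before inference starts. Below is a minimal sketch that parses the CSV output of `nvidia-smi`; the helper names `parse_gpu_memory` and `free_mib` are made up for illustration.

```python
import subprocess

# nvidia-smi query that prints one "index, used, total" row per GPU (values in MiB).
QUERY = [
    "nvidia-smi",
    "--query-gpu=index,memory.used,memory.total",
    "--format=csv,noheader,nounits",
]


def parse_gpu_memory(csv_text):
    """Parse nvidia-smi CSV rows into {gpu_index: (used_mib, total_mib)}."""
    usage = {}
    for line in csv_text.strip().splitlines():
        if not line.strip():
            continue
        index, used, total = (int(field) for field in line.split(","))
        usage[index] = (used, total)
    return usage


def free_mib(usage, gpu_index):
    """Return the free memory in MiB for one GPU."""
    used, total = usage[gpu_index]
    return total - used


if __name__ == "__main__":
    try:
        output = subprocess.run(
            QUERY, capture_output=True, text=True, check=True).stdout
    except (FileNotFoundError, subprocess.CalledProcessError):
        print("nvidia-smi is not available in this environment")
    else:
        for index, (used, total) in sorted(parse_gpu_memory(output).items()):
            print(f"GPU {index}: {total - used} MiB free of {total} MiB")
```

If the GPU selected by `--device` has plenty of free memory and the error still appears, the cause is probably something other than a genuine memory shortage.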

Additional information

No response

July-1024 commented 1 month ago

Hi, I'm running into the same problem. Did you solve it?

2597883929 commented 1 month ago

> Hi, I'm running into the same problem. Did you solve it?

I don't know why this works, but after I changed PyTorch to 1.13.1 and CUDA to 11.6, the error went away. Maybe you can try that too.
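One possible explanation, offered only as a guess that this thread does not confirm: the original environment reports NVCC 11.0, while the RTX 3090 has compute capability 8.6, which the CUDA toolkit supports from 11.1 onward. The sketch below compares the CUDA version PyTorch was built with against each GPU's compute capability. The table `MIN_CUDA` and the helpers `parse_version` and `toolkit_supports` are illustrative names, and the table covers only the two capabilities listed.

```python
def parse_version(text):
    """Turn a version string such as '11.0' or '11.0.194' into (11, 0)."""
    return tuple(int(part) for part in text.split(".")[:2])


def toolkit_supports(cuda_version, capability, min_cuda_by_capability):
    """Return True/False if the capability is in the table, else None."""
    required = min_cuda_by_capability.get(capability)
    if required is None:
        return None
    return parse_version(cuda_version) >= required


# Assumption: a deliberately tiny table, (major, minor) capability -> minimum CUDA toolkit.
MIN_CUDA = {(8, 0): (11, 0), (8, 6): (11, 1)}


def report():
    """Print the versions relevant to this issue, if PyTorch is installed."""
    try:
        import torch
    except ImportError:
        print("PyTorch is not installed in this environment")
        return
    print("PyTorch", torch.__version__, "built for CUDA", torch.version.cuda)
    if torch.version.cuda is None or not torch.cuda.is_available():
        print("No usable CUDA device found")
        return
    for index in range(torch.cuda.device_count()):
        capability = torch.cuda.get_device_capability(index)
        verdict = toolkit_supports(torch.version.cuda, capability, MIN_CUDA)
        print(index, torch.cuda.get_device_name(index), capability, verdict)
    try:
        from mmcv.ops import get_compiling_cuda_version
        print("MMCV ops compiled with CUDA", get_compiling_cuda_version())
    except ImportError:
        print("MMCV is not installed or was built without ops")


if __name__ == "__main__":
    report()
```

A `False` verdict for a GPU would be consistent with the upgrade to CUDA 11.6 fixing the problem, but it does not prove that this was the cause.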