open-mmlab / mmcv

OpenMMLab Computer Vision Foundation
https://mmcv.readthedocs.io/en/latest/
Apache License 2.0

Training with GPU -- RuntimeError: roi_align_forward_impl: #2850

Open soumyadbanik opened 1 year ago

soumyadbanik commented 1 year ago

Prerequisite

Environment

I'm training on the AVA dataset for spatio-temporal activity detection. Training does not use any GPU, even though my machine has 2 GPUs. It is supposed to use the GPU by default, but that does not happen with the latest mmcv version. If I enable the GPUs with the CUDA_VISIBLE_DEVICES=0,1 environment variable, I get this error:

File "/home/soumyadeep/mmaction_custom/mmaction2_v1.0/mmcv/mmcv/ops/roi_align.py", line 90, in forward
    ext_module.roi_align_forward(
RuntimeError: roi_align_forward_impl: implementation for device cuda:0 not found.


Reproduces the problem - code sample

/mmcv/mmcv/ops/roi_align.py

Reproduces the problem - command or script

CUDA_VISIBLE_DEVICES=0,1 bash tools/dist_train.sh /home/soumyadeep/mmaction_custom/mmaction2_v1.0/configs/detection/slowfast/slowfast_kinetics400-pretrained-r50_8xb16-4x16x1-20e_ava21-rgb.py 2

Reproduces the problem - error message

File "/home/soumyadeep/miniconda3/envs/openmmlab2/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/soumyadeep/mmaction_custom/mmaction2_v1.0/mmaction/models/roi_heads/roi_extractors/single_straight3d.py", line 122, in forward
    roi_feat = self.roi_layer(frame_feat, rois)
  File "/home/soumyadeep/miniconda3/envs/openmmlab2/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/soumyadeep/mmaction_custom/mmaction2_v1.0/mmcv/mmcv/ops/roi_align.py", line 210, in forward
    return roi_align(input, rois, self.output_size, self.spatial_scale,
  File "/home/soumyadeep/miniconda3/envs/openmmlab2/lib/python3.8/site-packages/torch/autograd/function.py", line 506, in apply
    return super().apply(*args, **kwargs)  # type: ignore[misc]
  File "/home/soumyadeep/mmaction_custom/mmaction2_v1.0/mmcv/mmcv/ops/roi_align.py", line 90, in forward
    ext_module.roi_align_forward(
RuntimeError: roi_align_forward_impl: implementation for device cuda:0 not found.

Additional information

No response

zhouzaida commented 1 year ago

Hi @soumyadbanik, it may be that mmcv was not installed with CUDA op support.
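One way to check this is a small script like the one below. It is only a sketch: the helper name `check_mmcv_cuda_ops` and the report keys are made up for illustration, and it assumes that `mmcv.ops` exposes `get_compiling_cuda_version()` and `get_compiler_version()`, as recent full mmcv builds do. To my understanding, a CPU-only build reports `not available` for the compiled CUDA version, but treat that as an assumption and compare it with what your install actually prints.

```python
import importlib


def check_mmcv_cuda_ops():
    """Collect facts about the local PyTorch/mmcv install without raising.

    Hypothetical helper for illustration; the dictionary keys are arbitrary.
    """
    report = {}

    # PyTorch side: is CUDA visible at all, and which CUDA version was it built for?
    try:
        torch = importlib.import_module("torch")
        report["torch_version"] = torch.__version__
        report["torch_cuda_available"] = torch.cuda.is_available()
        report["torch_cuda_version"] = torch.version.cuda
    except (ImportError, OSError) as exc:
        report["torch_error"] = str(exc)

    # mmcv side: is the package importable?
    try:
        mmcv = importlib.import_module("mmcv")
        report["mmcv_version"] = mmcv.__version__
    except (ImportError, OSError) as exc:
        report["mmcv_error"] = str(exc)
        return report

    # Compiled ops: importing mmcv.ops fails if the extension was never built
    # (for example with a lite/CPU-only package).
    try:
        ops = importlib.import_module("mmcv.ops")
        report["mmcv_compiled_cuda"] = ops.get_compiling_cuda_version()
        report["mmcv_compiler"] = ops.get_compiler_version()
    except (ImportError, OSError, AttributeError) as exc:
        report["mmcv_ops_error"] = str(exc)

    return report


if __name__ == "__main__":
    for key, value in check_mmcv_cuda_ops().items():
        print(f"{key}: {value}")
```

If `torch_cuda_available` is `True` but `mmcv_compiled_cuda` is missing or reports no CUDA, the ops were most likely compiled without CUDA, and reinstalling or rebuilding mmcv against the CUDA version your PyTorch uses is the usual next step.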