open-mmlab / mmcv

OpenMMLab Computer Vision Foundation
https://mmcv.readthedocs.io/en/latest/
Apache License 2.0

[Bug] getting RuntimeError: CUDA error: initialization error #3073

Open Yash-099 opened 6 months ago

Yash-099 commented 6 months ago

Prerequisite

Environment

When running python .dev_scripts/check_installation.py, I get the following error:

Traceback (most recent call last):
  File ".dev_scripts/check_installation.py", line 37, in <module>
    check_installation()
  File ".dev_scripts/check_installation.py", line 29, in check_installation
    box_iou_rotated(boxes1, boxes2)
  File "/home/msarkar/shripad/yash/mmcv/mmcv/ops/box_iou_rotated.py", line 152, in box_iou_rotated
    ext_module.box_iou_rotated(
RuntimeError: CUDA error: initialization error

I installed mmcv by following the steps at https://mmcv.readthedocs.io/en/latest/get_started/build.html#build-mmcv-from-source. The machine reports Driver Version: 418.87.00 and CUDA Version: 10.1.
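For completeness, the version details relevant to this report could be gathered with a short script like the one below. This is only a sketch: the helper name collect_env is hypothetical, and it assumes torch and mmcv are importable in the same environment that produced the error. Any package that cannot be imported is reported as None or as an error string rather than raising.

```python
import importlib


def collect_env():
    """Return a dict of version info; anything unavailable stays None."""
    info = {
        "torch": None,
        "torch_cuda": None,          # CUDA version PyTorch was built against
        "cuda_available": None,
        "mmcv": None,
        "mmcv_compiled_cuda": None,  # CUDA version the mmcv ops were built with
    }
    try:
        torch = importlib.import_module("torch")
    except ImportError:
        return info
    info["torch"] = torch.__version__
    info["torch_cuda"] = torch.version.cuda
    info["cuda_available"] = torch.cuda.is_available()
    try:
        mmcv = importlib.import_module("mmcv")
        info["mmcv"] = mmcv.__version__
        from mmcv.ops import get_compiling_cuda_version
        info["mmcv_compiled_cuda"] = get_compiling_cuda_version()
    except Exception as exc:  # mmcv missing or its compiled extension failed to load
        info["mmcv_error"] = repr(exc)
    return info


if __name__ == "__main__":
    for key, value in collect_env().items():
        print(f"{key}: {value}")
```

Comparing torch_cuda and mmcv_compiled_cuda against the CUDA version reported by the driver may help whoever triages the issue, although the output alone does not establish the cause of the error.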

Reproduces the problem - code sample

sdcsadc

Reproduces the problem - command or script

python .dev_scripts/check_installation.py

This was run after successfully completing the steps at https://mmcv.readthedocs.io/en/latest/get_started/build.html#build-mmcv-from-source to build mmcv for GPU.

Reproduces the problem - error message

Traceback (most recent call last):
  File ".dev_scripts/check_installation.py", line 37, in <module>
    check_installation()
  File ".dev_scripts/check_installation.py", line 29, in check_installation
    box_iou_rotated(boxes1, boxes2)
  File "/home/msarkar/shripad/yash/mmcv/mmcv/ops/box_iou_rotated.py", line 152, in box_iou_rotated
    ext_module.box_iou_rotated(
RuntimeError: CUDA error: initialization error

Additional information

No response

Yash-099 commented 6 months ago

I get the same error when I run the code below:

import torch 
from mmengine.config import Config, DictAction
from mmdet.registry import MODELS
from PIL import Image
import numpy as np
from mmdet.structures import DetDataSample

checkpoint = torch.load('target/models/dinov2_cgp_tab_list/model.pth')
print(checkpoint.keys())
model_state_dict = checkpoint['meta']

cfg_own = Config.fromfile('flamingo/networks/mmdet_config/mmdet_dinov2/own_config.py')

model = MODELS.build(cfg_own.model)

model.load_state_dict(checkpoint['state_dict'])

model = model.to('cuda')
print('printing the model', model)

model.eval()

image_path = 'train/images/images/p3_Forms_PDFs_IRS_Form13803.2.png'
image = Image.open(image_path)
orig_img = np.array(image.convert('RGB'))
orig_img = np.transpose(orig_img, (2, 0, 1))

image_shape = orig_img.shape[1:]  # (height, width) after the CHW transpose
image = np.expand_dims(orig_img, axis=0)
image = torch.from_numpy(np.asarray(image)).float().cuda()

data_sample = [DetDataSample()]
data_sample[0].batch_input_shape = (1,3)
data_sample[0].img_shape = image_shape
print(data_sample[0].batch_input_shape)
with torch.no_grad():
    output = model(image, data_sample)

print(output)

I am able to run a normal CNN model on the GPU successfully, so it does not look like CUDA is set up incorrectly on the machine.
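To narrow this down further, a check like the following could separate a plain PyTorch CUDA kernel from the compiled mmcv op named in the traceback. It is a minimal sketch: the function name check_cuda_paths and the single test box are made up for illustration, and the box is assumed to use the (x_center, y_center, width, height, angle) layout that box_iou_rotated expects.

```python
def check_cuda_paths():
    """Run a plain torch CUDA op, then mmcv's box_iou_rotated on CPU and GPU."""
    try:
        import torch
    except ImportError:
        return {"torch": "not installed"}
    if not torch.cuda.is_available():
        return {"cuda": "not available"}

    results = {}

    # Step 1: a kernel that ships with PyTorch itself.
    try:
        x = torch.ones(4, device="cuda")
        results["torch_cuda_op"] = f"ok (sum={(x + x).sum().item()})"
    except RuntimeError as exc:
        results["torch_cuda_op"] = f"failed: {exc}"

    # Step 2: the mmcv op from the traceback, on CPU first and then on GPU.
    try:
        from mmcv.ops import box_iou_rotated
    except Exception as exc:  # mmcv missing or its extension failed to load
        results["mmcv_import"] = f"failed: {exc!r}"
        return results

    # One illustrative rotated box: (x_center, y_center, w, h, angle).
    boxes = torch.tensor([[1.0, 1.0, 3.0, 4.0, 0.5]])
    for device in ("cpu", "cuda"):
        try:
            box_iou_rotated(boxes.to(device), boxes.to(device))
            results[f"mmcv_{device}"] = "ok"
        except RuntimeError as exc:
            results[f"mmcv_{device}"] = f"failed: {exc}"
    return results


if __name__ == "__main__":
    for name, status in check_cuda_paths().items():
        print(f"{name}: {status}")
```

If torch_cuda_op and mmcv_cpu succeed while mmcv_cuda fails, that would point at the compiled mmcv CUDA extension rather than at the general CUDA setup. This is only a way to localise the failure, not a diagnosis.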