(Question) AMD ROCm support? #581

MooN-tm opened this issue 1 year ago

MooN-tm commented 1 year ago

Hi, I'm still very new to all of this. After about two months of trying to make use of my 6900 XT, and realising that AMD support across ML/DL is not that great, I'm just trying to figure out what actually works and what doesn't.

I have a working ROCm and PyTorch installation, but I'm not sure whether it's supposed to work with conda. When I select PyTorch with ROCm on the PyTorch website, it says conda is not available, so I'm wondering whether it's okay to pip-install it inside the conda env.
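Pip-installing into a conda env generally works, provided the pip that runs is the env's own. A minimal sketch, with hypothetical helper names, that checks whether the running interpreter belongs to the activated conda env (using the CONDA_PREFIX variable that conda activate sets) and builds a pip command tied to that interpreter:

```python
import os
import sys
from typing import List


def running_in_active_conda_env() -> bool:
    """Return True if this interpreter belongs to the activated conda env."""
    conda_prefix = os.environ.get("CONDA_PREFIX")
    if not conda_prefix:
        return False  # no conda env is activated in this process's environment
    return os.path.realpath(sys.prefix) == os.path.realpath(conda_prefix)


def pip_install_command(*packages: str) -> List[str]:
    """Build a pip command that installs into this interpreter's environment."""
    # "python -m pip" binds pip to sys.executable, so packages cannot land in
    # a different Python (e.g. the system 3.10) by accident.
    return [sys.executable, "-m", "pip", "install", *packages]


if __name__ == "__main__":
    print("inside active conda env:", running_in_active_conda_env())
    print(" ".join(pip_install_command("torch")))
```

The ROCm-specific wheel index that the PyTorch install selector shows would be passed through pip's --index-url option; it is not reproduced here.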

In the terminal I get torch.cuda.is_available() = True, but in a VS Code Jupyter notebook using the openmmlab conda env I get False. Am I doing something wrong, or am I just wasting my time because it doesn't work?
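A VS Code notebook runs in whichever kernel (interpreter) is selected for it, which is not necessarily the env that is active in the terminal. A minimal diagnostic, with a hypothetical helper name, meant to be run once in the terminal and once in a notebook cell: it reports the interpreter path and where the imported torch package lives. If the two runs disagree, the notebook is using a different PyTorch.

```python
import sys
from typing import Optional, Tuple


def where_is_torch() -> Tuple[str, Optional[str]]:
    """Return (interpreter path, path of the imported torch package or None)."""
    try:
        import torch
    except ImportError:
        return sys.executable, None  # torch is not installed for this interpreter
    return sys.executable, torch.__file__


if __name__ == "__main__":
    interpreter, torch_path = where_is_torch()
    print("interpreter:", interpreter)
    print("torch from: ", torch_path)
```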


Ubuntu 22.04.2 LTS
Name:                    gfx1030                            
Marketing Name:          AMD Radeon RX 6900 XT   
user@NZXT-H1-Ub: ~$ python3
Python 3.10.6 (main, Nov 14 2022, 16:10:14) [GCC 11.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import torch
>>> print(torch.__version__)
>>> torch.cuda.is_available()
(openmmlab) user@NZXT-H1-Ub: ~/mmyolo$ python3
Python 3.8.16 (default, Jan 17 2023, 23:13:24) 
[GCC 11.2.0] :: Anaconda, Inc. on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import torch
>>> print(torch.__version__)
>>> torch.cuda.is_available()
File ~/miniconda3/envs/openmmlab/lib/python3.8/site-packages/mmdet/apis/, in init_detector(config, checkpoint, palette, device, cfg_options)
     76         model.dataset_meta = {
     77             'classes': get_classes('coco'),
     78             'palette': palette
     79         }
     81 model.cfg = config  # save the config in the model for convenience
---> 82
     83 model.eval()
     84 return model

File ~/miniconda3/envs/openmmlab/lib/python3.8/site-packages/mmengine/model/base_model/, in, *args, **kwargs)
    200 if device is not None:
    201     self._set_device(torch.device(device))
--> 202 return super().to(*args, **kwargs)

File ~/miniconda3/envs/openmmlab/lib/python3.8/site-packages/torch/nn/modules/, in, *args, **kwargs)
    985         return, dtype if t.is_floating_point() or t.is_complex() else None,
    222 if _cudart is None:
    223     raise AssertionError(
    224         "libcudart functions unavailable. It looks like you have a broken build?")

AssertionError: Torch not compiled with CUDA enabled

Thank you.
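ROCm builds of PyTorch expose AMD GPUs through the torch.cuda API and the "cuda" device string, so device='cuda:0' is the expected spelling on a 6900 XT. The "Torch not compiled with CUDA enabled" assertion above is what a CPU-only PyTorch build typically raises, which would suggest the notebook imports a different torch than the terminal. A minimal sketch, with a hypothetical helper name, that classifies the imported build from torch.version.hip and torch.version.cuda, each of which is None when the build lacks that backend:

```python
from typing import Optional


def torch_gpu_backend() -> Optional[str]:
    """Classify the imported PyTorch build: 'rocm', 'cuda', 'cpu-only', or None."""
    try:
        import torch
    except ImportError:
        return None  # no torch for this interpreter at all
    if getattr(torch.version, "hip", None):
        return "rocm"  # HIP build: AMD GPUs are still addressed as device "cuda"
    if torch.version.cuda:
        return "cuda"  # NVIDIA CUDA build
    # CPU-only build: moving a model to "cuda:0" fails with
    # "Torch not compiled with CUDA enabled".
    return "cpu-only"


if __name__ == "__main__":
    print("PyTorch build:", torch_gpu_backend())
```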

hhaAndroid commented 1 year ago

@MooN-tm Why are the two Python environments different? One is Python 3.8 and the other is Python 3.10.
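The two prompts in the original post start different interpreters. A small sketch, with a hypothetical helper name, that asks any interpreter for its version and sys.prefix (the root of its environment) through a subprocess, so the system Python and the conda env can be compared side by side. The two paths in the __main__ block are assumptions: Ubuntu's usual /usr/bin/python3, and the openmmlab env under ~/miniconda3 as inferred from the traceback paths.

```python
import os
import subprocess
import sys
from typing import Dict

# One-liner executed by the target interpreter: version, then environment root.
_PROBE = "import sys; print(sys.version.split()[0]); print(sys.prefix)"


def interpreter_report(python_path: str) -> Dict[str, str]:
    """Run python_path in a subprocess and return its version and sys.prefix."""
    lines = subprocess.run(
        [python_path, "-c", _PROBE],
        capture_output=True, text=True, check=True,
    ).stdout.splitlines()
    return {"version": lines[0], "prefix": lines[1]}


if __name__ == "__main__":
    # Assumed locations; adjust to your system.
    candidates = [
        "/usr/bin/python3",
        os.path.expanduser("~/miniconda3/envs/openmmlab/bin/python"),
    ]
    for path in candidates:
        if os.path.exists(path):
            print(path, interpreter_report(path))
```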

MooN-tm commented 1 year ago

I'm still not sure about the terminology, as I'm quite new to all of this, but 3.10 is the system (native?) Python and 3.8 is the one in the conda env, set up following the installation guide.

I tested torch.cuda.is_available() in both from the terminal and got True from both. However, when I run the test in VS Code, the CPU version passes, but with device='cuda:0' I get AssertionError: Torch not compiled with CUDA enabled, even though AMD's HIPIFY should have taken care of that by translating the CUDA code to HIP for AMD GPUs.


Loads checkpoint by local backend from path: /home/moon_tm/mmyolo/yolov5_s-v61_syncbn_fast_8xb16-300e_coco_20220918_084700-86e02187.pth
RuntimeError                              Traceback (most recent call last)
Cell In[5], line 9
      7 checkpoint_file = '/home/moon_tm/mmyolo/yolov5_s-v61_syncbn_fast_8xb16-300e_coco_20220918_084700-86e02187.pth'
      8 model = init_detector(config_file, checkpoint_file, device='cuda:0')  # or device='cpu'
----> 9 inference_detector(model, '/home/moon_tm/mmyolo/demo/demo.jpg')

File ~/miniconda3/envs/openmmlab/lib/python3.8/site-packages/mmdet/apis/, in inference_detector(model, imgs, test_pipeline)
    148     # forward the model
    149     with torch.no_grad():
--> 150         results = model.test_step(data_)[0]
    152     result_list.append(results)
    154 if not is_batch:

File ~/miniconda3/envs/openmmlab/lib/python3.8/site-packages/mmengine/model/base_model/, in BaseModel.test_step(self, data)
    136 """``BaseModel`` implements ``test_step`` the same as ``val_step``.
    138 Args:
    142     list: The predictions of given data.
    143 """
    144 data = self.data_preprocessor(data, False)
--> 145 return self._run_forward(data, mode='predict')

File ~/miniconda3/envs/openmmlab/lib/python3.8/site-packages/mmengine/model/base_model/, in BaseModel._run_forward(self, data, mode)
     30 if max_num > 0:
     31     inds = inds[:max_num]

RuntimeError: nms_impl: implementation for device cuda:0 not found.
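This second error is raised by an mmcv op (nms) rather than by PyTorch itself, which suggests the installed mmcv has no compiled NMS kernel for this device. A likely cause, assumed here and not confirmed in the thread, is an mmcv build made without GPU ops for ROCm. A sketch, with a hypothetical wrapper name, that asks mmcv how its compiled ops were built, using get_compiler_version and get_compiling_cuda_version from mmcv.ops:

```python
from typing import Dict, Optional


def mmcv_ops_build_info() -> Optional[Dict[str, str]]:
    """Report how mmcv's compiled ops were built (compiler and GPU toolkit version)."""
    try:
        from mmcv.ops import get_compiler_version, get_compiling_cuda_version
    except (ImportError, OSError):
        # mmcv is missing, or its compiled extension could not be loaded.
        return None
    return {
        "compiler": get_compiler_version(),
        "cuda": get_compiling_cuda_version(),
    }


if __name__ == "__main__":
    print("mmcv ops build:", mmcv_ops_build_info())
```

If the reported toolkit version shows the ops were built without GPU support, rebuilding mmcv from source against the ROCm PyTorch in the same env is the usual direction to investigate.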