open-mmlab / mmdetection

OpenMMLab Detection Toolbox and Benchmark
https://mmdetection.readthedocs.io
Apache License 2.0
29.2k stars 9.39k forks

nms_impl: implementation for device mps:0 not found #11437

Open validatedev opened 8 months ago

validatedev commented 8 months ago

Thanks for your error report and we appreciate it a lot.

Checklist

  1. I have searched related issues but cannot get the expected help.
  2. I have read the FAQ documentation but cannot get the expected help.
  3. The bug has not been fixed in the latest version.

Describe the bug

When I try to run the demo on an M2 MacBook Pro, it fails with RuntimeError: nms_impl: implementation for device mps:0 not found.

Reproduction

  1. What command or script did you run?

I checked the device via torch and device mps is present.

# Check whether cuda, mps or cpu is available
import torch
if torch.cuda.is_available():
    print("cuda is available")
    device = torch.device('cuda')
elif torch.backends.mps.is_available():
    print("mps is available")
    device = torch.device('mps')
else:
    print("cpu is available")
    device = torch.device('cpu')

# Print the device name
print(f"device is {device}")

And the output is

mps is available
device is mps

When I try to run

from mmdet.apis import init_detector, inference_detector

config_file = 'tmp/rtmdet_tiny_8xb32-300e_coco.py'
checkpoint_file = 'tmp/rtmdet_tiny_8xb32-300e_coco_20220902_112414-78e30dcc.pth'
model = init_detector(config_file, checkpoint_file, device=device)  # or device='cuda:0'
inference_detector(model, 'demo/demo.jpg')

it gives the output

RuntimeError: nms_impl: implementation for device mps:0 not found.

  2. Did you make any modifications on the code or config? Did you understand what you have modified?

I did not make any modifications to mmdet.

  3. What dataset did you use?

The dataset downloaded with mim download mmdet --config rtmdet_tiny_8xb32-300e_coco

Environment

  1. Please run python mmdet/utils/collect_env.py to collect necessary environment information and paste it here.
  2. You may add additional information that may be helpful for locating the problem, such as
    • How you installed PyTorch [e.g., pip, conda, source]
    • Other environment variables that may be related (such as $PATH, $LD_LIBRARY_PATH, $PYTHONPATH, etc.)

I can't provide the output of python mmdet/utils/collect_env.py because I installed mmdetection with mim install mmdet.

Instead, the output of

from mmengine.utils import get_git_hash
from mmengine.utils.dl_utils import collect_env as collect_base_env

import mmdet

def collect_env():
    """Collect the information of the running environments."""
    env_info = collect_base_env()
    env_info['MMDetection'] = f'{mmdet.__version__}+{get_git_hash()[:7]}'
    return env_info

for name, val in collect_env().items():
    print(f'{name}: {val}')

is

sys.platform: darwin
Python: 3.10.13 (main, Jan 27 2024, 15:54:55) [Clang 15.0.0 (clang-1500.1.0.2.5)]
CUDA available: False
MUSA available: False
numpy_random_seed: 2147483648
GCC: Apple clang version 15.0.0 (clang-1500.1.0.2.5)
PyTorch: 2.1.2
PyTorch compiling details: PyTorch built with:
  - GCC 4.2
  - C++ Version: 201703
  - clang 13.1.6
  - LAPACK is enabled (usually provided by MKL)
  - NNPACK is enabled
  - CPU capability usage: NO AVX
  - Build settings: BLAS_INFO=accelerate, BUILD_TYPE=Release, CXX_COMPILER=/Applications/Xcode_13.3.1.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/c++, CXX_FLAGS= -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -DNDEBUG -DUSE_KINETO -DLIBKINETO_NOCUPTI -DLIBKINETO_NOROCTRACER -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -DUSE_PYTORCH_METAL_EXPORT -DSYMBOLICATE_MOBILE_DEBUG_HANDLE -DUSE_COREML_DELEGATE -O2 -fPIC -Wall -Wextra -Werror=return-type -Werror=non-virtual-dtor -Werror=braced-scalar-init -Werror=range-loop-construct -Werror=bool-operation -Wnarrowing -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-unused-parameter -Wno-unused-function -Wno-unused-result -Wno-strict-overflow -Wno-strict-aliasing -Wvla-extension -Wnewline-eof -Winconsistent-missing-override -Winconsistent-missing-destructor-override -Wno-range-loop-analysis -Wno-pass-failed -Wsuggest-override -Wno-error=pedantic -Wno-error=old-style-cast -Wno-error=inconsistent-missing-override -Wno-error=inconsistent-missing-destructor-override -Wconstant-conversion -Wno-invalid-partial-specialization -Wno-unused-private-field -Wno-missing-braces -Wunused-lambda-capture -Qunused-arguments -fcolor-diagnostics -faligned-new -Wno-unused-but-set-variable -fno-math-errno -fno-trapping-math -Werror=format -Werror=cast-function-type -DUSE_MPS -Wno-unused-private-field -Wno-missing-braces, LAPACK_INFO=accelerate, TORCH_DISABLE_GPU_ASSERTS=OFF, TORCH_VERSION=2.1.2, USE_CUDA=OFF, USE_CUDNN=OFF, USE_EIGEN_FOR_BLAS=ON, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=OFF, USE_MKLDNN=OFF, USE_MPI=OFF, USE_NCCL=OFF, USE_NNPACK=ON, USE_OPENMP=OFF, USE_ROCM=OFF, 

TorchVision: 0.16.2
OpenCV: 4.9.0
MMEngine: 0.10.3
MMDetection: 3.3.0+

Error traceback

The full output is here: https://gist.github.com/validatedev/d505e139b119edfcabcd9a864cb8552a#file-gistfile1-txt

Bug fix

I ran into a similar issue in ultralytics/ultralytics and ultralytics/yolov5, and they solved it by running non_max_suppression on the CPU when the mps device is present and active. I can provide a similar fix in this repository as well :)
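
The CPU fallback described above could look roughly like the sketch below. It is only an illustration of the idea, not the code used by ultralytics or mmcv: run_on_cpu_if_mps is a hypothetical helper name, and it assumes the wrapped operator takes tensor-like arguments that expose .device.type and .to(), as torch tensors do.

```python
import functools


def run_on_cpu_if_mps(op):
    """Wrap `op` so tensor-like arguments on an MPS device are moved to CPU.

    Hypothetical helper, not part of mmdet or mmcv. It relies only on the
    `.device.type` attribute and `.to()` method that torch tensors provide.
    """

    @functools.wraps(op)
    def wrapper(*args, **kwargs):
        original_device = None

        def to_cpu(value):
            nonlocal original_device
            device = getattr(value, "device", None)
            if getattr(device, "type", None) == "mps":
                original_device = device  # remember where the data came from
                return value.to("cpu")
            return value

        cpu_args = [to_cpu(a) for a in args]
        cpu_kwargs = {k: to_cpu(v) for k, v in kwargs.items()}
        result = op(*cpu_args, **cpu_kwargs)

        # Move a tensor-like result back so downstream code sees the
        # device it expects; other results are returned unchanged.
        if original_device is not None and hasattr(result, "to"):
            return result.to(original_device)
        return result

    return wrapper
```

As a usage example, safe_nms = run_on_cpu_if_mps(torchvision.ops.nms) could then be called as safe_nms(boxes, scores, 0.5). The extra device transfers cost time on every call, so this is a workaround rather than a real MPS implementation of NMS.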

gurunathkulkarnivaltech commented 8 months ago

Same issue with Python 3.10.6.

osmartinez commented 8 months ago

Same with Python 3.8.18.

gurunathkulkarnivaltech commented 8 months ago

The issue appears to be specific to the MacBook Pro with the M1 chip; it does not occur on a Windows laptop running on CPU. I got the error while training on a custom dataset.

njho commented 7 months ago

@validatedev Can you point to where the NMS inputs should be cast to CPU?

Never mind, I found it at: https://github.com/open-mmlab/mmcv/blob/d9e10e11846d911e8354cd024967d3a17a88083c/mmcv/ops/nms.py#L127

    inds = NMSop.apply(boxes.to('cpu'), scores.to('cpu'), iou_threshold, offset, score_threshold,
                       max_num)