open-mmlab / mmcv

OpenMMLab Computer Vision Foundation
https://mmcv.readthedocs.io/en/latest/
Apache License 2.0

Unsupported mmcv ops on NPU devices #2985

Open Balabala-Hong opened 11 months ago

Balabala-Hong commented 11 months ago

Prerequisite

Environment

OrderedDict([('sys.platform', 'linux'), ('Python', '3.7.5 (default, Aug 3 2023, 10:32:29) [GCC 7.5.0]'), ('CUDA available', False), ('numpy_random_seed', 2147483648), ('GCC', 'gcc (Ubuntu/Linaro 7.5.0-3ubuntu1~18.04) 7.5.0'), ('PyTorch', '1.8.0a0+56b43f4'), ('PyTorch compiling details', 'PyTorch built with:\n - GCC 7.3\n - C++ Version: 201402\n - OpenMP 201511 (a.k.a. OpenMP 4.5)\n - NNPACK is enabled\n - CPU capability usage: NO AVX\n - Build settings: BLAS_INFO=generic, BUILD_TYPE=Release, CXX_COMPILER=/opt/buildtools/gcc-7.3.0/bin/c++, CXX_FLAGS= -Wno-deprecated -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -fopenmp -DNDEBUG -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -O2 -fPIC -Wno-narrowing -Wall -Wextra -Werror=return-type -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-sign-compare -Wno-unused-parameter -Wno-unused-variable -Wno-unused-function -Wno-unused-result -Wno-unused-local-typedefs -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-stringop-overflow -Wno-psabi -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -DMISSING_ARM_VST1 -DMISSING_ARM_VLD1 -Wno-stringop-overflow, LAPACK_INFO=generic, TORCH_VERSION=1.8.1, USE_CUDA=OFF, USE_CUDNN=OFF, USE_EIGEN_FOR_BLAS=ON, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=OFF, USE_MKLDNN=OFF, USE_MPI=OFF, USE_NCCL=OFF, USE_NNPACK=ON, USE_OPENMP=ON, \n'), ('TorchVision', '0.9.1'), ('OpenCV', '4.5.5'), ('MMEngine', '0.9.1'), ('MMCV', '2.0.0'), ('MMCV Compiler', 'GCC 7.5'), ('MMCV CUDA Compiler', 'not available')])

Additional information:

Reproduces the problem - code sample

from argparse import ArgumentParser

import mmcv
from mmdet.apis import inference_detector, init_detector

from mmrotate.registry import VISUALIZERS
from mmrotate.utils import register_all_modules

import torch
if torch.__version__ >= '1.8':
    import torch_npu
    from torch_npu.npu import amp
    import transfer_to_npu


def parse_args():
    parser = ArgumentParser()
    parser.add_argument('--img', default='demo/demo.jpg', help='Image file')
    parser.add_argument(
        '--config',
        default='configs/h2rbox/h2rbox-le90_r50_fpn_adamw-1x_dota-ms.py',
        # 'demo/oriented-rcnn-le90_r50_fpn_1x_dota.py',
        help='Config file')
    parser.add_argument(
        '--checkpoint',
        default='demo/h2rbox-le90_r50_fpn_adamw-1x_dota-ms-30dcdc68.pth',
        # 'demo/oriented_rcnn_r50_fpn_1x_dota_le90-6d2b2ce0.pth',
        help='Checkpoint file')
    parser.add_argument(
        '--out-file', default='demo/result.jpg', help='Path to output file')
    parser.add_argument(
        '--device', default='cuda:0', help='Device used for inference')
    parser.add_argument(
        '--palette',
        default='dota',
        choices=['dota', 'sar', 'hrsc', 'random'],
        help='Color palette used for visualization')
    parser.add_argument(
        '--score-thr', type=float, default=0.3, help='bbox score threshold')
    args = parser.parse_args()
    return args

def main(args):
    # register all modules in mmrotate into the registries
    register_all_modules()

# build the model from a config file and a checkpoint file
model = init_detector(
    args.config, args.checkpoint, palette=args.palette, device=args.device)

# init visualizer
visualizer = VISUALIZERS.build(model.cfg.visualizer)
# the dataset_meta is loaded from the checkpoint and
# then pass to the model in init_detector
visualizer.dataset_meta = model.dataset_meta

# test a single image
result = inference_detector(model, args.img)

# show the results
img = mmcv.imread(args.img)
img = mmcv.imconvert(img, 'bgr', 'rgb')
visualizer.add_datasample(
    'result',
    img,
    data_sample=result,
    draw_gt=False,
    show=args.out_file is None,
    wait_time=0,
    out_file=args.out_file,
    pred_score_thr=args.score_thr)

if __name__ == '__main__':
    args = parse_args()
    main(args)
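A side note on the version check in the sample: comparing version strings directly is lexicographic, so a string such as '1.10.0' would compare as smaller than '1.8'. It happens to work for the PyTorch 1.8 build in this report. A minimal sketch of a numeric comparison follows; `version_tuple` is a hypothetical helper name, not part of torch or mmcv.

```python
import re


def version_tuple(version, width=2):
    """Return the leading numeric components of a version string as ints.

    Hypothetical helper for illustration only. Any local build suffix
    after '+' (for example '+56b43f4') is ignored.
    """
    public_part = version.split("+")[0]
    numbers = re.findall(r"\d+", public_part)
    return tuple(int(n) for n in numbers[:width])


# Tuples of ints compare numerically, component by component.
needs_npu_import = version_tuple("1.8.0a0+56b43f4") >= (1, 8)
```

In the sample this would replace the string comparison with `version_tuple(torch.__version__) >= (1, 8)`.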

Reproduces the problem - command or script

input:
cd mmrotate && wget https://download.openmmlab.com/mmrotate/v0.1.0/gliding_vertex/gliding_vertex_r50_fpn_1x_dota_le90/gliding_vertex_r50_fpn_1x_dota_le90-12e7423c.pth && python demo/image_demo.py --config=configs/gliding_vertex/gliding-vertex-rbox_r50_fpn_1x_dota.py --checkpoint=gliding_vertex_r50_fpn_1x_dota_le90-12e7423c.pth

error:
Traceback (most recent call last):
  File "demo/image_demo.py", line 72, in <module>
    main(args)
  File "demo/image_demo.py", line 54, in main
    result = inference_detector(model, args.img)
  File "/home/HwHiAiUser/Ascend_mm/mmdetection/mmdet/apis/inference.py", line 177, in inference_detector
    results = model.test_step(data)[0]
  File "/usr/local/python3.7.5/lib/python3.7/site-packages/mmengine/model/base_model/base_model.py", line 145, in test_step
    return self._run_forward(data, mode='predict')  # type: ignore
  File "/usr/local/python3.7.5/lib/python3.7/site-packages/mmengine/model/base_model/base_model.py", line 346, in _run_forward
    results = self(**data, mode=mode)
  File "/usr/local/python3.7.5/lib/python3.7/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/HwHiAiUser/Ascend_mm/mmdetection/mmdet/models/detectors/base.py", line 94, in forward
    return self.predict(inputs, data_samples)
  File "/home/HwHiAiUser/Ascend_mm/mmdetection/mmdet/models/detectors/two_stage.py", line 239, in predict
    x, rpn_results_list, batch_data_samples, rescale=rescale)
  File "/home/HwHiAiUser/Ascend_mm/mmdetection/mmdet/models/roi_heads/base_roi_head.py", line 123, in predict
    rescale=bbox_rescale)
  File "/home/HwHiAiUser/Ascend_mm/mmrotate/mmrotate/models/roi_heads/gv_ratio_roi_head.py", line 157, in predict_bbox
    bbox_results = self._bbox_forward(x, rois)
  File "/home/HwHiAiUser/Ascend_mm/mmrotate/mmrotate/models/roi_heads/gv_ratio_roi_head.py", line 71, in _bbox_forward
    x[:self.bbox_roi_extractor.num_inputs], rois)
  File "/usr/local/python3.7.5/lib/python3.7/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/HwHiAiUser/Ascend_mm/mmdetection/mmdet/models/roi_heads/roi_extractors/single_level_roi_extractor.py", line 107, in forward
    roi_feats_t = self.roi_layers[i](feats[i], rois)
  File "/usr/local/python3.7.5/lib/python3.7/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/HwHiAiUser/mmcv-2.0.0/mmcv/ops/roi_align.py", line 211, in forward
    self.sampling_ratio, self.pool_mode, self.aligned)
  File "/home/HwHiAiUser/mmcv-2.0.0/mmcv/ops/roi_align.py", line 101, in forward
    aligned=ctx.aligned)
RuntimeError: roi_align_forward_impl: implementation for device xla:0 not found.

input:
cd mmrotate && python demo/image_demo.py --config=configs/r3det/r3det-tiny-oc_r50_fpn_1x_dota.py --checkpoint=r3det_tiny_r50_fpn_1x_dota_oc-c98a616c.pth

error:
Traceback (most recent call last):
  File "demo/image_demo.py", line 72, in <module>
    main(args)
  File "demo/image_demo.py", line 54, in main
    result = inference_detector(model, args.img)
  File "/home/HwHiAiUser/Ascend_mm/mmdetection/mmdet/apis/inference.py", line 177, in inference_detector
    results = model.test_step(data)[0]
  File "/usr/local/python3.7.5/lib/python3.7/site-packages/mmengine/model/base_model/base_model.py", line 145, in test_step
    return self._run_forward(data, mode='predict')  # type: ignore
  File "/usr/local/python3.7.5/lib/python3.7/site-packages/mmengine/model/base_model/base_model.py", line 346, in _run_forward
    results = self(**data, mode=mode)
  File "/usr/local/python3.7.5/lib/python3.7/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/HwHiAiUser/Ascend_mm/mmdetection/mmdet/models/detectors/base.py", line 94, in forward
    return self.predict(inputs, data_samples)
  File "/home/HwHiAiUser/Ascend_mm/mmrotate/mmrotate/models/detectors/refine_single_stage.py", line 153, in predict
    x_refine = self.bbox_head_refine[i].feature_refine(x, rois)
  File "/home/HwHiAiUser/Ascend_mm/mmrotate/mmrotate/models/dense_heads/r3_head.py", line 280, in feature_refine
    return self.feat_refine_module(x, rois)
  File "/usr/local/python3.7.5/lib/python3.7/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/HwHiAiUser/Ascend_mm/mmrotate/mmrotate/models/layers/align.py", line 188, in forward
    1 / fr_scale)
  File "/home/HwHiAiUser/mmcv-2.0.0/mmcv/ops/rotated_feature_align.py", line 95, in rotated_feature_align
    spatial_scale, points)
  File "/home/HwHiAiUser/mmcv-2.0.0/mmcv/ops/rotated_feature_align.py", line 61, in forward
    points=points)
RuntimeError: rotated_feature_align_forward_impl: implementation for device xla:0 not found.

input:
cd mmrotate && wget https://download.openmmlab.com/mmrotate/v0.1.0/oriented_rcnn/oriented_rcnn_r50_fpn_1x_dota_le90/oriented_rcnn_r50_fpn_1x_dota_le90-6d2b2ce0.pth && python demo/image_demo.py --config=configs/oriented_rcnn/oriented-rcnn-le90_r50_fpn_1x_dota.py --checkpoint=oriented_rcnn_r50_fpn_1x_dota_le90-6d2b2ce0.pth

error:
Traceback (most recent call last):
  File "demo/image_demo.py", line 72, in <module>
    main(args)
  File "demo/image_demo.py", line 54, in main
    result = inference_detector(model, args.img)
  File "/home/HwHiAiUser/Ascend_mm/mmdetection/mmdet/apis/inference.py", line 177, in inference_detector
    results = model.test_step(data)[0]
  File "/usr/local/python3.7.5/lib/python3.7/site-packages/mmengine/model/base_model/base_model.py", line 145, in test_step
    return self._run_forward(data, mode='predict')  # type: ignore
  File "/usr/local/python3.7.5/lib/python3.7/site-packages/mmengine/model/base_model/base_model.py", line 346, in _run_forward
    results = self(**data, mode=mode)
  File "/usr/local/python3.7.5/lib/python3.7/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/HwHiAiUser/Ascend_mm/mmdetection/mmdet/models/detectors/base.py", line 94, in forward
    return self.predict(inputs, data_samples)
  File "/home/HwHiAiUser/Ascend_mm/mmdetection/mmdet/models/detectors/two_stage.py", line 239, in predict
    x, rpn_results_list, batch_data_samples, rescale=rescale)
  File "/home/HwHiAiUser/Ascend_mm/mmdetection/mmdet/models/roi_heads/base_roi_head.py", line 123, in predict
    rescale=bbox_rescale)
  File "/home/HwHiAiUser/Ascend_mm/mmdetection/mmdet/models/roi_heads/standard_roi_head.py", line 335, in predict_bbox
    bbox_results = self._bbox_forward(x, rois)
  File "/home/HwHiAiUser/Ascend_mm/mmdetection/mmdet/models/roi_heads/standard_roi_head.py", line 164, in _bbox_forward
    x[:self.bbox_roi_extractor.num_inputs], rois)
  File "/usr/local/python3.7.5/lib/python3.7/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/HwHiAiUser/Ascend_mm/mmrotate/mmrotate/models/roi_heads/roi_extractors/rotate_single_level_roi_extractor.py", line 128, in forward
    roi_feats_t = self.roi_layers[i](feats[i], rois)
  File "/usr/local/python3.7.5/lib/python3.7/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/HwHiAiUser/mmcv-2.0.0/mmcv/ops/roi_align_rotated.py", line 178, in forward
    self.clockwise)
  File "/home/HwHiAiUser/mmcv-2.0.0/mmcv/ops/roi_align_rotated.py", line 74, in forward
    clockwise=ctx.clockwise)
RuntimeError: roi_align_rotated_forward_impl: implementation for device xla:0 not found.

input:
cd mmrotate && python demo/image_demo.py --config=configs/oriented_reppoints/oriented-reppoints-qbox_r50_fpn_mstrain-40e_dota.py --checkpoint=oriented_reppoints_r50_fpn_40e_dota_ms_le135-bb0323fd.pth

error:
Traceback (most recent call last):
  File "demo/image_demo.py", line 72, in <module>
    main(args)
  File "demo/image_demo.py", line 54, in main
    result = inference_detector(model, args.img)
  File "/home/HwHiAiUser/Ascend_mm/mmdetection/mmdet/apis/inference.py", line 177, in inference_detector
    results = model.test_step(data)[0]
  File "/usr/local/python3.7.5/lib/python3.7/site-packages/mmengine/model/base_model/base_model.py", line 145, in test_step
    return self._run_forward(data, mode='predict')  # type: ignore
  File "/usr/local/python3.7.5/lib/python3.7/site-packages/mmengine/model/base_model/base_model.py", line 346, in _run_forward
    results = self(**data, mode=mode)
  File "/usr/local/python3.7.5/lib/python3.7/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/HwHiAiUser/Ascend_mm/mmdetection/mmdet/models/detectors/base.py", line 94, in forward
    return self.predict(inputs, data_samples)
  File "/home/HwHiAiUser/Ascend_mm/mmdetection/mmdet/models/detectors/single_stage.py", line 111, in predict
    x, batch_data_samples, rescale=rescale)
  File "/home/HwHiAiUser/Ascend_mm/mmdetection/mmdet/models/dense_heads/base_dense_head.py", line 198, in predict
    outs, batch_img_metas=batch_img_metas, rescale=rescale)
  File "/home/HwHiAiUser/Ascend_mm/mmdetection/mmdet/models/dense_heads/base_dense_head.py", line 287, in predict_by_feat
    with_nms=with_nms)
  File "/home/HwHiAiUser/Ascend_mm/mmrotate/mmrotate/models/dense_heads/rotated_reppoints_head.py", line 460, in _predict_by_feat_single
    qboxes = min_area_polygons(pts)
  File "/home/HwHiAiUser/mmcv-2.0.0/mmcv/ops/min_area_polygons.py", line 19, in min_area_polygons
    ext_module.min_area_polygons(pointsets, polygons)
RuntimeError: min_area_polygons_impl: implementation for device xla:0 not found.
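All four failures have the same shape: an mmcv op (roi_align, rotated_feature_align, roi_align_rotated, min_area_polygons) is dispatched for device xla:0 and no implementation is registered for it. A possible stopgap, sketched below, is to retry the op with its inputs moved to the CPU. This assumes the op has a CPU implementation in the installed mmcv build, which this thread does not confirm. `run_with_cpu_fallback` is a hypothetical helper, not an mmcv API.

```python
def run_with_cpu_fallback(op, *args, **kwargs):
    """Call op(*args, **kwargs); on a missing device implementation, retry on CPU.

    Hypothetical helper for illustration. Any argument that exposes a
    .cpu() method (as torch tensors do) is moved to the CPU for the retry.
    The result of the retry stays on the CPU, so the caller has to move it
    back to the original device if later layers expect it there.
    """
    try:
        return op(*args, **kwargs)
    except RuntimeError as err:
        # Only handle the dispatch failure seen in the tracebacks above.
        if "implementation for device" not in str(err):
            raise
        cpu_args = [a.cpu() if hasattr(a, "cpu") else a for a in args]
        cpu_kwargs = {
            k: v.cpu() if hasattr(v, "cpu") else v for k, v in kwargs.items()
        }
        return op(*cpu_args, **cpu_kwargs)
```

A call would look like `run_with_cpu_fallback(min_area_polygons, pts)`. The extra device transfers make this slow, so it is a way to check whether the rest of the pipeline runs rather than a fix for the missing NPU kernels.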

Reproduces the problem - error message

above

Additional information

No response

GenerallyCovetous commented 6 months ago

Hi, I also hit this problem when running the SDMGR model: "RuntimeError: roi_align_rotated_forward_impl: implementation for device xla:0 not found."
Have you solved this bug yet?

Balabala-Hong commented 6 months ago

Maybe you could check the list of supported ops in mmcv. (https://github.com/open-mmlab/mmcv/blob/main/docs/zh_cn/understand_mmcv/ops.md)

GenerallyCovetous commented 6 months ago

Thanks. I noticed that ROIAlign is listed as supported on Ascend, but my error message says the opposite. Could this be related to my versions of mmcv (2.0.1), mmengine (0.10.3), mmocr (1.0.1), torch (1.11.0) and torch_npu?

Balabala-Hong commented 6 months ago

The ops are integrated in mmcv. You should first check your mmcv version and then find the corresponding mmocr version. Besides, the environment used to build the mmcv ops (CANN) is also very important.

GenerallyCovetous commented 6 months ago

Thanks, may I ask for your CANN and mmcv version? I want to align the versions and try again. Thank you very much
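For comparing environments, a small script that prints the installed package versions can help. This is a minimal sketch assuming Python 3.8+ (for `importlib.metadata`). The distribution names in the list are assumptions and may differ per install, and CANN is not a pip package, so its version has to be checked separately.

```python
from importlib import metadata


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    # Distribution names are assumptions; adjust them to the local install.
    names = ["mmcv", "mmengine", "mmocr", "torch", "torch_npu"]
    for name, version in installed_versions(names).items():
        print(f"{name}: {version or 'not installed'}")
```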

Balabala-Hong commented 6 months ago

https://github.com/open-mmlab/mmcv/issues/3002

GenerallyCovetous commented 6 months ago

Thanks a lot. My MMCV and MMOCR installed normally, and verifying them with the steps on the official website showed no problem. However, I hit this error when using the built-in SDMGR model (I have verified that dpnet, dpnetpp and master all train normally).
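One way to see why a single model fails while others train is to scan its config for layers that map to the ops named in the errors. The sketch below walks a nested config given as plain dicts and lists. The config shown is a made-up stand-in, not the real SDMGR config, and the type names in `wanted` are assumptions about how the ops are registered.

```python
def find_layer_types(cfg, wanted):
    """Yield (dotted_path, type_name) for nested dicts whose 'type' is in wanted."""

    def walk(node, path):
        if isinstance(node, dict):
            type_name = node.get("type")
            if isinstance(type_name, str) and type_name in wanted:
                yield ".".join(path), type_name
            for key, value in node.items():
                yield from walk(value, path + [str(key)])
        elif isinstance(node, (list, tuple)):
            for index, value in enumerate(node):
                yield from walk(value, path + [str(index)])

    yield from walk(cfg, [])


# Hypothetical config for illustration only.
example_cfg = {
    "model": {
        "backbone": {"type": "ResNet"},
        "roi_extractor": {"roi_layer": {"type": "RoIAlignRotated"}},
    }
}

for path, type_name in find_layer_types(example_cfg, {"RoIAlign", "RoIAlignRotated"}):
    print(f"{path}: {type_name}")
```

A model whose config produces no hits never reaches the missing op, which would be consistent with some models training normally on the same install.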