[Bug] 华为910B上RoiAlignRotated算子不能进行梯度操作

Prerequisite

[X] I have searched Issues and Discussions but cannot get the expected help.
[X] The bug has not been fixed in the latest version(https://github.com/open-mmlab/mmcv).

Environment

使用的是华为Ascendhub的官方Docker镜像：ascend-pytorch:23.0.RC3-1.11.0 (https://ascendhub.huawei.com/#/detail/ascend-pytorch)， torch1.11.0 , CANN为7.0.RC1，系统架构为aarch64,显卡信息为910B

具体信息为： OrderedDict([('sys.platform', 'linux'), ('Python', '3.7.5 (default, Oct 28 2023, 09:08:27) [GCC 7.5.0]'), ('CUDA available', False), ('numpy_random_seed', 2147483648), ('GCC', 'gcc (Ubuntu/Linaro 7.5.0-3ubuntu1~18.04) 7.5.0'), ('PyTorch', '1.11.0a0+gitbc2c6ed'), ('PyTorch compiling details', 'PyTorch built with:\n - GCC 7.5\n - C++ Version: 201402\n - OpenMP 201511 (a.k.a. OpenMP 4.5)\n - NNPACK is enabled\n - CPU capability usage: NO AVX\n - Build settings: BUILD_TYPE=Release, CXX_COMPILER=/opt/buildtools/gcc-7.5.0/bin/c++, CXX_FLAGS= -Wno-deprecated -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -fopenmp -DNDEBUG -DUSE_KINETO -DLIBKINETO_NOCUPTI -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -DSYMBOLICATE_MOBILE_DEBUG_HANDLE -DEDGE_PROFILER_USE_KINETO -O2 -fPIC -Wno-narrowing -Wall -Wextra -Werror=return-type -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-sign-compare -Wno-unused-parameter -Wno-unused-function -Wno-unused-result -Wno-unused-local-typedefs -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-stringop-overflow -Wno-psabi -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -DMISSING_ARM_VST1 -DMISSING_ARM_VLD1 -Wno-stringop-overflow, TORCH_VERSION=1.11.0, USE_CUDA=OFF, USE_CUDNN=OFF, USE_EIGEN_FOR_BLAS=ON, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=OFF, USE_MKLDNN=OFF, USE_MPI=OFF, USE_NCCL=OFF, USE_NNPACK=ON, USE_OPENMP=ON, USE_ROCM=OFF, \n'), ('TorchVision', '0.12.0'), ('OpenCV', '4.8.1'), ('MMEngine', '0.10.2'), ('MMCV', '2.1.0'), ('MMCV Compiler', 'GCC 7.5'), ('MMCV CUDA Compiler', 'not available')])

Reproduces the problem - code sample

import argparse
import logging
import os
import os.path as osp

from mmdet.utils import register_all_modules as register_all_modules_mmdet
from mmengine.config import Config, DictAction
from mmengine.logging import print_log
from mmengine.registry import RUNNERS
from mmengine.runner import Runner
# from mmengine.evaluator import BaseMetric
# from mmengine.model import BaseModel
from mmengine.runner._flexible_runner import FlexibleRunner

from mmrotate.utils import register_all_modules
from mmengine.device import (is_cuda_available, is_mlu_available,
                             is_npu_available)

import torch
if torch.__version__ >= '1.8':
    import torch_npu
    from torch_npu.npu import amp
    import transfer_to_npu

def parse_args():
    parser = argparse.ArgumentParser(description='Train a detector')
    parser.add_argument('--config',
                        # default='configs/ascend/ringsee_dota.py', 
                        # default='configs/ascend/rotated-fcos-hbox-le90_swin-mix_fpn_1x_dota.py',
                        # default='configs/ascend/rotated-fcos-hbox-le90_r50_fpn_1x_dota.py',
                        default='configs/ascend/rotated_rtmdet/rotated_rtmdet_hbox_l-3x-dota.py',
                        help='train config file path')
    parser.add_argument('--work-dir', 
                        # default='work_dirs/rotated-fcos-hbox-le90_swin-mix_fpn-amp_1x_dota',
                        default="work_dirs/tmp",
                        help='the dir to save logs and models')
    parser.add_argument(
        '--amp',
        action='store_true',
        default=True,
        help='enable automatic-mixed-precision training')
    parser.add_argument(
        '--auto-scale-lr',
        action='store_true',
        help='enable automatically scaling LR.')
    parser.add_argument(
        '--resume',
        action='store_true',
        help='resume from the latest checkpoint in the work_dir automatically')
    parser.add_argument(
        '--cfg-options',
        nargs='+',
        action=DictAction,
        help='override some settings in the used config, the key-value pair '
        'in xxx=yyy format will be merged into config file. If the value to '
        'be overwritten is a list, it should be like key="[a,b]" or key=a,b '
        'It also allows nested list/tuple values, e.g. key="[(a,b),(c,d)]" '
        'Note that the quotation marks are necessary and that no white space '
        'is allowed.')
    parser.add_argument(
        '--launcher',
        choices=['none', 'pytorch', 'slurm', 'mpi'],
        default='none',
        help='job launcher')
    parser.add_argument('--local_rank', type=int, default=0)
    args = parser.parse_args()
    if 'LOCAL_RANK' not in os.environ:
        os.environ['LOCAL_RANK'] = str(args.local_rank)

    return args

def main():
    args = parse_args()

    assert is_cuda_available() or is_npu_available() or is_mlu_available(
        ), ('``GPU, NPU and MLU`` is unavailable during training ')

    # register all modules in mmdet into the registries
    # do not init the default scope here because it will be init in the runner
    register_all_modules_mmdet(init_default_scope=False)
    register_all_modules(init_default_scope=False)

    # load config
    cfg = Config.fromfile(args.config)
    cfg.launcher = args.launcher
    if args.cfg_options is not None:
        cfg.merge_from_dict(args.cfg_options)

    # work_dir is determined in this priority: CLI > segment in file > filename
    if args.work_dir is not None:
        # update configs according to CLI args if args.work_dir is not None
        cfg.work_dir = args.work_dir
    elif cfg.get('work_dir', None) is None:
        # use config filename as default work_dir if cfg.work_dir is None
        cfg.work_dir = osp.join('./work_dirs',
                                osp.splitext(osp.basename(args.config))[0])

    # enable automatic-mixed-precision training
    if args.amp is True:
        optim_wrapper = cfg.optim_wrapper.type
        if optim_wrapper == 'AmpOptimWrapper':
            print_log(
                'AMP training is already enabled in your config.',
                logger='current',
                level=logging.WARNING)
        else:
            assert optim_wrapper == 'OptimWrapper', (
                '`--amp` is only supported when the optimizer wrapper type is '
                f'`OptimWrapper` but got {optim_wrapper}.')
            cfg.optim_wrapper.type = 'AmpOptimWrapper'
            cfg.optim_wrapper.loss_scale = 'dynamic'

    # enable automatically scaling LR
    if args.auto_scale_lr:
        if 'auto_scale_lr' in cfg and \
                'enable' in cfg.auto_scale_lr and \
                'base_batch_size' in cfg.auto_scale_lr:
            cfg.auto_scale_lr.enable = True
        else:
            raise RuntimeError('Can not find "auto_scale_lr" or '
                               '"auto_scale_lr.enable" or '
                               '"auto_scale_lr.base_batch_size" in your'
                               ' configuration file.')

    cfg.resume = args.resume

    if cfg.launcher!=None:
        model_wrapper_cfg=dict(model_wrapper_cfg=
            dict(type='MMDistributedDataParallel', find_unused_parameters=True))
        cfg.merge_from_dict(model_wrapper_cfg)

    # build the runner from config
    if 'runner_type' not in cfg:
        # build the default runner
        runner = Runner.from_cfg(cfg)
    else:
        # build customized runner from the registry
        # if 'runner_type' is set in the cfg
        runner = RUNNERS.build(cfg)

    # start training
    runner.train()

if __name__ == '__main__':
    main()

Reproduces the problem - command or script

基于mmrotate的dev-1.x分支代码，在aarch64架构下的910B显卡上进行oriented-rcnn模型的训练（手动移除了mmrotate-dev1.x对mmcv的版本依赖关系判断代码） python /workspace/mmrotate/tools/train.py --config=configs/oriented-rcnn/oriented-rcnn-le90_r50_fpn_1x_dota.py --work-dir=./work_dir/oriented-rcnn-le90_r50_fpn_1x_dota --amp, 其中train.py里使用的华为的代码自动迁移工具，需要额外导入

if torch.__version__ >= '1.8':
    import torch_npu
    from torch_npu.npu import amp
    import transfer_to_npu

Reproduces the problem - error message

/usr/local/Ascend/ascend-toolkit/latest/tools/ms_fmk_transplt/torch_npu_bridge/transfer_to_npu.py:159: ImportWarning:
    *************************************************************************************************************
    The torch.Tensor.cuda and torch.nn.Module.cuda are replaced with torch.Tensor.npu and torch.nn.Module.npu now..
    The torch.cuda.DoubleTensor is replaced with torch.npu.FloatTensor cause the double type is not supported now..
    The backend in torch.distributed.init_process_group set to hccl now..
    The torch.cuda.* and torch.cuda.amp.* are replaced with torch.npu.* and torch.npu.amp.* now..
    The device parameters have been replaced with npu in the function below:
    torch.logspace, torch.randint, torch.hann_window, torch.rand, torch.full_like, torch.ones_like, torch.rand_like, torch.randperm, torch.arange, torch.frombuffer, torch.normal, torch._empty_per_channel_affine_quantized, torch.empty_strided, torch.empty_like, torch.scalar_tensor, torch.tril_indices, torch.bartlett_window, torch.ones, torch.sparse_coo_tensor, torch.randn, torch.kaiser_window, torch.tensor, torch.triu_indices, torch.as_tensor, torch.zeros, torch.randint_like, torch.full, torch.eye, torch._sparse_csr_tensor_unsafe, torch.empty, torch._sparse_coo_tensor_unsafe, torch.blackman_window, torch.zeros_like, torch.range, torch.sparse_csr_tensor, torch.randn_like, torch.from_file, torch._cudnn_init_dropout_state, torch._empty_affine_quantized, torch.linspace, torch.hamming_window, torch.empty_quantized, torch._pin_memory, torch.autocast, torch.Tensor.new_empty, torch.Tensor.new_empty_strided, torch.Tensor.new_full, torch.Tensor.new_ones, torch.Tensor.new_tensor, torch.Tensor.new_zeros, torch.Tensor.to, torch.nn.Module.to, torch.nn.Module.to_empty
    *************************************************************************************************************

  warnings.warn(msg, ImportWarning)
/usr/local/python3.7.5/lib/python3.7/site-packages/mmcv/cnn/bricks/transformer.py:26: ImportWarning: ``MultiScaleDeformableAttention`` has been moved to ``mmcv.ops.multi_scale_deform_attn``, please change original path ``from mmcv.cnn.bricks.transformer import MultiScaleDeformableAttention`` to ``from mmcv.ops.multi_scale_deform_attn import MultiScaleDeformableAttention``
  '``MultiScaleDeformableAttention`` has been moved to '
/usr/local/python3.7.5/lib/python3.7/site-packages/torchvision/io/image.py:13: UserWarning: Failed to load image Python extension:
  warn(f"Failed to load image Python extension: {e}")
/usr/local/python3.7.5/lib/python3.7/site-packages/timm/models/layers/__init__.py:49: DeprecationWarning: Importing from timm.models.layers is deprecated, please import via timm.layers
  warnings.warn(f"Importing from {__name__} is deprecated, please import via timm.layers", DeprecationWarning)
/workspace/mmrotate/mmrotate/models/detectors/refine_single_stage.py:2: DeprecationWarning: Using or importing the ABCs from 'collections' instead of from 'collections.abc' is deprecated since Python 3.3,and in 3.9 it will stop working
  from collections import Sequence
01/08 21:26:50 - mmengine - INFO -
------------------------------------------------------------
System environment:
    sys.platform: linux
    Python: 3.7.5 (default, Oct 28 2023, 09:08:27) [GCC 7.5.0]
    CUDA available: True
    numpy_random_seed: 1153850143
    GPU 0,1: Ascend910B
    CUDA_HOME: None
    GCC: gcc (Ubuntu/Linaro 7.5.0-3ubuntu1~18.04) 7.5.0
    PyTorch: 1.11.0a0+gitbc2c6ed
    PyTorch compiling details: PyTorch built with:
  - GCC 7.5
  - C++ Version: 201402
  - OpenMP 201511 (a.k.a. OpenMP 4.5)
  - NNPACK is enabled
  - CPU capability usage: NO AVX
  - Build settings: BUILD_TYPE=Release, CXX_COMPILER=/opt/buildtools/gcc-7.5.0/bin/c++, CXX_FLAGS= -Wno-deprecated -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -fopenmp -DNDEBUG -DUSE_KINETO -DLIBKINETO_NOCUPTI -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -DSYMBOLICATE_MOBILE_DEBUG_HANDLE -DEDGE_PROFILER_USE_KINETO -O2 -fPIC -Wno-narrowing -Wall -Wextra -Werror=return-type -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-sign-compare -Wno-unused-parameter -Wno-unused-function -Wno-unused-result -Wno-unused-local-typedefs -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-stringop-overflow -Wno-psabi -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -DMISSING_ARM_VST1 -DMISSING_ARM_VLD1 -Wno-stringop-overflow, TORCH_VERSION=1.11.0, USE_CUDA=OFF, USE_CUDNN=OFF, USE_EIGEN_FOR_BLAS=ON, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=OFF, USE_MKLDNN=OFF, USE_MPI=OFF, USE_NCCL=OFF, USE_NNPACK=ON, USE_OPENMP=ON, USE_ROCM=OFF,

    TorchVision: 0.12.0
    OpenCV: 4.8.1
    MMEngine: 0.10.2

Runtime environment:
    cudnn_benchmark: False
    mp_cfg: {'mp_start_method': 'fork', 'opencv_num_threads': 0}
    dist_cfg: {'backend': 'nccl'}
    seed: 1153850143
    Distributed launcher: none
    Distributed training: False
    GPU number: 1
------------------------------------------------------------

01/08 21:26:51 - mmengine - INFO - Config:
angle_version = 'le90'
data_root = '/dataset/DOTA_v1.0/dota_ss_crop-1024_overlap-200'
dataset_type = 'DOTADataset'
default_hooks = dict(
    checkpoint=dict(
        by_epoch=True,
        interval=1,
        rule='greater',
        save_best='dota/mAP',
        type='CheckpointHook'),
    logger=dict(interval=50, type='LoggerHook'),
    param_scheduler=dict(type='ParamSchedulerHook'),
    sampler_seed=dict(type='DistSamplerSeedHook'),
    timer=dict(type='IterTimerHook'),
    visualization=dict(type='mmdet.DetVisualizationHook'))
default_scope = 'mmrotate'
env_cfg = dict(
    cudnn_benchmark=False,
    dist_cfg=dict(backend='nccl'),
    mp_cfg=dict(mp_start_method='fork', opencv_num_threads=0))
file_client_args = dict(backend='disk')
launcher = 'none'
load_from = None
log_level = 'INFO'
log_processor = dict(by_epoch=True, type='LogProcessor', window_size=50)
model = dict(
    backbone=dict(
        depth=50,
        frozen_stages=1,
        init_cfg=dict(checkpoint='torchvision://resnet50', type='Pretrained'),
        norm_cfg=dict(requires_grad=True, type='BN'),
        norm_eval=True,
        num_stages=4,
        out_indices=(
            0,
            1,
            2,
            3,
        ),
        style='pytorch',
        type='mmdet.ResNet'),
    data_preprocessor=dict(
        bgr_to_rgb=True,
        boxtype2tensor=False,
        mean=[
            123.675,
            116.28,
            103.53,
        ],
        pad_size_divisor=32,
        std=[
            58.395,
            57.12,
            57.375,
        ],
        type='mmdet.DetDataPreprocessor'),
    neck=dict(
        in_channels=[
            256,
            512,
            1024,
            2048,
        ],
        num_outs=5,
        out_channels=256,
        type='mmdet.FPN'),
    roi_head=dict(
        bbox_head=dict(
            bbox_coder=dict(
                angle_version='le90',
                edge_swap=True,
                norm_factor=None,
                proj_xy=True,
                target_means=(
                    0.0,
                    0.0,
                    0.0,
                    0.0,
                    0.0,
                ),
                target_stds=(
                    0.1,
                    0.1,
                    0.2,
                    0.2,
                    0.1,
                ),
                type='DeltaXYWHTRBBoxCoder'),
            cls_predictor_cfg=dict(type='mmdet.Linear'),
            fc_out_channels=1024,
            in_channels=256,
            loss_bbox=dict(
                beta=1.0, loss_weight=1.0, type='mmdet.SmoothL1Loss'),
            loss_cls=dict(
                loss_weight=1.0,
                type='mmdet.CrossEntropyLoss',
                use_sigmoid=False),
            num_classes=15,
            predict_box_type='rbox',
            reg_class_agnostic=True,
            reg_predictor_cfg=dict(type='mmdet.Linear'),
            roi_feat_size=7,
            type='mmdet.Shared2FCBBoxHead'),
        bbox_roi_extractor=dict(
            featmap_strides=[
                4,
                8,
                16,
                32,
            ],
            out_channels=256,
            roi_layer=dict(
                clockwise=True,
                out_size=7,
                sample_num=2,
                type='RoIAlignRotated'),
            type='RotatedSingleRoIExtractor'),
        type='mmdet.StandardRoIHead'),
    rpn_head=dict(
        anchor_generator=dict(
            ratios=[
                0.5,
                1.0,
                2.0,
            ],
            scales=[
                8,
            ],
            strides=[
                4,
                8,
                16,
                32,
                64,
            ],
            type='mmdet.AnchorGenerator',
            use_box_type=True),
        bbox_coder=dict(
            angle_version='le90',
            target_means=[
                0.0,
                0.0,
                0.0,
                0.0,
                0.0,
                0.0,
            ],
            target_stds=[
                1.0,
                1.0,
                1.0,
                1.0,
                0.5,
                0.5,
            ],
            type='MidpointOffsetCoder'),
        feat_channels=256,
        in_channels=256,
        loss_bbox=dict(
            beta=0.1111111111111111,
            loss_weight=1.0,
            type='mmdet.SmoothL1Loss'),
        loss_cls=dict(
            loss_weight=1.0, type='mmdet.CrossEntropyLoss', use_sigmoid=True),
        type='OrientedRPNHead'),
    test_cfg=dict(
        rcnn=dict(
            max_per_img=2000,
            min_bbox_size=0,
            nms=dict(iou_threshold=0.1, type='nms_rotated'),
            nms_pre=2000,
            score_thr=0.05),
        rpn=dict(
            max_per_img=2000,
            min_bbox_size=0,
            nms=dict(iou_threshold=0.8, type='nms'),
            nms_pre=2000)),
    train_cfg=dict(
        rcnn=dict(
            assigner=dict(
                ignore_iof_thr=-1,
                iou_calculator=dict(type='RBboxOverlaps2D'),
                match_low_quality=False,
                min_pos_iou=0.5,
                neg_iou_thr=0.5,
                pos_iou_thr=0.5,
                type='mmdet.MaxIoUAssigner'),
            debug=False,
            pos_weight=-1,
            sampler=dict(
                add_gt_as_proposals=True,
                neg_pos_ub=-1,
                num=512,
                pos_fraction=0.25,
                type='mmdet.RandomSampler')),
        rpn=dict(
            allowed_border=0,
            assigner=dict(
                ignore_iof_thr=-1,
                iou_calculator=dict(type='RBbox2HBboxOverlaps2D'),
                match_low_quality=True,
                min_pos_iou=0.3,
                neg_iou_thr=0.3,
                pos_iou_thr=0.7,
                type='mmdet.MaxIoUAssigner'),
            debug=False,
            pos_weight=-1,
            sampler=dict(
                add_gt_as_proposals=False,
                neg_pos_ub=-1,
                num=256,
                pos_fraction=0.5,
                type='mmdet.RandomSampler')),
        rpn_proposal=dict(
            max_per_img=2000,
            min_bbox_size=0,
            nms=dict(iou_threshold=0.8, type='nms'),
            nms_pre=2000)),
    type='mmdet.FasterRCNN')
model_wrapper_cfg = dict(
    find_unused_parameters=True, type='MMDistributedDataParallel')
optim_wrapper = dict(
    clip_grad=dict(max_norm=35, norm_type=2),
    loss_scale='dynamic',
    optimizer=dict(lr=0.005, momentum=0.9, type='SGD', weight_decay=0.0001),
    type='AmpOptimWrapper')
param_scheduler = [
    dict(
        begin=0,
        by_epoch=False,
        end=500,
        start_factor=0.3333333333333333,
        type='LinearLR'),
    dict(
        begin=0,
        by_epoch=True,
        end=12,
        gamma=0.1,
        milestones=[
            8,
            11,
        ],
        type='MultiStepLR'),
]
resume = False
test_cfg = dict(type='TestLoop')
test_dataloader = dict(
    batch_size=1,
    dataset=dict(
        ann_file='val/annfiles/',
        data_prefix=dict(img_path='val/images/'),
        data_root='/dataset/DOTA_v1.0/dota_ss_crop-1024_overlap-200',
        img_shape=(
            1024,
            1024,
        ),
        pipeline=[
            dict(
                file_client_args=dict(backend='disk'),
                type='mmdet.LoadImageFromFile'),
            dict(keep_ratio=True, scale=(
                1024,
                1024,
            ), type='mmdet.Resize'),
            dict(
                box_type='qbox', type='mmdet.LoadAnnotations', with_bbox=True),
            dict(
                box_type_mapping=dict(gt_bboxes='rbox'),
                type='ConvertBoxType'),
            dict(
                meta_keys=(
                    'img_id',
                    'img_path',
                    'ori_shape',
                    'img_shape',
                    'scale_factor',
                ),
                type='mmdet.PackDetInputs'),
        ],
        test_mode=True,
        type='DOTADataset'),
    drop_last=False,
    num_workers=2,
    persistent_workers=True,
    sampler=dict(shuffle=False, type='DefaultSampler'))
test_evaluator = dict(metric='mAP', type='DOTAMetric')
test_pipeline = [
    dict(
        file_client_args=dict(backend='disk'), type='mmdet.LoadImageFromFile'),
    dict(keep_ratio=True, scale=(
        1024,
        1024,
    ), type='mmdet.Resize'),
    dict(
        meta_keys=(
            'img_id',
            'img_path',
            'ori_shape',
            'img_shape',
            'scale_factor',
        ),
        type='mmdet.PackDetInputs'),
]
train_cfg = dict(max_epochs=25, type='EpochBasedTrainLoop', val_interval=1)
train_dataloader = dict(
    batch_sampler=None,
    batch_size=2,
    dataset=dict(
        ann_file='train/annfiles/',
        data_prefix=dict(img_path='train/images/'),
        data_root='/dataset/DOTA_v1.0/dota_ss_crop-1024_overlap-200',
        filter_cfg=dict(filter_empty_gt=True),
        img_shape=(
            1024,
            1024,
        ),
        pipeline=[
            dict(
                file_client_args=dict(backend='disk'),
                type='mmdet.LoadImageFromFile'),
            dict(
                box_type='qbox', type='mmdet.LoadAnnotations', with_bbox=True),
            dict(
                box_type_mapping=dict(gt_bboxes='rbox'),
                type='ConvertBoxType'),
            dict(keep_ratio=True, scale=(
                1024,
                1024,
            ), type='mmdet.Resize'),
            dict(
                direction=[
                    'horizontal',
                    'vertical',
                    'diagonal',
                ],
                prob=0.75,
                type='mmdet.RandomFlip'),
            dict(type='mmdet.PackDetInputs'),
        ],
        type='DOTADataset'),
    num_workers=1,
    persistent_workers=True,
    sampler=dict(shuffle=True, type='DefaultSampler'))
train_pipeline = [
    dict(
        file_client_args=dict(backend='disk'), type='mmdet.LoadImageFromFile'),
    dict(box_type='qbox', type='mmdet.LoadAnnotations', with_bbox=True),
    dict(box_type_mapping=dict(gt_bboxes='rbox'), type='ConvertBoxType'),
    dict(keep_ratio=True, scale=(
        1024,
        1024,
    ), type='mmdet.Resize'),
    dict(
        direction=[
            'horizontal',
            'vertical',
            'diagonal',
        ],
        prob=0.75,
        type='mmdet.RandomFlip'),
    dict(type='mmdet.PackDetInputs'),
]
val_cfg = dict(type='ValLoop')
val_dataloader = dict(
    batch_size=1,
    dataset=dict(
        ann_file='val/annfiles/',
        data_prefix=dict(img_path='val/images/'),
        data_root='/dataset/DOTA_v1.0/dota_ss_crop-1024_overlap-200',
        img_shape=(
            1024,
            1024,
        ),
        pipeline=[
            dict(
                file_client_args=dict(backend='disk'),
                type='mmdet.LoadImageFromFile'),
            dict(keep_ratio=True, scale=(
                1024,
                1024,
            ), type='mmdet.Resize'),
            dict(
                box_type='qbox', type='mmdet.LoadAnnotations', with_bbox=True),
            dict(
                box_type_mapping=dict(gt_bboxes='rbox'),
                type='ConvertBoxType'),
            dict(
                meta_keys=(
                    'img_id',
                    'img_path',
                    'ori_shape',
                    'img_shape',
                    'scale_factor',
                ),
                type='mmdet.PackDetInputs'),
        ],
        test_mode=True,
        type='DOTADataset'),
    drop_last=False,
    num_workers=2,
    persistent_workers=True,
    sampler=dict(shuffle=False, type='DefaultSampler'))
val_evaluator = dict(metric='mAP', type='DOTAMetric')
val_pipeline = [
    dict(
        file_client_args=dict(backend='disk'), type='mmdet.LoadImageFromFile'),
    dict(keep_ratio=True, scale=(
        1024,
        1024,
    ), type='mmdet.Resize'),
    dict(box_type='qbox', type='mmdet.LoadAnnotations', with_bbox=True),
    dict(box_type_mapping=dict(gt_bboxes='rbox'), type='ConvertBoxType'),
    dict(
        meta_keys=(
            'img_id',
            'img_path',
            'ori_shape',
            'img_shape',
            'scale_factor',
        ),
        type='mmdet.PackDetInputs'),
]
vis_backends = [
    dict(type='LocalVisBackend'),
]
visualizer = dict(
    name='visualizer',
    type='RotLocalVisualizer',
    vis_backends=[
        dict(type='LocalVisBackend'),
    ])
work_dir = './work_dir/oriented-rcnn-le90_r50_fpn_1x_dota'

/workspace/mmengine/mmengine/utils/misc.py:391: DeprecationWarning: "out_size" is deprecated in `RoIAlignRotated.__init__`, please use "output_size" instead
  'instead', DeprecationWarning)
/workspace/mmengine/mmengine/utils/misc.py:391: DeprecationWarning: "sample_num" is deprecated in `RoIAlignRotated.__init__`, please use "sampling_ratio" instead
  'instead', DeprecationWarning)
01/08 21:26:54 - mmengine - INFO - Distributed training is not used, all SyncBatchNorm (SyncBN) layers in the model will be automatically reverted to BatchNormXd layers if they are used.
01/08 21:26:54 - mmengine - INFO - Hooks will be executed in the following order:
before_run:
(VERY_HIGH   ) RuntimeInfoHook
(BELOW_NORMAL) LoggerHook
 --------------------
before_train:
(VERY_HIGH   ) RuntimeInfoHook
(NORMAL      ) IterTimerHook
(VERY_LOW    ) CheckpointHook
 --------------------
before_train_epoch:
(VERY_HIGH   ) RuntimeInfoHook
(NORMAL      ) IterTimerHook
(NORMAL      ) DistSamplerSeedHook
 --------------------
before_train_iter:
(VERY_HIGH   ) RuntimeInfoHook
(NORMAL      ) IterTimerHook
 --------------------
after_train_iter:
(VERY_HIGH   ) RuntimeInfoHook
(NORMAL      ) IterTimerHook
(BELOW_NORMAL) LoggerHook
(LOW         ) ParamSchedulerHook
(VERY_LOW    ) CheckpointHook
 --------------------
after_train_epoch:
(NORMAL      ) IterTimerHook
(LOW         ) ParamSchedulerHook
(VERY_LOW    ) CheckpointHook
 --------------------
before_val:
(VERY_HIGH   ) RuntimeInfoHook
 --------------------
before_val_epoch:
(NORMAL      ) IterTimerHook
 --------------------
before_val_iter:
(NORMAL      ) IterTimerHook
 --------------------
after_val_iter:
(NORMAL      ) IterTimerHook
(NORMAL      ) DetVisualizationHook
(BELOW_NORMAL) LoggerHook
 --------------------
after_val_epoch:
(VERY_HIGH   ) RuntimeInfoHook
(NORMAL      ) IterTimerHook
(BELOW_NORMAL) LoggerHook
(LOW         ) ParamSchedulerHook
(VERY_LOW    ) CheckpointHook
 --------------------
after_val:
(VERY_HIGH   ) RuntimeInfoHook
 --------------------
after_train:
(VERY_HIGH   ) RuntimeInfoHook
(VERY_LOW    ) CheckpointHook
 --------------------
before_test:
(VERY_HIGH   ) RuntimeInfoHook
 --------------------
before_test_epoch:
(NORMAL      ) IterTimerHook
 --------------------
before_test_iter:
(NORMAL      ) IterTimerHook
 --------------------
after_test_iter:
(NORMAL      ) IterTimerHook
(NORMAL      ) DetVisualizationHook
(BELOW_NORMAL) LoggerHook
 --------------------
after_test_epoch:
(VERY_HIGH   ) RuntimeInfoHook
(NORMAL      ) IterTimerHook
(BELOW_NORMAL) LoggerHook
 --------------------
after_test:
(VERY_HIGH   ) RuntimeInfoHook
 --------------------
after_run:
(BELOW_NORMAL) LoggerHook
 --------------------
/usr/local/python3.7.5/lib/python3.7/site-packages/mmcv/transforms/loading.py:71: DeprecationWarning: "file_client_args" will be deprecated in future. Please use "backend_args" instead
  'Please use "backend_args" instead', DeprecationWarning)
01/08 21:26:59 - mmengine - WARNING - Failed to search registry with scope "mmrotate" in the "optim_wrapper" registry tree. As a workaround, the current "optim_wrapper" registry in "mmengine" is used to build instance. This may cause unexpected failure when running the built modules. Please check whether "mmrotate" is a correct scope, or whether the registry is initialized.
01/08 21:27:01 - mmengine - INFO - load model from: torchvision://resnet50
01/08 21:27:01 - mmengine - INFO - Loads checkpoint by torchvision backend from path: torchvision://resnet50
01/08 21:27:01 - mmengine - WARNING - The model and loaded state dict do not match exactly

unexpected key in source state_dict: fc.weight, fc.bias

01/08 21:27:01 - mmengine - WARNING - "FileClient" will be deprecated in future. Please use io functions in https://mmengine.readthedocs.io/en/latest/api/fileio.html#file-io
01/08 21:27:01 - mmengine - WARNING - "HardDiskBackend" is the alias of "LocalBackend" and the former will be deprecated in future.
01/08 21:27:01 - mmengine - INFO - Checkpoints will be saved to /workspace/mmrotate/work_dir/oriented-rcnn-le90_r50_fpn_1x_dota.
/workspace/mmrotate/mmrotate/structures/bbox/rotated_boxes.py:192: UserWarning: The `clip` function does nothing in `RotatedBoxes`.
  warnings.warn('The `clip` function does nothing in `RotatedBoxes`.')
/workspace/mmengine/mmengine/model/base_model/data_preprocessor.py:257: UserWarning: AutoNonVariableTypeMode is deprecated and will be removed in 1.10 release. For kernel implementations please use AutoDispatchBelowADInplaceOrView instead, If you are looking for a user facing API to enable running your inference-only workload, please use c10::InferenceMode. Using AutoDispatchBelowADInplaceOrView in user code is under risk of producing silent wrong result in some edge cases. See Note [AutoDispatchBelowAutograd] for more details. (Triggered internally at  /opt/_internal/cpython-3.7.17/lib/python3.7/site-packages/torch/include/ATen/core/LegacyTypeDispatch.h:79.)
  _batch_input = _batch_input.float()
[W AmpForeachNonFiniteCheckAndUnscaleKernelNpuOpApi.cpp:102] Warning: Non finite check and unscale on NPU device! (function operator())
[W AmpForeachNonFiniteCheckAndUnscaleKernelNpu.cpp:28] Warning: Non finite check on NPU device! (function operator())
EZ3003: No supported Ops kernel and engine are found for [RoiAlignRotatedGrad], optype [RoiAlignRotatedGrad].
        Possible Cause: The operator is not supported by the system. Therefore, no hit is found in any operator information library.
        Solution: 1. Check that the OPP component is installed properly. 2. Submit an issue to request for the support of this operator type.
        TraceBack (most recent call last):
        build graph failed, graph id:56, ret:-1[FUNC:BuildModelWithGraphId][FILE:ge_generator.cc][LINE:1615]
        [Build][SingleOpModel]call ge interface generator.BuildSingleOpModel failed. ge result = 4294967295[FUNC:ReportCallError][FILE:log_inner.cpp][LINE:161]
        [Build][Op]Fail to build op model[FUNC:ReportInnerError][FILE:log_inner.cpp][LINE:145]
        build op model failed, result = 500002[FUNC:ReportInnerError][FILE:log_inner.cpp][LINE:145]

terminate called after throwing an instance of 'std::runtime_error'
  what():  ASCEND kernel errors might be asynchronously reported at some other API call, so the stacktrace below is not the root cause of the problem.
For getting the stacktrace of OP in PyTorch, consider passing ASCEND_LAUNCH_BLOCKING=1.
Aborted (core dumped)

Additional information

注意到mmcv-main分支下最近新增加了NPU设备上的RoIAlignRotated算子适配，但是实际测试mmrotate的旋转目标检测算法oriented-RCNN时发现仍存在问题，不支持该算子的梯度求导操作。此外，在mmrotate的20多种旋转目标检测算法中，有一半算法都用的是IoUloss系列（RotatedIoULoss && IoULoss），涉及到mmcv里的MinAreaPolygon ConvexIoU DiffIoURotated这几个算子均没有进行NPU设备上的适配，导致mmrotate的算法在NPU上只有1-2个能跑通的模型。希望团队能将NPU设备上的这些常用算子适配提上日程

open-mmlab / mmcv