open-mmlab / mmdetection

OpenMMLab Detection Toolbox and Benchmark
https://mmdetection.readthedocs.io
Apache License 2.0

MMDET is not fully Multi-Backend supported #8998

Open Qiza-lyhm opened 2 years ago

Qiza-lyhm commented 2 years ago

What is the problem this feature will solve?

Some logic in MMDET restricts the runtime device to CUDA or CPU, which causes problems when running on other devices (for example, MLU).

What is the feature you are proposing to solve the problem?

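The runtime logic of some MMDET modules needs to be changed so that they switch device backends according to the runtime state. Newly added modules and features should also include device backend detection and support.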

What alternatives have you considered?

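In the code linked below, infinite_sampler calls sync_random_seed without a device argument. By default, this interface creates the tensor as a CUDA tensor, so the run fails when the device type is not CUDA (for example, MLU).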

https://github.com/open-mmlab/mmdetection/blob/9d3e162459590eee4cfc891218dfbb5878378842/mmdet/datasets/samplers/infinite_sampler.py#L59

https://github.com/open-mmlab/mmdetection/blob/9d3e162459590eee4cfc891218dfbb5878378842/mmdet/core/utils/dist_utils.py#L157
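To illustrate the failure mode, here is a minimal sketch of a seed-sync helper that takes the device explicitly instead of defaulting to CUDA. The name `sync_seed_across_ranks` and its signature are hypothetical and are not the mmdet API. The sketch only assumes that `torch.distributed` is the communication layer, and that the active backend can broadcast tensors on the given device.

```python
import random
from typing import Optional


def sync_seed_across_ranks(seed: Optional[int] = None, device: str = "cpu") -> int:
    """Return a seed shared by all ranks (hypothetical helper, not the mmdet API).

    `device` is where the broadcast tensor is allocated. The caller passes the
    backend actually in use rather than relying on a hard-coded 'cuda' default.
    """
    if seed is None:
        seed = random.randint(0, 2**31 - 1)

    try:
        import torch
        import torch.distributed as dist
    except ImportError:
        # Without PyTorch there is no process group to synchronize with.
        return seed

    # Non-distributed or single-process run: nothing to broadcast,
    # so no device tensor is created at all.
    if not (dist.is_available() and dist.is_initialized()):
        return seed
    if dist.get_world_size() == 1:
        return seed

    # Rank 0 holds the real seed; the other ranks receive it via broadcast.
    value = seed if dist.get_rank() == 0 else 0
    seed_tensor = torch.tensor(value, dtype=torch.int32, device=device)
    dist.broadcast(seed_tensor, src=0)
    return int(seed_tensor.item())
```

A sampler would then pass its own device string, for example `sync_seed_across_ranks(seed, device="mlu")` on an MLU machine, so the tensor is never allocated on a backend that is not present.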

Some similar interfaces (such as distributed_sampler, linked below) provide automatic device selection and can identify the current runtime device.

https://github.com/open-mmlab/mmdetection/blob/9d3e162459590eee4cfc891218dfbb5878378842/mmdet/datasets/samplers/distributed_sampler.py#L29
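Automatic device selection of this kind can be sketched roughly as follows. This is an illustration under stated assumptions, not the code in the linked file: `pick_device` and `build_default_probes` are hypothetical names, and the `torch.is_mlu_available` attribute is assumed to exist only in vendor builds of PyTorch for MLU, so it is looked up defensively.

```python
from typing import Callable, Dict


def pick_device(probes: Dict[str, Callable[[], bool]], fallback: str = "cpu") -> str:
    """Return the first backend whose availability probe succeeds (hypothetical helper)."""
    for name, is_available in probes.items():
        try:
            if is_available():
                return name
        except Exception:
            # A probe that raises is treated as "backend not available".
            continue
    return fallback


def build_default_probes() -> Dict[str, Callable[[], bool]]:
    """Collect availability probes for the backends this sketch knows about."""
    try:
        import torch
    except ImportError:
        return {}  # no PyTorch: pick_device falls back to 'cpu'

    probes: Dict[str, Callable[[], bool]] = {"cuda": torch.cuda.is_available}
    # Assumption: MLU-enabled PyTorch builds expose `torch.is_mlu_available`.
    # Stock PyTorch does not, so the attribute is fetched with getattr.
    mlu_probe = getattr(torch, "is_mlu_available", None)
    if callable(mlu_probe):
        probes["mlu"] = mlu_probe
    return probes


if __name__ == "__main__":
    print(pick_device(build_default_probes()))
```

Keeping the probes in a dictionary means a new backend can be added by registering one more entry, without touching the callers that consume the resulting device string.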

GuWei007 commented 2 years ago

Other model repos in open-mmlab have the same issue.

Qiza-lyhm commented 2 years ago

https://github.com/open-mmlab/mmdetection/pull/9004 This PR modifies part of the code mentioned in this issue, but similar problems may still exist elsewhere.

hhaAndroid commented 2 years ago

@Qiza-lyhm There is indeed a problem. Could you create a PR to fix it? Thank you.