open-mmlab / mmcv

OpenMMLab Computer Vision Foundation
https://mmcv.readthedocs.io/en/latest/
Apache License 2.0

torch_npu support aclnn and add op #2998

Closed: momo609 closed this pull request 10 months ago.

momo609 commented 12 months ago

Thanks for your contribution and we appreciate it a lot. The following instructions would make your pull request more healthy and more easily get feedback. If you do not understand some items, don't worry, just make the pull request and seek help from maintainers.

Motivation

Please describe the motivation of this PR and the goal you want to achieve through this PR.

Modification

Please briefly describe what modification is made in this PR.

BC-breaking (Optional)

Does the modification introduce changes that break the backward-compatibility of the downstream repositories? If so, please describe how it breaks the compatibility and how the downstream projects should modify their code to keep compatibility with this PR.

Use cases (Optional)

If this PR introduces a new feature, it is better to list some use cases here, and update the documentation.

Checklist

Before PR:

After PR:

CLAassistant commented 11 months ago

CLA assistant check
All committers have signed the CLA.

chekistcccp commented 11 months ago

Testing fails with the following error. The environment is the same as in issue #3002:

    In file included from /home/ma-user/work/mmcv/mmcv/ops/csrc/common/pytorch_npu_helper.hpp:26:0,
                     from /home/ma-user/work/mmcv/mmcv/ops/csrc/pytorch/npu/chamfer_distance_npu.cpp:1:
    /home/ma-user/work/mmcv/mmcv/ops/csrc/pytorch/npu/chamfer_distance_npu.cpp: In function ‘void chamfer_distance_backward_npu(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor)’:
    /home/ma-user/work/mmcv/mmcv/ops/csrc/common/pytorch_npu_util.hpp:555:40: error: ‘utils’ is not a member of ‘torch_npu’
         at::TensorOptions(torch_npu::utils::get_npu_device_type()); \
    /home/ma-user/work/mmcv/mmcv/ops/csrc/common/pytorch_npu_util.hpp:555:40: note: in definition of macro ‘EXEC_NPU_CMD’
         at::TensorOptions(torch_npu::utils::get_npu_device_type()); \
    /home/ma-user/work/mmcv/mmcv/ops/csrc/common/pytorch_npu_util.hpp:555:40: note: suggested alternatives:
         at::TensorOptions(torch_npu::utils::get_npu_device_type()); \
    /home/ma-user/work/mmcv/mmcv/ops/csrc/common/pytorch_npu_util.hpp:555:40: note: in definition of macro ‘EXEC_NPU_CMD’
         at::TensorOptions(torch_npu::utils::get_npu_device_type()); \

chekistcccp commented 11 months ago

Tested again in the same environment; the same error still occurs.

chekistcccp commented 11 months ago

I suggest changing line 22 of mmcv/mmcv/ops/csrc/common/pytorch_npu_util.hpp to

#include </usr/local/Ascend/ascend-toolkit/latest/runtime/include/acl/acl_base.h>

and line 23 of mmcv/mmcv/ops/csrc/common/pytorch_npu_util.hpp to

#include </usr/local/Ascend/ascend-toolkit/latest/runtime/include/acl/acl_rt.h>

Compilation often fails with a "file not found" message for these headers; after switching to absolute paths, the problem no longer occurs.
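Hard-coded absolute paths work on one machine but break on any install that uses a different prefix; a common alternative is to keep the include path relative to the toolkit's include directory (for example `<acl/acl_base.h>`) and make sure `<toolkit root>/runtime/include` is on the compiler's include path. As a hedged sketch (a standalone diagnostic, not part of mmcv), the C++17 code below checks whether both headers exist under a toolkit root. It assumes the root comes from the `ASCEND_TOOLKIT_HOME` environment variable, which CANN's environment setup script normally exports, and otherwise falls back to the `/usr/local/Ascend/ascend-toolkit/latest` prefix used above. `ascend_toolkit_root` and `missing_acl_headers` are hypothetical names.

```cpp
#include <cstdlib>
#include <filesystem>
#include <initializer_list>
#include <string>
#include <vector>

namespace fs = std::filesystem;

// Hypothetical helper: resolve the toolkit root from ASCEND_TOOLKIT_HOME
// (assumed to be set by CANN's environment script), else use the default prefix.
fs::path ascend_toolkit_root() {
  if (const char* env = std::getenv("ASCEND_TOOLKIT_HOME"); env && *env) {
    return fs::path(env);
  }
  return fs::path("/usr/local/Ascend/ascend-toolkit/latest");
}

// Hypothetical helper: list the ACL headers that are NOT found under
// <root>/runtime/include, the directory that would be passed to the compiler with -I.
std::vector<std::string> missing_acl_headers(const fs::path& root) {
  const fs::path include_dir = root / "runtime" / "include";
  std::vector<std::string> missing;
  for (const char* header : {"acl/acl_base.h", "acl/acl_rt.h"}) {
    if (!fs::exists(include_dir / header)) {
      missing.emplace_back(header);
    }
  }
  return missing;
}
```

Calling `missing_acl_headers(ascend_toolkit_root())` before a build shows which header cannot be found under the chosen root; an empty result means both headers are present there, so adding that root's `runtime/include` directory with `-I` should let the original relative includes resolve.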

momo609 commented 10 months ago

Hi, which version of torch_npu is your environment using, and what is its build date?

chekistcccp commented 10 months ago

PyTorch version 1.11.0, CANN version 6.3.2, Python environment py_3.7, OS euler_2.8.3-aarch64, torch-npu version 1.11.0.post1.dev20230719.
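The question above asks for both the torch_npu version and its date. For this build, both are encoded in the version string: under PEP 440, `1.11.0.post1.dev20230719` is release `1.11.0` with a `post1` post-release segment and a `dev20230719` development segment, and here the dev number appears to be a YYYYMMDD build date. The sketch below splits such a string into those two parts; `NpuVersionInfo` and `parse_npu_version` are hypothetical names, and reading the dev number as a date is an assumption based only on this one example.

```cpp
#include <cctype>
#include <cstddef>
#include <optional>
#include <string>

// Hypothetical result of splitting a torch-npu style version string.
struct NpuVersionInfo {
  std::string release;   // e.g. "1.11.0"
  std::string dev_date;  // digits after ".dev", e.g. "20230719"; empty if absent
};

// Split "<release>[rcN][.postN][.devNNNNNNNN]" into its release and dev parts.
// Returns std::nullopt when the string does not start with a numeric release.
std::optional<NpuVersionInfo> parse_npu_version(const std::string& version) {
  NpuVersionInfo info;

  // Release segment: the leading run of digits and dots.
  std::size_t i = 0;
  while (i < version.size() &&
         (std::isdigit(static_cast<unsigned char>(version[i])) || version[i] == '.')) {
    ++i;
  }
  info.release = version.substr(0, i);
  while (!info.release.empty() && info.release.back() == '.') {
    info.release.pop_back();  // drop the dot that precedes "post1" or "dev..."
  }
  if (info.release.empty()) {
    return std::nullopt;
  }

  // Dev segment: the digits that directly follow ".dev", if present.
  const std::size_t dev = version.find(".dev");
  if (dev != std::string::npos) {
    const std::size_t start = dev + 4;  // skip ".dev"
    std::size_t end = start;
    while (end < version.size() && std::isdigit(static_cast<unsigned char>(version[end]))) {
      ++end;
    }
    info.dev_date = version.substr(start, end - start);
  }
  return info;
}
```

For a release build such as `2.0.1rc1` the sketch returns release `2.0.1` with an empty `dev_date`, which is a hint that the build date has to be obtained some other way.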

momo609 commented 10 months ago

You can resolve the problem by upgrading CANN and torch.

chekistcccp commented 10 months ago

What environment did you test with? I'll try to set up the same one on my side.

chekistcccp commented 10 months ago

I used pytorch:2.0.1-CANN6.3.RC2-py39 with torch-npu 2.0.1rc1, and it still reports the same error.

momo609 commented 10 months ago

I'm using CANN 7.1.0rc4 with the latest torch-npu 1.11.0. I recommend using recent torch_npu and CANN packages.

chekistcccp commented 10 months ago

Are you using a dedicated image? I'm working through JupyterLab and can't upgrade on my own.

chekistcccp commented 10 months ago

I can't find a CANN 7.1.0rc4 release at the moment. Could you point me to where it is available?