microsoft / onnxruntime

ONNX Runtime: cross-platform, high performance ML inferencing and training accelerator
https://onnxruntime.ai
MIT License

Add Ascend backend support #11477

Open wangxiyuan opened 2 years ago

wangxiyuan commented 2 years ago

Is your feature request related to a problem? Please describe.

Based on the processor, Huawei has built a series of AI-related hardware products, shown in the blue rectangles. They are called Atlas. Here I'd like to say more about the Atlas 300. It is a PCIe card widely used in data/AI processing servers. Our development and test work is based on it as well.

Then, on top of the hardware, the Ascend ecosystem provides a software layer called CANN, shown in the yellow rectangles in the picture. CANN provides APIs that help developers quickly build AI applications and services on the Ascend platform. It is similar to CUDA in the Nvidia ecosystem.

In the ONNX case, users first need to convert the model to an Ascend model using a conversion tool called ATC. This is a little complex, and sometimes the performance may be poor or the accuracy may drop.
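For context, the existing ATC flow described above looks roughly like this. This is an illustrative sketch, not taken from the issue: the file names are placeholders, and the flag values (`--framework=5` for the ONNX front end, `Ascend310` as a plausible SoC for an Atlas 300 card) follow the CANN documentation as I understand it.

```shell
# Offline conversion with ATC (the flow this proposal would let users avoid).
# Requires a CANN installation; model and output names are placeholders.
# --framework=5 selects the ONNX front end; the result is an .om model file.
atc --model=resnet50.onnx \
    --framework=5 \
    --output=resnet50_ascend \
    --soc_version=Ascend310
```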

It would be good if onnxruntime could support the Ascend processor as a backend. Then users could use onnxruntime on the Ascend processor directly.


For software, CANN is the main component that both developers and AI frameworks need to know about, so let's focus on it. This is the CANN technical stack in the Ascend ecosystem. Last year my colleague zhipeng already presented the CANN stack at the ONNX meetup, but that was based on CANN 3.0, which is out of date. The picture here shows the newest version, CANN 5.0. As you can see, CANN has multiple layers: a service layer, a compilation layer, an execution layer, and a base layer. For example, the service layer provides the operator library, the optimization engine, and the framework adapters.

In general, developers do not need to know all of them. You only need to focus on the Ascend Computing Language, i.e. ACL. It is the API layer that lets you control Ascend hardware via CANN.
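To make the ACL layer concrete, here is a minimal sketch of the typical device-setup lifecycle using pyACL (the `acl` Python bindings shipped with CANN). The call names follow the pyACL documentation as I recall it and should be treated as assumptions; the helper degrades to a no-op on machines without a CANN installation.

```python
# Hedged sketch of ACL initialization via pyACL; `acl.init` / `acl.rt.set_device`
# are assumed names from the CANN pyACL bindings, not part of this proposal.
try:
    import acl  # pyACL ships with CANN; absent on machines without it
except ImportError:
    acl = None

def init_ascend(device_id: int = 0) -> bool:
    """Initialize ACL and bind an Ascend device; returns False without CANN."""
    if acl is None:
        return False
    acl.init()                    # process-level ACL initialization
    acl.rt.set_device(device_id)  # bind this process to the given device
    return True
```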

System information

Describe the solution you'd like

Currently, if users want to run an ONNX model on Ascend hardware, they must first use the model conversion tool provided by CANN to translate the model from ONNX to the Ascend format. The flow is a little complex, the translated model may lose some precision, and the performance may be poor. In some cases, the model may not work correctly at all.

To solve this problem, a better way is to let ONNX models run on Ascend directly. So in onnxruntime, we would like to add CANN as a new execution provider. Once it is done, users can run ONNX models on Ascend hardware via onnxruntime. Of course, we will add the related CI as well; for example, we can donate VM resources that contain Ascend hardware to the community.

The line below shows our roadmap. First we will push the basic code upstream, with the end-to-end flow completed and the ResNet model working correctly on the CANN EP. By the end of the year, we will finish support for all the ONNX operators and make sure all the models in the ONNX model zoo work well on Ascend.

Next year, we will focus on optimization work, such as performance improvements.

Based on the execution provider mechanism in onnxruntime, it is easy to integrate the Ascend processor as a new EP.

The new EP can be named CANN. CANN is the AI-oriented heterogeneous compute architecture in the Ascend ecosystem. It provides hierarchical APIs to help users quickly build AI applications. Frankly speaking, it is similar to CUDA in the GPU ecosystem.
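From the user's side, a new EP would plug into the usual provider-selection flow. The sketch below shows how a caller might prefer the proposed EP while keeping a CPU fallback; the provider name `"CANNExecutionProvider"` is an assumption based on onnxruntime's naming convention, not something fixed by this issue.

```python
# Hedged sketch: prefer the proposed CANN EP when available, fall back to CPU.
# "CANNExecutionProvider" is an assumed name following onnxruntime conventions.

def choose_providers(available):
    """Return an ordered provider list, always ending with a CPU fallback."""
    preferred = ["CANNExecutionProvider", "CPUExecutionProvider"]
    chosen = [p for p in preferred if p in available]
    return chosen or ["CPUExecutionProvider"]

# Usage (requires an onnxruntime build with the CANN EP compiled in):
# import onnxruntime as ort
# sess = ort.InferenceSession(
#     "resnet50.onnx",
#     providers=choose_providers(ort.get_available_providers()),
# )
```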

Additionally, we will add CI support as well. We can donate a VM with an Ascend processor to the onnxruntime CI system, so the community can keep testing the new CANN EP easily.

We hope the community can accept this feature request, and we look forward to your feedback.

Thanks.

Describe alternatives you've considered Use the library provided by Ascend directly, without onnxruntime.

Additional context Ascend official website; CANN

KnightYao commented 2 years ago

you think too much

wangxiyuan commented 2 years ago

you think too much

what do you mean?

wangxiyuan commented 2 years ago

This basic PR is ready for review now.

About 10 operators are added in the PR. With this change, ResNet-v1.12 runs well on the Ascend backend with onnxruntime.

Could any committer take a look? Thanks.

I'd like to know what I should do to push it forward.

Regarding the test environment, we can donate an Ascend-based VM to the community as well.

johnnynunez commented 11 months ago

How is it going?

FFFrog commented 11 months ago

How is it going?

Hey! Please refer to the doc first if you have any questions. The CI-related work is here.