pytorch / torchtune

PyTorch native finetuning library
https://pytorch.org/torchtune/main/
BSD 3-Clause "New" or "Revised" License

Add Ascend NPU as a backend #1797

Open noemotiovon opened 1 month ago

noemotiovon commented 1 month ago

Description

:rocket: Ascend is a full-stack AI computing infrastructure for industry applications and services based on Huawei Ascend processors and software. For more information about Ascend, see Ascend Community.

CANN (Compute Architecture of Neural Networks), developed by Huawei, is a heterogeneous computing architecture for AI.

PyTorch has officially announced support for Ascend NPU (through the PrivateUse1 key); please see the PrivateUse1 tutorial here.


Motivation

With the growing number of developers leveraging Ascend NPUs for AI training and inference, I would like to propose adding support for the Ascend NPU backend to this project.


ebsmothers commented 1 month ago

Hi @noemotiovon, thanks for creating the issue. If I understand correctly, one gap is that we need to import torch_npu to access the NPU backend, is that correct? Looking at the Ascend/pytorch repo I only see up to version 2.3 supported on main (ref), while we only support the latest stable version of PyTorch (2.4 and soon to be 2.5). Do you know if these more recent versions are currently supported in Ascend?

noemotiovon commented 1 month ago

@ebsmothers Thank you very much for taking the time to review my issue :smile: ! Currently, torch-npu 2.4.0rc1 has been released, and we can use it for testing (ref). You’re correct, we just need to import torch_npu based on the device. Would you accept a PR that uses this version (2.4.0rc1) of torch_npu for testing and verification? :smiley:
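
For illustration only (this is not code from torchtune or from any PR), "import torch_npu based on the device" could look roughly like the sketch below. The helper name `maybe_import_torch_npu` and the `"npu"` device-type string are assumptions made for the example. The only thing assumed about torch_npu itself is that it is an importable package with that name.

```python
import importlib
import importlib.util


def maybe_import_torch_npu(device_type: str) -> bool:
    """Import torch_npu only when an NPU device is requested.

    Hypothetical helper for illustration. Returns True if torch_npu
    was imported, False if the requested device does not need it.
    """
    # Assumption: NPU devices are identified by the string "npu".
    if device_type != "npu":
        return False

    # Check that the package is installed before trying to import it,
    # so the user gets a clear error instead of a bare ImportError.
    if importlib.util.find_spec("torch_npu") is None:
        raise RuntimeError(
            "Device 'npu' was requested but the torch_npu package is not installed."
        )

    # Importing the package is what makes the NPU backend available.
    importlib.import_module("torch_npu")
    return True
```

With a guard like this, CPU and CUDA users never touch torch_npu, and the package is only required when an NPU device is requested.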

ebsmothers commented 1 month ago

@noemotiovon yep if you open a PR I am happy to review it!

RdoubleA commented 1 month ago

fyi looks like this has been requested in the past, but never followed up on: https://github.com/pytorch/torchtune/issues/1006