pytorch / pytorch

Tensors and Dynamic neural networks in Python with strong GPU acceleration
https://pytorch.org

[RFC] Intel GPU Upstreaming #114723

Open EikanWang opened 8 months ago

EikanWang commented 8 months ago

TL;DR

This RFC document aims to propose and discuss the upstreaming of Intel GPU support in PyTorch. Our focus is on leveraging Intel's advancements in GPU technology to enhance PyTorch's performance and versatility. This initiative begins with the torch.compile integration as a primary step and marks a significant stride towards incorporating the Intel GPU as a robust computational backend in PyTorch. The RFC outlines key components and a high-level design strategy for this integration. By aligning with PyTorch 2.5 release goals, we aim to provide Intel GPU as a Beta feature to benefit a wide range of users and applications.

Motivation

Intel GPUs significantly enhance workload performance, demonstrating strong processing efficiency. We have observed promising performance with Intel® Extension for PyTorch (IPEX). Therefore, we plan to upstream the features and optimizations matured in IPEX to stock PyTorch. This will provide an out-of-the-box experience for users on Intel GPU platforms and benefit the PyTorch community.

Approach

Eventually, we will fully support Intel GPU in PyTorch for both torch.compile mode and eager mode. From an execution perspective, we will reach this goal gradually, starting with torch.compile to align with the PyTorch 2.5 release as a Beta feature. Functionality and performance maturity will be driven by the Dynamo benchmarks: HuggingFace, TIMM, and TorchBench. Regarding data types, we will support FP32, TF32, BF16, and FP16 first. Other data types such as INT8 and FP8 are not within the scope of PyTorch 2.5; we will support them gradually.
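As a rough illustration of the intended user experience, the sketch below runs a small function through torch.compile with BF16 autocast, one of the data types in the 2.5 scope. This is our own minimal example, not code from the RFC: the CPU fallback and the eager fallback (for hosts without an Inductor toolchain) are there only so the sketch runs anywhere.

```python
import torch

def mlp_block(x, w):
    return torch.nn.functional.gelu(x @ w)

# Use the "xpu" device when an Intel GPU build is available; otherwise fall
# back to CPU so the sketch runs anywhere (the fallback is ours, not the RFC's).
device = "xpu" if hasattr(torch, "xpu") and torch.xpu.is_available() else "cpu"

x = torch.randn(8, 16, device=device)
w = torch.randn(16, 16, device=device)

compiled = torch.compile(mlp_block)
try:
    # autocast to bfloat16, one of the data types targeted for the 2.5 scope
    with torch.autocast(device_type=device, dtype=torch.bfloat16):
        out = compiled(x, w)
except Exception:
    # Inductor needs a working compiler toolchain; fall back to eager if absent.
    out = mlp_block(x, w)
```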

In addition, we have added a dedicated dispatch key and device name to PyTorch for Intel GPU, which can be found in the PyTorch GitHub repository. The components and features that we will upstream to stock PyTorch for Intel GPU will be based on the "XPU" device tag.
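The "XPU" dispatch key surfaces to users as the `"xpu"` device string. A minimal sketch of device selection, with a CPU fallback of our own so it runs on hosts without an Intel GPU (or on a PyTorch build without `torch.xpu`):

```python
import torch

# "xpu" is the device tag for Intel GPU; fall back to CPU when it is absent.
if hasattr(torch, "xpu") and torch.xpu.is_available():
    device = torch.device("xpu")
else:
    device = torch.device("cpu")

t = torch.ones(2, 3, device=device)
print(t.device.type)
```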

In summary, the scope of the PyTorch 2.5 release for Intel GPU is as follows:

Components

Since we are taking torch.compile as the initial step to align with the PyTorch 2.5 release, we have identified the Minimum Viable Product (MVP) set. It contains five crucial components:

- Intel GPU runtime
- Necessary native ATen operation support
- oneDNN library integration
- Intel GPU backend for Inductor
- CI/CD for Intel GPU

Besides the five above crucial components, we will rely on the Intel GPU driver and SYCL to implement the Intel GPU runtime and necessary native aten operations.

Design

In this section, we present a high-level design for each component. Regarding the detailed design, please refer to the dedicated RFC for each component for more information.

For a more comprehensive and detailed understanding of each component's design, we highly encourage you to explore the respective RFCs linked above. These documents provide in-depth insights and technical specifics that are crucial for a complete grasp of the proposed implementations and integrations.

Tasks

A more detailed task list is WIP.

### Intel GPU Runtime
- [x] oneAPI BaseToolkit Integration
- [x] `Device` for Intel GPU
- [x] `Stream` for Intel GPU
- [x] `Event` for Intel GPU
- [x] `Allocator` for Intel GPU
- [x] `Guard` for Intel GPU
- [x] Random Generator
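The runtime pieces above map onto familiar PyTorch APIs. As an illustration of our own (not code from the RFC), the sketch below exercises the device-bound random generator, and guards the `Stream` usage since those objects only exist on an actual XPU runtime:

```python
import torch

use_xpu = hasattr(torch, "xpu") and torch.xpu.is_available()
device = "xpu" if use_xpu else "cpu"

# Random Generator task: a torch.Generator bound to the device gives
# reproducible draws.
g = torch.Generator(device=device)
g.manual_seed(42)
a = torch.randn(3, device=device, generator=g)
g.manual_seed(42)
b = torch.randn(3, device=device, generator=g)

# Stream/Event/Guard exist only on the GPU runtime itself, so guard them.
if use_xpu:
    s = torch.xpu.Stream()
    with torch.xpu.stream(s):
        c = a + b
    s.synchronize()
```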
### Necessary Native Aten Operation Support
- [x] Integrate XPU OPs as a third-party component
- [x] SYCL Compiler Host/Device Separate Compilation
- [x] ATen Operations (Incremental): Elementwise
- [x] ATen Operations (Incremental): Reduction
- [x] ATen Operations (Incremental): Concat, Sort, Arange and Indexing
- [x] Dynamo HuggingFace Benchmark
- [x] Dynamo TIMM Benchmark
- [x] Dynamo TorchBench Benchmark
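The incremental ATen op families above (elementwise, reduction, concat, sort, arange, indexing) can be exercised with a few lines of ordinary tensor code. This is a sketch of ours to show the op coverage, with a CPU fallback for hosts without an Intel GPU:

```python
import torch

device = "xpu" if hasattr(torch, "xpu") and torch.xpu.is_available() else "cpu"

x = torch.arange(6, dtype=torch.float32, device=device)   # Arange
e = torch.sigmoid(x) * 2.0                                # Elementwise
r = e.sum()                                               # Reduction
c = torch.cat([x, e])                                     # Concat
s, idx = torch.sort(c, descending=True)                   # Sort
top3 = c[idx[:3]]                                         # Indexing
```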
### oneDNN Library Integration
- [x] oneDNN Library for Intel GPU Integration
- [x] ATen Operations: Conv
- [x] ATen Operations: GEMM
- [ ] ATen Operations: GEMM-Fused Operations
- [ ] ATen Operations: Conv-Fused Operations
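Conv and GEMM are the two op families routed through oneDNN above. A minimal sketch of ours (not the upstreamed kernels themselves) showing the user-facing calls that would hit those paths on an XPU device, with a CPU fallback so it runs anywhere:

```python
import torch

device = "xpu" if hasattr(torch, "xpu") and torch.xpu.is_available() else "cpu"

conv = torch.nn.Conv2d(3, 8, kernel_size=3, padding=1).to(device)
x = torch.randn(1, 3, 16, 16, device=device)
feat = conv(x)                       # Conv -> oneDNN convolution kernel on XPU
w = torch.randn(8, 4, device=device)
logits = feat.mean(dim=(2, 3)) @ w   # GEMM -> oneDNN matmul kernel on XPU
```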
### Intel GPU Backend for Inductor
- [x] Python Wrapper Code Generation for Intel GPU
- [x] Intel GPU Backend on Top of Triton for Kernel Code Generation
### CI/CD for Intel GPU
- [x] Self-hosted Runner Hosted in Intel Developer Cloud to Be Available in PyTorch
- [x] AWS-Docker-Based CI/CD Build Task Available for Intel GPU
- [x] CI/CD Test Task Available for Intel GPU

Additional context

This RFC primarily concentrates on enabling Intel GPU support for torch.compile. Additionally, we are evaluating the possibility of extending this support to eager mode through torch.compile as well. Please refer to #115545.

cc @frank-wei @jgong5 @mingfeima @XiaobingSuper @sanchitintel @ashokei @jingxu10 @ezyang @msaroufim @wconstab @bdhirsh @anijain2305 @zou3519 @voznesenskym @penguinwu @Guobing-Chen @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @peterbell10 @ipiszy @yf225 @chenyang78 @kadeng @muchulee8 @aakhundov @ColinPeppler

cpuhrsch commented 8 months ago

Adding this for triage review so we can discuss whether we want a new module tag for this work etc.

ezyang commented 7 months ago

What was the conclusion of the triage review discussion?

EikanWang commented 7 months ago

@ezyang we've proceeded by proposing detailed designs for each component individually. These proposals will be illustrated through pull requests (PRs), allowing us to demonstrate our ideas effectively, and we can refine the PRs directly if reviewers have comments. Our approach primarily focuses on maximizing the reuse of existing PyTorch code and designs.

From the execution perspective, the Intel GPU runtime is the prerequisite for the other components, so we would appreciate your help reviewing the Intel GPU runtime PRs first. Once the Intel GPU runtime PRs have landed, we will prioritize landing the PRs for the other components. In the meantime, we will submit the PRs for the other components for review.

Additionally, we have developed a comprehensive roadmap aimed at aligning our efforts with the PyTorch 2.5 release timeline, positioning these features as experimental in this version. This roadmap has been reviewed and discussed with Nikita and Chris to ensure a cohesive understanding and approach.

I'll be sharing this roadmap with you on Slack for your reference and further input. If there are any aspects of your inquiry that I may have missed or if you need further clarification on any point, please feel free to let me know.

EikanWang commented 7 months ago

> Adding this for triage review so we can discuss whether we want a new module tag for this work etc.

@cpuhrsch , may I know if we can add a new module tag now to triage review and on-call?

louie-tsai commented 3 months ago

@aice-support