huggingface / accelerate

🚀 A simple way to launch, train, and use PyTorch models on almost any device and distributed configuration, automatic mixed precision (including fp8), and easy-to-configure FSDP and DeepSpeed support
https://huggingface.co/docs/accelerate
Apache License 2.0

[RFC] Integrate Intel Extension for PyTorch into accelerate to get out-of-the-box optimizations on Intel platforms #700

Open tangleintel opened 1 year ago

tangleintel commented 1 year ago

Motivation

Intel Extension for PyTorch (a.k.a. IPEX) provides extra optimizations and a performance boost on Intel hardware platforms (currently CPU only) for both inference and training. These optimizations include graph-level optimizations such as operator fusion, auto mixed precision with a rich set of bf16 operators, and optimizer optimizations that speed up training. In contrast to Trainer, accelerate is mostly used for distributed training and inference of transformer models, but it can also benefit from IPEX's optimizations. So integrating IPEX into accelerate would give users who do distributed training or evaluation an out-of-the-box performance boost on CPU.
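For reference, standalone IPEX usage looks roughly like the sketch below (the toy model, bf16 choice, and hyperparameters are just for illustration):

```python
import torch
import intel_extension_for_pytorch as ipex


def make_model():
    # Tiny stand-in model; any torch.nn.Module works the same way.
    return torch.nn.Sequential(
        torch.nn.Linear(64, 64),
        torch.nn.ReLU(),
        torch.nn.Linear(64, 8),
    )


# Inference: ipex.optimize applies weight-layout and operator-fusion
# optimizations; torch.cpu.amp.autocast enables bf16 auto mixed precision.
infer_model = ipex.optimize(make_model().eval(), dtype=torch.bfloat16)
with torch.no_grad(), torch.cpu.amp.autocast():
    out = infer_model(torch.randn(4, 64))

# Training: passing the optimizer lets IPEX also optimize the parameter
# update step (the optimizer optimizations mentioned above).
train_model = make_model().train()
optimizer = torch.optim.SGD(train_model.parameters(), lr=0.1)
train_model, optimizer = ipex.optimize(
    train_model, optimizer=optimizer, dtype=torch.bfloat16
)
with torch.cpu.amp.autocast():
    loss = train_model(torch.randn(4, 64)).sum()
loss.backward()
optimizer.step()
```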

Design

User interface

Implementation

pacman100 commented 1 year ago

Hello @tangleintel, thank you for the detailed feature request. I've made the above draft PR. I currently don't have access to an instance with the latest Intel CPUs to perform the initial testing. Could you please try it out on your end and let us know? You can enable IPEX via the options below:

  1. Through accelerate config as shown below:

     compute_environment: LOCAL_MACHINE
     deepspeed_config: {}
     distributed_type: MULTI_CPU
     downcast_bf16: 'no'
     fsdp_config: {}
     ipex_config:
       ipex_enabled: true
       ipex_fusion_enabled: false
     machine_rank: 0
     main_process_ip: null
     main_process_port: null
     main_training_function: main
     mixed_precision: bf16
     num_machines: 1
     num_processes: 4
     rdzv_backend: static
     same_network: true
     use_cpu: true

  2. Pass --ipex_enabled and --ipex_fusion_enabled to enable the corresponding options while using accelerate launch, e.g., accelerate launch --config_file xxxxx.yaml --ipex_enabled --ipex_fusion_enabled script.py
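Internally, the integration does something along these lines (a simplified sketch, not the exact code in the draft PR; the helper name and signature are made up for illustration, and JIT fusion handling is omitted):

```python
import torch

try:
    import intel_extension_for_pytorch as ipex
except ImportError:
    ipex = None


def maybe_apply_ipex(model, optimizer=None, ipex_enabled=False, mixed_precision="no"):
    # Hypothetical helper mirroring the `ipex_enabled` config key above;
    # the actual wiring in the draft PR may differ.
    if not ipex_enabled or ipex is None:
        return model, optimizer
    dtype = torch.bfloat16 if mixed_precision == "bf16" else torch.float32
    if optimizer is not None:
        # Training path: optimize the model and the optimizer together.
        model, optimizer = ipex.optimize(model, optimizer=optimizer, dtype=dtype)
    else:
        # Inference path: optimize the model's forward pass only.
        model = ipex.optimize(model, dtype=dtype)
    return model, optimizer
```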
tangleintel commented 1 year ago

Hi @pacman100, thanks for the reply! I think both of your suggestions for passing the IPEX-related options are more suitable. I will refine this RFC accordingly. The ETA for this PR to be ready is the end of this month.

pacman100 commented 1 year ago

Hello @tangleintel, I have already implemented the above feature request in draft PR #701. Please test it out and let us know if it works as expected, or refine the draft PR using it as a starting point.

tangleintel commented 1 year ago

@pacman100 Oh, I see. Thank you very much for the work. I will try it out and give you feedback.

tangleintel commented 1 year ago

Hi @pacman100, I have tried your patch initially, without performance testing. During the initial functionality test, I found several small issues:

I will run the performance tests for both training and inference (JIT and imperative) when a multi-node machine becomes available to me. For bf16, we will try to submit a PR to PyTorch.
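For reference, the two inference modes I mean look roughly like this (a minimal sketch, not the actual benchmark code; the toy model and shapes are illustrative):

```python
import torch
import intel_extension_for_pytorch as ipex

model = torch.nn.Linear(64, 8).eval()
example = torch.randn(4, 64)

# Imperative mode: the optimized module is called directly, eagerly.
imperative_model = ipex.optimize(model, dtype=torch.bfloat16)

# JIT mode: trace and freeze the optimized module so IPEX's graph-level
# fusions can kick in.
with torch.no_grad(), torch.cpu.amp.autocast():
    out_imperative = imperative_model(example)
    jit_model = torch.jit.trace(imperative_model, example, strict=False)
    jit_model = torch.jit.freeze(jit_model)
    out_jit = jit_model(example)
```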

pacman100 commented 1 year ago

Hello @tangleintel, please go ahead and make the necessary changes to the PR as per your comments above. We can review everything after that.

tangleintel commented 1 year ago

OK

yao-matrix commented 1 year ago

@pacman100, @jianan-gu, @sywangyi, with IPEX load_state_dict support, are we OK to merge this PR?

yao-matrix commented 1 year ago

@kding1