huggingface / accelerate

🚀 A simple way to launch, train, and use PyTorch models on almost any device and distributed configuration, automatic mixed precision (including fp8), and easy-to-configure FSDP and DeepSpeed support
https://huggingface.co/docs/accelerate
Apache License 2.0

Pipeline parallelism example with PiPPy fails #3151

Open · goelayu opened this issue 1 day ago

goelayu commented 1 day ago

System Info

- `Accelerate` version: 0.35.0.dev0
- Platform: Linux-5.15.0-121-generic-x86_64-with-glibc2.35
- `accelerate` bash location: redacted
- Python version: 3.10.14
- Numpy version: 1.23.5
- PyTorch version (GPU?): 2.4.1+cu121 (True)
- PyTorch XPU available: False
- PyTorch NPU available: False
- PyTorch MLU available: False
- PyTorch MUSA available: False
- System RAM: 1007.59 GB
- GPU type: NVIDIA H100 PCIe

Reproduction

  1. Run the `llama.py` example in the distributed inference folder.
  2. Running it produces the following error:

```
torch._dynamo.exc.UserError: Dynamic control flow is not supported at the moment. Please use functorch.experimental.control_flow.cond to explicitly capture the control flow.
For more information about this error, see: https://pytorch.org/docs/main/generated/exportdb/index.html#cond-operands
```

Expected behavior

I have tried different `accelerate launch` flags such as `--dynamo_use_dynamic`, but I am not sure how to fix the above error.
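For reference, a sketch of the kind of launch command tried (flag names taken from `accelerate launch --help`; the process count and script path here are assumptions):

```shell
# Hypothetical invocation: --dynamo_backend and --dynamo_use_dynamic are
# real `accelerate launch` flags, but this does not resolve the export error.
accelerate launch \
  --num_processes 2 \
  --dynamo_backend inductor \
  --dynamo_use_dynamic \
  llama.py
```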

muellerzr commented 15 hours ago

@goelayu can you try upgrading your Python version? IIRC that can play a role (3.12, ideally).