huggingface / accelerate

🚀 A simple way to launch, train, and use PyTorch models on almost any device and distributed configuration, automatic mixed precision (including fp8), and easy-to-configure FSDP and DeepSpeed support
https://huggingface.co/docs/accelerate
Apache License 2.0

Add Profiler Support for Performance Analysis #2883

Closed yhna940 closed 1 day ago

yhna940 commented 1 week ago

What does this PR do?

This PR introduces profiler support to the Accelerate library, enabling users to collect performance metrics during model training and inference. The profiler allows for detailed analysis of execution time and memory consumption of model operators and can generate profiling traces for visualization in Chrome's tracing tool. Additionally, it provides options to profile long-running jobs with customizable scheduling options.
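Since this PR wraps `torch.profiler`, a minimal standalone sketch of the underlying profiling described above (per-operator execution time, memory consumption, and Chrome trace export) looks like this; the model and tensor sizes are illustrative placeholders:

```python
import torch
from torch.profiler import profile, ProfilerActivity

# Illustrative workload; any model/inputs work the same way
model = torch.nn.Linear(128, 64)
inputs = torch.randn(32, 128)

with profile(
    activities=[ProfilerActivity.CPU],  # add ProfilerActivity.CUDA on GPU
    profile_memory=True,                # track tensor memory allocations
    record_shapes=True,                 # record input shapes per operator
) as prof:
    model(inputs)

# Per-operator execution time and memory consumption
print(prof.key_averages().table(sort_by="cpu_time_total", row_limit=5))

# Trace file viewable at chrome://tracing
prof.export_chrome_trace("trace.json")
```

The Accelerate integration exposes the same knobs through the library's kwargs-handler pattern rather than requiring users to manage the context manager themselves.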

[Screenshot: profile_export — an exported profiling trace visualized in Chrome's tracing tool]

Key changes include:

Context and Motivation

Other frameworks like MMEngine and PyTorch Lightning offer profiling techniques based on the Torch Profiler. Inspired by these tools, we aimed to bring similar profiling capabilities to Accelerate. This enhancement helps users optimize and improve model performance by providing insights into the computational and memory aspects of their models.


HuggingFaceDocBuilderDev commented 1 week ago

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

BenjaminBossan commented 1 week ago

Thanks for implementing this cool feature. At first glance, the PR already looks super clean; really great job.

I checked the rendered profiler docs and there seems to be an issue with </hfoption>, not sure what's going on there. Could you please check? Edit: It seems you were faster than me :)

I also saw that the dates of the copyright headers you added are outdated, could you please update them to 2024?

Otherwise, this PR looks great to me. I don't have a ton of experience with the profiler, except that I found it can be super slow :D. The design of the feature looks quite nice to me, but let's wait for Zach to return to the office for a full review.

yhna940 commented 1 week ago

Thank you for the quick review @BenjaminBossan !

I appreciate you pointing out the issue with the </hfoption> tag in the docs. I've corrected it now. Additionally, I've updated the copyright headers to 2024 as requested.

However, I noticed that one image in the docs is broken. Could you please provide some guidance on how to fix this? 🙏

Regarding the profiler's performance, I'm also interested in investigating the slowdown you mentioned. On my local server, it seems to work fine. Could you provide more details on where you're experiencing the slowness?

Thanks again for your feedback :)

yhna940 commented 1 week ago

Thank you for the review and the feedback! I appreciate the insights and will wait for Zach's full review for any additional comments or suggestions.

I will conduct some experiments to further investigate the performance impacts of with_stack and data processing times.
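For reference, `with_stack=True` is a likely source of the slowdown being investigated: it records Python source information for each operator, which adds noticeable overhead but enables stack-grouped reporting. A minimal sketch (the workload is an illustrative placeholder):

```python
import torch
from torch.profiler import profile, ProfilerActivity

with profile(
    activities=[ProfilerActivity.CPU],
    with_stack=True,  # record source info; adds profiling overhead
) as prof:
    torch.matmul(torch.randn(64, 64), torch.randn(64, 64))

# Group results by the top 5 stack frames to see where calls originate
print(
    prof.key_averages(group_by_stack_n=5).table(
        sort_by="cpu_time_total", row_limit=5
    )
)
```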

Additionally, I propose updating the user guide with tips and warnings about profiling long-running jobs (such as training loops) using the PyTorch profiler's scheduling API. As highlighted in the PyTorch official documentation:

PyTorch profiler offers an additional API to handle long-running jobs (such as training loops). Tracing all of the execution can be slow and result in very large trace files. To avoid this, use optional arguments:

  • schedule - specifies a function that takes an integer argument (step number) as an input and returns an action for the profiler, the best way to use this parameter is to use the torch.profiler.schedule helper function that can generate a schedule for you;
  • on_trace_ready - specifies a function that takes a reference to the profiler as an input and is called by the profiler each time the new trace is ready.
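The two arguments quoted above can be sketched together as follows (a minimal CPU-only example; the `wait`/`warmup`/`active` values are illustrative):

```python
import torch
from torch.profiler import profile, ProfilerActivity, schedule

# Skip 1 step, warm up for 1, then record 3 steps, once
my_schedule = schedule(wait=1, warmup=1, active=3, repeat=1)

def on_trace_ready(prof):
    # Called by the profiler each time a new trace is ready
    print(prof.key_averages().table(sort_by="cpu_time_total", row_limit=5))

with profile(
    activities=[ProfilerActivity.CPU],
    schedule=my_schedule,
    on_trace_ready=on_trace_ready,
) as prof:
    for step in range(6):  # only steps in the "active" window are traced
        torch.matmul(torch.randn(64, 64), torch.randn(64, 64))
        prof.step()  # signal the profiler that a step has finished
```

Because only the `active` steps are traced, this keeps trace files small even for long training loops, which is exactly the concern the documentation warning would address.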

By updating the documentation with these details, we can provide more comprehensive guidance to users, helping them to effectively manage performance profiling for their models.

Let me know if there's anything else I should address or if you have further suggestions!

Thanks again for the review :)