huggingface / accelerate

🚀 A simple way to launch, train, and use PyTorch models on almost any device and distributed configuration, with automatic mixed precision (including fp8) and easy-to-configure FSDP and DeepSpeed support
https://huggingface.co/docs/accelerate
Apache License 2.0

add use_all_gather for option #3164

Open · SangbumChoi opened 1 month ago

SangbumChoi commented 1 month ago

What does this PR do?

Fixes https://github.com/huggingface/accelerate/issues/3147 @muellerzr

I think tests might be needed, but I'm not sure where the corresponding test should be added.
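For context, here is a minimal, hypothetical sketch of what a `use_all_gather` toggle usually looks like at the `torch.distributed` level; the helper name, signature, and return convention are assumptions for illustration only, not the code actually changed in this PR:

```python
import torch.distributed as dist

def gather_predictions(obj, use_all_gather: bool = True):
    """Hypothetical helper, not the code changed in this PR.

    With use_all_gather=True, every rank receives the full list of objects;
    with use_all_gather=False, only rank 0 receives it and the other ranks
    get an empty list back.
    """
    world_size = dist.get_world_size()
    if use_all_gather:
        gathered = [None] * world_size
        dist.all_gather_object(gathered, obj)  # every rank ends up with all objects
        return gathered
    # gather_object only fills the output list on the destination rank,
    # and the output list must be None on all other ranks
    gathered = [None] * world_size if dist.get_rank() == 0 else None
    dist.gather_object(obj, gathered, dst=0)
    return gathered if dist.get_rank() == 0 else []
```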

Before submitting

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag members/contributors who may be interested in your PR.

SangbumChoi commented 1 month ago

I also have an additional question about the LLM inference evaluation process: does LLM evaluation require an all_gather so that all predictions are available on every process?

muellerzr commented 1 month ago

Thanks! Tests should be in src/accelerate/test_utils/scripts/test_ops.py

The all_gather is there so that we can collect the results across all processes. It's included as an example and for the user's convenience. (But it is not generally required, no.)
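To illustrate the pattern being described, a minimal evaluation-loop sketch using Accelerate's existing `gather_for_metrics` utility; the model and data below are dummies, only there to make the example runnable:

```python
import torch
from torch.utils.data import DataLoader, TensorDataset
from accelerate import Accelerator

accelerator = Accelerator()

# Dummy model and data, just to make the sketch runnable end to end
model = torch.nn.Linear(8, 2)
eval_dataloader = DataLoader(TensorDataset(torch.randn(32, 8)), batch_size=4)
model, eval_dataloader = accelerator.prepare(model, eval_dataloader)

model.eval()
all_predictions = []
for (inputs,) in eval_dataloader:
    with torch.no_grad():
        logits = model(inputs)
    preds = logits.argmax(dim=-1)
    # gather_for_metrics collects the predictions from every process (and trims
    # samples duplicated by the distributed sampler), so each rank ends up with
    # the full prediction set; skip it if only rank 0 needs the aggregate
    all_predictions.append(accelerator.gather_for_metrics(preds))

all_predictions = torch.cat(all_predictions)
```

Launched with `accelerate launch`, each process evaluates its own shard of the data; the gather step is only needed when every process should see the combined predictions.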

SangbumChoi commented 1 month ago

@muellerzr Hi, I have added a test. However, to make the test stricter, I think we could add an additional check that, when the process is not state.is_main_process, the return value is an empty list. Let me know what you think.
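A rough sketch of the kind of stricter check being proposed, reusing the hypothetical `gather_predictions` helper from the earlier sketch and assuming non-main processes receive an empty list when all-gather is disabled:

```python
from accelerate import PartialState

# Reuses the hypothetical gather_predictions helper sketched above;
# the real implementation and test in this PR may differ.
def test_gather_without_all_gather():
    state = PartialState()
    result = gather_predictions(state.process_index, use_all_gather=False)
    if state.is_main_process:
        assert len(result) == state.num_processes
    else:
        # the proposed stricter check: non-main processes get an empty list back
        assert result == []
```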

SangbumChoi commented 1 month ago

@muellerzr Gentle Ping!

muellerzr commented 3 weeks ago

Re: the test: I think it's fine how it is right now.