huggingface / accelerate

🚀 A simple way to launch, train, and use PyTorch models on almost any device and distributed configuration, automatic mixed precision (including fp8), and easy-to-configure FSDP and DeepSpeed support
https://huggingface.co/docs/accelerate
Apache License 2.0

wait_for_everyone() did not work #2837

Closed ByungKwanLee closed 3 weeks ago

ByungKwanLee commented 3 weeks ago

System Info

All packages are at the latest versions.

Information

Tasks

Reproduction

I have the same code on two different servers, where the only difference in resources is the GPUs: server A has 8x A6000 and server B has 8x RTX 3090. (Note that all package versions are the same, because the same install scripts were used.)

When I run the same code, server A works well with no hangs, but server B freezes and cannot proceed as soon as accelerator.wait_for_everyone() is reached. (How can I guarantee the code is the same? Both servers are tracked with the same git source, so I can always check for differences with VS Code's git tracking.)

What are the candidate causes of this problem?

I used DDP.

Is it a hardware problem? How can I solve it?
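
A minimal sketch of the pattern in question (the model and training loop here are placeholders, not the actual code; run with `accelerate launch --multi_gpu script.py`):

```python
import torch
from accelerate import Accelerator

accelerator = Accelerator()  # DDP is used automatically when launched on multiple GPUs

# Placeholder model/optimizer standing in for the real training setup
model = torch.nn.Linear(8, 8)
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)
model, optimizer = accelerator.prepare(model, optimizer)

for step in range(10):
    inputs = torch.randn(4, 8, device=accelerator.device)
    loss = model(inputs).sum()
    accelerator.backward(loss)
    optimizer.step()
    optimizer.zero_grad()

# Every rank must reach this barrier; if one rank stalls (e.g. due to a
# driver/NCCL issue on that node), all other processes hang here indefinitely.
accelerator.wait_for_everyone()
if accelerator.is_main_process:
    print("all processes synchronized")
```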

Expected behavior

I want to run the code on server B.

ByungKwanLee commented 3 weeks ago

Oh, it was an NVIDIA driver problem. Who would have thought the driver could cause this... Well, solved!
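
For anyone hitting the same hang: one quick way to spot a driver/CUDA/NCCL mismatch between servers is to print the versions on each machine and compare. A small sketch (the script name and output format are just illustrative):

```python
import subprocess
import torch

# Query the installed NVIDIA driver version via nvidia-smi
driver = subprocess.run(
    ["nvidia-smi", "--query-gpu=driver_version", "--format=csv,noheader"],
    capture_output=True, text=True,
).stdout.strip()

# Compare these values across both servers; a mismatch is a likely culprit
print(f"driver={driver}  cuda={torch.version.cuda}  nccl={torch.cuda.nccl.version()}")
```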