ArnaudFickinger opened 1 week ago

I noticed that coordinator.block_all(), torch.set_num_threads(1), and dist.barrier() were added to the training script. Were they added for debugging purposes only, or are they useful for training?

---

Actually, they are useful for training when you train the model on a large-scale distributed system. We place them at the appropriate points to make distributed training more stable. If you are training at a small scale, or pre-training on a very robust distributed system, you can try removing them. But these calls introduce negligible overhead.
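For context, here is a minimal sketch of where such calls typically sit in a PyTorch DDP training loop. This is not the repo's actual script: the model, optimizer, and checkpoint path are placeholder assumptions, and plain dist.barrier() stands in for coordinator.block_all() (which, if the coordinator is ColossalAI's DistCoordinator, is essentially a wrapped barrier).

```python
# Minimal sketch, not the repo's actual training script. Model, optimizer,
# and checkpoint path are placeholders; dist.barrier() stands in for
# coordinator.block_all().
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

def main():
    # Limit intra-op CPU threads so many ranks on one node don't
    # oversubscribe cores; a common stability measure at scale.
    torch.set_num_threads(1)

    dist.init_process_group(backend="nccl")
    rank = dist.get_rank()
    local_rank = int(os.environ["LOCAL_RANK"])  # set by torchrun
    torch.cuda.set_device(local_rank)

    model = DDP(torch.nn.Linear(128, 128).cuda())
    opt = torch.optim.AdamW(model.parameters(), lr=1e-4)

    # Barrier before training: every rank finishes setup (data prep,
    # checkpoint load, ...) before the first step.
    dist.barrier()

    for step in range(10):
        x = torch.randn(32, 128, device="cuda")
        loss = model(x).pow(2).mean()
        opt.zero_grad()
        loss.backward()
        opt.step()

        # Barriers around checkpointing: all ranks reach the step, then
        # rank 0 writes while the others wait instead of racing ahead.
        if step % 5 == 0:
            dist.barrier()
            if rank == 0:
                torch.save(model.state_dict(), "ckpt.pt")
            dist.barrier()

    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```

Launched the usual way, e.g. `torchrun --nproc_per_node=8 train_sketch.py`. On a single GPU the barriers are no-ops in effect, which is why removing them at small scale is safe but keeping them costs almost nothing.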