Closed panyuxin1993 closed 2 months ago
I like the project a lot and want to train a model on our own data. However, we do not have a single computer with that many powerful GPUs. Is it possible to run the training across multiple computers, i.e. on a cluster?
See the PyTorch distributed overview at https://pytorch.org/tutorials/beginner/dist_overview.html for tips. I have not tried multi-node parallel training myself.
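For anyone landing here, this is a rough sketch of what multi-node data-parallel training usually looks like with PyTorch's `DistributedDataParallel`, launched by `torchrun`. It is not this project's training code: the linear model, the random dataset, the `train` function, and the script name `train_ddp.py` are all placeholders, and it assumes the real training loop is plain PyTorch that could be wrapped the same way. It has not been tested against this repository.

```python
import os

import torch
import torch.distributed as dist
import torch.nn as nn
from torch.nn.parallel import DistributedDataParallel as DDP
from torch.utils.data import DataLoader, TensorDataset
from torch.utils.data.distributed import DistributedSampler


def train(epochs: int = 2) -> float:
    """Hypothetical training entry point; returns the last batch loss."""
    # torchrun normally sets these variables. The defaults below only exist
    # so the script can also be dry-run as a single CPU process.
    os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
    os.environ.setdefault("MASTER_PORT", "29500")
    os.environ.setdefault("RANK", "0")
    os.environ.setdefault("WORLD_SIZE", "1")
    local_rank = int(os.environ.get("LOCAL_RANK", "0"))

    use_cuda = torch.cuda.is_available()
    device = torch.device(f"cuda:{local_rank}" if use_cuda else "cpu")
    if use_cuda:
        torch.cuda.set_device(device)
    # NCCL for GPU training, Gloo as a CPU fallback.
    dist.init_process_group(backend="nccl" if use_cuda else "gloo")

    # Placeholder data: replace with the project's real dataset.
    torch.manual_seed(0)
    dataset = TensorDataset(torch.randn(64, 8), torch.randn(64, 1))
    # The sampler gives each process a disjoint shard of the dataset.
    sampler = DistributedSampler(dataset)
    loader = DataLoader(dataset, batch_size=16, sampler=sampler)

    # Placeholder model: replace with the project's real network.
    model = nn.Linear(8, 1).to(device)
    # DDP averages gradients across all processes during backward().
    model = DDP(model, device_ids=[local_rank] if use_cuda else None)
    optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
    loss_fn = nn.MSELoss()

    loss = torch.zeros(())
    for epoch in range(epochs):
        sampler.set_epoch(epoch)  # reshuffle differently each epoch
        for x, y in loader:
            x, y = x.to(device), y.to(device)
            optimizer.zero_grad()
            loss = loss_fn(model(x), y)
            loss.backward()
            optimizer.step()

    dist.destroy_process_group()
    return float(loss.item())


if __name__ == "__main__":
    # Example launch on node 0 of 2, with 4 GPUs per node (run the same
    # command on node 1 with --node_rank=1; <host> is node 0's address):
    #   torchrun --nnodes=2 --nproc_per_node=4 --node_rank=0 \
    #       --master_addr=<host> --master_port=29500 train_ddp.py
    print(f"final loss: {train():.4f}")
```

The nodes need to be able to reach each other over the network on the chosen port, and each node needs a copy of the code and access to the data. Note that the effective batch size becomes the per-process batch size times the number of processes, so the learning rate may need adjusting.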