jchodera opened 1 year ago
OpenMM's infrastructure for parallel execution can in principle be applied to any Force. Internally it creates a separate ComputeContext for each device, and a separate copy of the KernelImpl for each one. All of them get executed in parallel, and any energies and forces they return are summed.
The challenge is figuring out what each of those KernelImpls should do when it gets invoked. For many Forces this is simple: with most bonded forces, we can just divide the bonds between GPUs, with each one computing a different subset. NonbondedForce is a bit more complicated, but we have ways of handling it.
What would TorchForce do? It doesn't know anything about the internal structure of the model. It just gets invoked once, taking all coordinates as input and producing the total energy as output. So the division of work would have to be done inside the model itself. We could pass in a pair of integers telling it how many devices it is executing on and the index of the current device. The model would have to decide what to do with those inputs such that each device does a similar amount of work, and the per-device energies sum to the correct total.
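For concreteness, here is a minimal sketch of what such a model might look like. Nothing here is part of TorchForce today; the two integers, the class name, and the sharding scheme are all hypothetical, and the energy term is a placeholder:

```python
import torch

class ShardedEnergyModel(torch.nn.Module):
    def __init__(self, num_devices, device_index):
        super().__init__()
        self.num_devices = num_devices
        self.device_index = device_index

    def forward(self, positions):
        # Take every num_devices-th atom, starting at device_index, so the
        # shards are disjoint and together cover all atoms.
        my_atoms = torch.arange(self.device_index, positions.shape[0],
                                self.num_devices, device=positions.device)
        shard = positions[my_atoms]
        # Placeholder: a real model would evaluate only the energy terms
        # belonging to this shard, so that summing the per-device energies
        # reproduces the full energy.
        return (shard ** 2).sum()
```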
Perhaps this would be something for NNPOps. We could provide drop-in implementations there of selected models that are multi-GPU aware. This would need to be done on a model-by-model basis. Leaving these here for reference:
https://pytorch.org/docs/stable/generated/torch.nn.DataParallel.html#torch.nn.DataParallel
https://pytorch.org/docs/stable/multiprocessing.html
Hi, I was wondering: is there a way to run REMD (ReplicaExchangeSampler) with TorchForce on multiple GPUs?
It should work exactly like any other force. Replica exchange is implemented at a higher level, using multiple Contexts for the replicas. It doesn't care how the forces in each Context are computed.
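A minimal sketch of that structure, assuming the CUDA platform and a hypothetical `build_system()` helper that returns a System containing the TorchForce. Each replica gets its own Context, assigned round-robin to GPUs via the standard `DeviceIndex` platform property; the exchange logic itself lives in openmmtools and is omitted here:

```python
import openmm
from openmm import unit

def make_replica_contexts(build_system, num_replicas, num_gpus):
    platform = openmm.Platform.getPlatformByName('CUDA')
    contexts = []
    for i in range(num_replicas):
        integrator = openmm.LangevinMiddleIntegrator(
            300 * unit.kelvin, 1.0 / unit.picosecond, 2.0 * unit.femtoseconds)
        # Spread the replicas round-robin over the available GPUs.
        properties = {'DeviceIndex': str(i % num_gpus)}
        contexts.append(openmm.Context(build_system(), integrator,
                                       platform, properties))
    return contexts
```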
Oh nice! Could you provide a simple example of how to do this? I came across this issue https://github.com/choderalab/openmmtools/issues/648, but could not figure out how to do it exactly.
I suggest asking on the openmmtools repo. The question isn't related to this package.
Ok, I will do that. Thank you!
Message-passing GNNs are still a difficult problem for multi-GPU MD: we need to exchange ghost nodes' features between interaction layers.
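A rough sketch of that halo-exchange pattern using `torch.distributed`, where each rank owns a spatial domain of atoms and must fetch the current features of its ghost nodes between layers. `ghost_index` is a hypothetical precomputed index map, and `all_gather` is the simplest (not the most efficient) way to express the exchange:

```python
import torch
import torch.distributed as dist

def exchange_ghost_features(local_features, ghost_index):
    # Gather every rank's node features (assumes equal shard sizes; a real
    # implementation would exchange only boundary atoms with neighboring
    # domains instead of all-gathering everything).
    world_size = dist.get_world_size()
    gathered = [torch.empty_like(local_features) for _ in range(world_size)]
    dist.all_gather(gathered, local_features)
    all_features = torch.cat(gathered, dim=0)
    # Select the features of the ghost nodes this rank needs for its next
    # interaction layer.
    return all_features[ghost_index]
```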
How can we best support parallelization of ML potentials across GPUs?
We're dealing with models that are small enough to be replicated on each GPU; only O(N) data (positions, box vectors) needs to be sent to each device, and only O(N) data (forces) accumulated back. Models like ANI should be trivially parallelizable across atoms.
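A sketch of that replication pattern, assuming the model is a plain `torch.nn.Module` that returns a scalar energy for whatever atoms it is given (as with ANI-style per-atom decompositions). The loop is sequential for clarity; a real implementation would launch the replicas concurrently, e.g. one thread or stream per device:

```python
import copy
import torch

def replicated_energy(model, positions, num_devices):
    # Replicate the (small) model on every device and give each replica a
    # disjoint shard of atoms; only O(N) positions go out and one scalar
    # energy comes back per device.
    total = torch.zeros((), dtype=positions.dtype)
    for d in range(num_devices):
        device = torch.device(f'cuda:{d}')
        replica = copy.deepcopy(model).to(device)
        my_atoms = torch.arange(d, positions.shape[0], num_devices)
        total = total + replica(positions[my_atoms].to(device)).to('cpu')
    return total
```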