Closed raimis closed 2 years ago
@peastman could you review?
The PBC (https://github.com/torchmd/torchmd-net/issues/92) will be add with a separate PR.
My comments at https://github.com/torchmd/torchmd-net/pull/61#issuecomment-1108854574 still apply. For very small molecules, this should be as fast as anything. Once you get up to several hundred atoms, there are ways it could be made faster. Having each thread loop over multiple atoms or having multiple threads work together could reduce the number of row/column computations, position loads, and atomic adds.
For molecules of 100-200 atoms, the speeds improve with more parallelization exposed, even it means some duplicate calculations. For larger molecules, maybe it is possible to improve a bit as you suggest, but the scaling of the current algorithm is quadratic. So it won't go far.
I thing, a better approach is two write another neighbor kernel with the cell list (some preliminary work https://github.com/torchmd/torchmd-net/pull/68), which will be more efficient for large molecule and leave this one optimized for small ones.
@peastman any more comments?
Looks good.