Closed edwag closed 1 year ago
This implementation speeds it up by a constant factor, but it still scales as O(M*N) where M is the number of new atoms and N is the number of total atoms. The implementation suggested in #264 used a kd-tree to improve the scaling to O(M*log(N)). There's also a CellList class in Modeller we could use to improve it to O(M). For large models, that should be a lot faster.
This implementation also uses a lot of memory. If M=1000 and N=100,000, your differences
and squaredDifferences
arrays will each take 1.2 GB. That could be limiting for large models.
Let me see if I can come up with something using CellList.
Can you take a look at #268 and see if it looks ok to you? How is the performance on your model?
Thanks for such a fast and thoughtful response to this PR, and thanks also for your _CellList
implementation! #268 looks good to me and I've left a couple of comments on that PR. In my application, performance is massively improved over the old implementation, though I haven't properly benchmarked it so I can't give a quantitative answer.
Closing because #264 is better addressed by #268
This pull request relates to #264
I've put together a faster implementation of the
_findNearestDistance
method. I haven't tested or benchmarked this thoroughly, but I've measured a 10-100x speedup depending on the input.