Performance improvements for large compounds

chrisiacovella commented 1 year ago

When constructing a Compound out of many smaller pieces (using the Compound.add function) the performance of this operation can get very slow as the size of the Compound grows.

For example, if I were to construct a system of water by individually adding one water at a time to the system, the time it takes to add in a new water molecule will begin to increase substantially. In the code below, I print out the time it takes to add 500 water molecules as the system size grows; the time it takes to add 500 particles increases by a factor of ~50 comparing the beginning to the end.

n waters time (s) 500 0.636166 1000 9.599990 1500 7.836106 2000 14.595778 2500 14.984029 3000 17.309123 3500 23.607873 4000 23.872859 4500 32.802591

import time
import mbuild as mb
from mbuild.lib.molecules.water import WaterSPC

water_template = WaterSPC()
n_water = 5000
start_time = time.time()
system = mb.Compound()
system_timing = []
check_interval = 500
start_time = time.time()

for j in range(0, n_water):
    temp_water = mb.clone(water_template)
    temp_water.translate_to(np.random.rand(3)*10.0)

    system.add(temp_water)
    if j% check_interval == 0 and j > 0:
        current_time = time.time()
        system_timing.append([j, current_time-start_time])
        start_time = time.time()

This slowdown impacts loading functions as well, as these all utilize the add function to set up the Compound (overhead ends up being so substantial that I couldn't load in a 40K particle mol2 file).

We could speed this up by creating an intermediate Compound with, e.g., 100 water molecules, and then adding this Compound to the system. The timings below are for adding in chunks of 100 water molecules, where we see the time to add the waters is effectively independent of system size. While this speeds things up, it adds additional levels to the tree which may be undesirable.

n waters time (s) 500 0.276973 1000 0.241972 1500 0.305013 2000 0.343213 2500 0.333969 3000 0.334511 3500 0.372732 4000 0.351961 4500 0.328118

The cause of this appears to be related to the compose operation for the bond graph. Each time we add a new particle, we need to merge the child compound's bond graph into the bond graph of the "parent" Compound (where child is what we are adding and parent is the compound we are adding to). As the size of the parent bond graph grows, this operation gets slower. The reason adding in smaller chunks into an intermediate compound speeds things up is because most of our compose operations are between two relatively small bond graphs and we minimize the number of compose operations between a large bond graph and small bond graph. Even relatively small groupings (e.g., 5 compounds), can provide substantial speed up as the system size grows, although a size of 100 seems to be pretty optimal.

To avoid introducing unnecessary levels in the hierarchy, we could modify the add function such that, if an array of compounds is given to add, it will first compose those child bond graphs, then do a compose them with the parent Compound, before doing the rest of the operations in the add function. I did a quick implementation of this and the speed improvement is great for the water system. The timings below are for different "chunk" sizes looking at the time for adding the first 500 and last 500 molecules. I'm not sure what is up with the chunk size 500 (initial time is wonky). chunks of 100 seem optimal, at least for water (I assume this would depend on the size of the underlying graph).

chunk time for first time for last 1 0.6414 34.7333 2 0.3253 17.0081 5 0.1655 8.4460 10 0.1279 1.0069 100 0.1641 0.2509 500 6.4974 0.5539 1000 0.0614 1.8385

It should be pretty straight forward to modify any of the loading functions (or similar) to perform operations on chunks. I'm not sure if requiring users to break things up into chunks is optimal, but I'll note this only really impacts adding in lots of smaller compounds into a large system size, so it might not come up a ton (especially if using other operations that have been accelerated behind the scenes). It would be trivial to create a helper class to stage compound construction (that is, rather than requiring the user to create temporary lists and if statements and such). e.g., create a class called "LargeMoleculeStaging", to help facilitate this. This class could take as a constructor the "parent" compound. A "stage" function, that just takes the child compounds we want to add and append them to a list in the class. And a "add_to_compound" function that breaks the list up into chunks and calls the Compound add function. That would simplify the logic a user is required to implement and I feel lit would be pretty easy to explain the usage in documentation. This isn't entirely different than how the polymer class operates now.

I'll work on a PR for changing the Compound.add function and modifying the loading functions.

chrisiacovella commented 1 year ago

While there may be some optimization that can be done in terms of the size of the lists, the time per particle does not seem to strong depend upon this choice in the new implementation I put together that uses nx.compose_all.

For example, I built different sized systems where the size of the chunk matches the system size (i.e., only doing one "add" call per system of the entire list), and the total time normalized by number of waters is effectively independent up to 500K water molecules (the times seem to be within the variation I see if I run this multiple times on my machine). This is good news as it means it isn't necessity to make helper functions or to break up larger lists. It seems for all practical limits, users can just make a list of compounds and add it the compound and get all the speed improvement.

Note, for comparison the total time for 100K and 500K water molecules was 18 and 132 seconds respectively in the data below; the original code took 87 seconds for just 5000 water molecules.

N total time/N 1000 0.00020535612106323243 1500 0.00013843854268391926 2000 0.00013179647922515868 2500 0.00013567438125610353 5000 0.00046730399131774905 10000 0.00017840790748596192 50000 0.00016421604156494142 100000 0.00017153034925460815 500000 0.0002642585263252258

chrisiacovella commented 1 year ago

Preliminary timings related to loading mol2 files (converting from mdtraj trajectory to Compound):

N water	old (s)	new (s)
1000	4.92	0.07
5000	266.1	0.7
10000	N/A	0.9

mosdef-hub / mbuild

Performance improvements for large compounds #1104