chrisiacovella closed this pull request 1 year ago.
Patch coverage: 97.56% and project coverage change: +0.10 :tada:
Comparison is base (1211260) 89.39% compared to head (76bcdc8) 89.50%.
Timings for loading mol2 files of different sizes:

| N waters | new time (s) | old time (s) |
|---|---|---|
| 1000 | 0.347 | 4.92 |
| 5000 | 1.46 | 266.1 |
| 10000 | 3.02 | too slow |
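As a rough check on how each path scales, one can estimate a local power-law exponent between consecutive rows of the timing table, k = log(t2/t1) / log(n2/n1). This is a back-of-the-envelope calculation on the reported numbers, not part of the PR, and with so few points it only distinguishes "roughly linear" from "much worse than linear".

```python
import math

# Timings (seconds) for loading mol2 files, copied from the timing table.
# The old 10000-water run is omitted because it was reported only as "too slow".
n_waters = [1000, 5000, 10000]
new_time = [0.347, 1.46, 3.02]
old_time = [4.92, 266.1]


def local_exponent(n1, t1, n2, t2):
    """Exponent k in t ~ n**k, estimated from two (size, time) points."""
    return math.log(t2 / t1) / math.log(n2 / n1)


# Exponents between consecutive sizes for the new path (1000->5000, 5000->10000).
new_exponents = [
    local_exponent(n_waters[i], new_time[i], n_waters[i + 1], new_time[i + 1])
    for i in range(len(new_time) - 1)
]
# Only one pair of points is available for the old path (1000->5000).
old_exponent = local_exponent(n_waters[0], old_time[0], n_waters[1], old_time[1])
# Old/new ratio for the sizes where both timings exist.
speedups = [old / new for old, new in zip(old_time, new_time)]

if __name__ == "__main__":
    print("new path exponents:", [round(k, 2) for k in new_exponents])
    print("old path exponent:", round(old_exponent, 2))
    print("speedups:", [round(s, 1) for s in speedups])
```

On these numbers the new path's exponents come out near 1 (about 0.89 and 1.05), i.e. roughly linear, while the old path's exponent between 1000 and 5000 waters is about 2.5, worse than quadratic. The speedup is about 14x at 1000 waters and about 182x at 5000.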
Speed improvements for converting from parmed are effectively the same as for mdtraj. Note that converting from gmso still needs modification, but that path calls a function in gmso, so it will need a separate gmso PR.
I added a condense function to Compound. It is similar to flatten, but it adds an intermediate level to the hierarchy based on connectivity. This refers to issue #1108.
To reiterate the issue, take a compound like this:
Compound, 28 particles, 18 bonds, 4 children
├── [C x 1], 1 particles, 0 bonds, 0 children
├── [Compound x 1], 3 particles, 2 bonds, 1 children
│   └── [tip3p x 1], 3 particles, 2 bonds, 3 children
│       ├── [H2 x 1], 1 particles, 1 bonds, 0 children
│       ├── [H3 x 1], 1 particles, 1 bonds, 0 children
│       └── [O1 x 1], 1 particles, 2 bonds, 0 children
└── [Compound x 2], 12 particles, 8 bonds, 4 children
    └── [Compound x 4], 3 particles, 2 bonds, 1 children
        └── [tip3p x 1], 3 particles, 2 bonds, 3 children
            ├── [H2 x 1], 1 particles, 1 bonds, 0 children
            ├── [H3 x 1], 1 particles, 1 bonds, 0 children
            └── [O1 x 1], 1 particles, 2 bonds, 0 children
And turn it into this:
Compound, 28 particles, 18 bonds, 10 children
├── [C x 1], 1 particles, 0 bonds, 0 children
└── [tip3p x 9], 3 particles, 2 bonds, 3 children
    ├── [H2 x 1], 1 particles, 1 bonds, 0 children
    ├── [H3 x 1], 1 particles, 1 bonds, 0 children
    └── [O1 x 1], 1 particles, 2 bonds, 0 children
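The core idea behind condense, grouping particles into molecules by bond connectivity regardless of the existing nesting, can be sketched as a breadth-first search over connected components. This is a hypothetical illustration, not the PR's code: particles are plain string labels, bonds are pairs of labels, and `condense_by_connectivity` is a made-up name; the real method works on Compound objects and their bond_graph.

```python
from collections import defaultdict, deque
from typing import Dict, Iterable, List, Set, Tuple


def condense_by_connectivity(
    particles: Iterable[str], bonds: Iterable[Tuple[str, str]]
) -> List[List[str]]:
    """Group particles into connected components of the bond graph.

    Hypothetical helper: each returned group plays the role of one new
    intermediate child (e.g. one tip3p molecule) directly under the root.
    """
    adjacency: Dict[str, Set[str]] = defaultdict(set)
    for a, b in bonds:
        adjacency[a].add(b)
        adjacency[b].add(a)

    seen: Set[str] = set()
    groups: List[List[str]] = []
    for start in particles:  # iterate in input order so group order is stable
        if start in seen:
            continue
        seen.add(start)
        queue = deque([start])
        group: List[str] = []
        while queue:  # breadth-first walk over everything bonded to `start`
            node = queue.popleft()
            group.append(node)
            for nbr in adjacency[node]:
                if nbr not in seen:
                    seen.add(nbr)
                    queue.append(nbr)
        groups.append(group)
    return groups


# Usage mirroring the example: one unbonded carbon plus nine 3-site waters.
particles = ["C"]
bonds = []
for i in range(9):
    o, h2, h3 = f"O1_{i}", f"H2_{i}", f"H3_{i}"
    particles += [h2, h3, o]
    bonds += [(o, h2), (o, h3)]

groups = condense_by_connectivity(particles, bonds)
```

With 28 particles and 18 bonds this yields 10 groups (the lone carbon plus nine waters), matching the 10 children of the condensed tree. How the original levels of nesting were arranged plays no role; only the bonds decide the new grouping.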
I still need to finish adding tests for this; they will come in the next push.
I addressed all the review comments, including creating a single list_flatten helper function (it doesn't add any real overhead).
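Only the name of the list_flatten helper appears here, not its body. A plausible minimal version, assuming it recursively flattens arbitrarily nested lists (and tuples) of compounds into one flat list that can then be handed to compose_all, might look like this:

```python
from typing import Any, List


def list_flatten(items: Any) -> List[Any]:
    """Flatten arbitrarily nested lists/tuples into a single flat list.

    Hypothetical sketch of the helper. Anything that is not a list or tuple
    (e.g. a single Compound, or a string) is treated as a leaf and kept whole.
    """
    if not isinstance(items, (list, tuple)):
        return [items]  # a lone leaf becomes a one-element list
    flat: List[Any] = []
    for item in items:
        flat.extend(list_flatten(item))  # recurse into nested containers
    return flat
```

For example, `list_flatten(["a", ["b", ("c", ["d"])], "e"])` gives `["a", "b", "c", "d", "e"]`. The work grows with the number of leaves times the nesting depth, which is small for typical lists of compounds.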
PR Summary:
This refers to issue #1104. This PR aims to improve the performance of the Compound.add function and of the loading routines that rely on it.
The basic gist, as outlined in the issue above, is that when a compound is constructed with the add function, performance degrades as the compound grows in size because of the repeated merging (i.e., composing) of bond_graphs, specifically, merging a small bond graph into a large one over and over again. This PR changes the underlying logic so that, if a list of Compounds is passed to the add function, it first uses the compose_all function to merge their bond_graphs together, and only then adds them to the root Compound (merging the combined bond_graph with the root Compound's bond_graph once). The cost of compose_all effectively scales with the number of compounds being merged, instead of the full, growing root graph being merged again for every compound added.
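The cost argument can be illustrated with a toy model. Assumptions: graphs are plain dicts of adjacency sets, and `compose` builds a brand-new graph by copying both inputs, mimicking (not calling) networkx's `compose`. `add_one_at_a_time` and `add_as_list` are hypothetical names for the old and new strategies, and the counter tallies copied adjacency entries as a proxy for work.

```python
from typing import Dict, Hashable, List, Set, Tuple

Graph = Dict[Hashable, Set[Hashable]]


def compose_all(graphs: List[Graph]) -> Tuple[Graph, int]:
    """Union of all graphs in one pass; also return how many entries were copied."""
    out: Graph = {}
    copied = 0
    for g in graphs:
        for node, nbrs in g.items():
            out.setdefault(node, set()).update(nbrs)
            copied += 1 + len(nbrs)  # one node entry plus its neighbor entries
    return out, copied


def compose(g: Graph, h: Graph) -> Tuple[Graph, int]:
    """Two-graph union that, like networkx.compose, returns a fresh graph."""
    return compose_all([g, h])


def water(i: int) -> Graph:
    """Hypothetical 3-site molecule: O bonded to two H, with per-molecule node ids."""
    o, h1, h2 = (i, "O"), (i, "H1"), (i, "H2")
    return {o: {h1, h2}, h1: {o}, h2: {o}}


def add_one_at_a_time(root: Graph, children: List[Graph]) -> Tuple[Graph, int]:
    """Old pattern: merge each small child into the ever-growing root graph."""
    total = 0
    for child in children:
        root, cost = compose(root, child)  # copies the whole root every time
        total += cost
    return root, total


def add_as_list(root: Graph, children: List[Graph]) -> Tuple[Graph, int]:
    """New pattern: merge all children once, then merge the result into the root."""
    merged, cost_children = compose_all(children)
    root, cost_root = compose(root, merged)
    return root, cost_children + cost_root


if __name__ == "__main__":
    children = [water(i) for i in range(1000)]
    old_graph, old_cost = add_one_at_a_time({}, children)
    new_graph, new_cost = add_as_list({}, children)
    assert old_graph == new_graph  # same bonds either way
    print(old_cost, new_cost)  # 3503500 14000
```

Each toy water contributes 7 entries, so the one-at-a-time path copies 7·n(n+1)/2 entries for n waters (quadratic), while the list path copies 14·n (linear). In this model the root is still merged once per add call, so the saving depends on passing many compounds in each list.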
This provides substantial speed improvements, as outlined in the issue.
Other additions: Compound.add now accepts a list for the label argument when compounds are provided as a list.
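A minimal sketch of pairing a list of labels with a list of children, assuming the two lists must have the same length and each child is stored under its own label. `ToyCompound` is a hypothetical stand-in, not mbuild's Compound; its default-label format and its errors are assumptions rather than what the PR implements.

```python
from typing import Dict, List, Optional, Sequence, Union


class ToyCompound:
    """Hypothetical stand-in for a Compound: a name, children and a label map."""

    def __init__(self, name: str = "Compound") -> None:
        self.name = name
        self.children: List["ToyCompound"] = []
        self.labels: Dict[str, "ToyCompound"] = {}
        self._auto_counts: Dict[str, int] = {}  # per-name counter for default labels

    def add(
        self,
        new_child: Union["ToyCompound", Sequence["ToyCompound"]],
        label: Optional[Union[str, Sequence[str]]] = None,
    ) -> None:
        if isinstance(new_child, ToyCompound):
            children, labels = [new_child], [label]
        else:
            children = list(new_child)
            if isinstance(label, str):  # assumed: one string for many children is rejected
                raise TypeError("pass a list of labels when adding a list of compounds")
            labels = list(label) if label is not None else [None] * len(children)
            if len(labels) != len(children):
                raise ValueError(f"got {len(labels)} labels for {len(children)} compounds")

        for child, lab in zip(children, labels):
            if lab is None:
                # Assumed default: name plus a running index, e.g. "tip3p[0]".
                idx = self._auto_counts.get(child.name, 0)
                self._auto_counts[child.name] = idx + 1
                lab = f"{child.name}[{idx}]"
            if lab in self.labels:
                raise ValueError(f"label {lab!r} already in use")
            self.children.append(child)
            self.labels[lab] = child


# Usage: explicit labels for one batch, default labels for another.
root = ToyCompound()
root.add([ToyCompound("tip3p") for _ in range(3)], label=["w0", "w1", "w2"])
root.add([ToyCompound("tip3p"), ToyCompound("tip3p")])
```

After these two calls, `root.labels` holds `w0`, `w1`, `w2`, `tip3p[0]` and `tip3p[1]`. Validating the lengths up front means a mismatched label list fails before any child is added.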
This is still a WIP: tests still need to be added for passing labels as a list, and the load functions still need to be updated to stash compounds into lists (the mdtraj conversion is basically complete and provides a substantial speedup).
PR Checklist