proteneer / timemachine

Differentiate all the things!

Speed up energy decomposition #1323

Closed badisa closed 2 months ago

badisa commented 2 months ago

Noticed that it could take a while for simulations to 'finish' after HREX had finished. Upon investigation, it looks like we are spending a lot of time reading frames from disk, and due to the way the loop was constructed for energy decomposition, we were reading the same frames into memory repeatedly.

This only makes a noticeable difference for HREX (<= ~1% for bisection/sequential). HREX incurs many more disk reads because it writes only a single frame per chunk to the StoredArrays, whereas sequential/bisection write chunks of 100 frames to disk. We probably want to resolve that in a subsequent PR.
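To illustrate the kind of loop reordering described above, here is a minimal sketch in plain Python. It is not the timemachine implementation: `CountingFrameStore`, `toy_energy`, and both `decompose_*` functions are hypothetical names, and the store only counts accesses to model disk reads. The point is that iterating over frames in the outer loop lets each frame be read once and reused for every energy term.

```python
class CountingFrameStore:
    """Hypothetical stand-in for a disk-backed frame store; counts reads."""

    def __init__(self, frames):
        self._frames = frames
        self.reads = 0

    def __len__(self):
        return len(self._frames)

    def __getitem__(self, i):
        self.reads += 1  # each access models one read from disk
        return self._frames[i]


def toy_energy(frame, scale):
    # Toy energy term (assumption); a real potential would go here.
    return scale * sum(x * x for x in frame)


def decompose_repeated_reads(store, scales):
    # Outer loop over energy terms: every term re-reads every frame.
    return [[toy_energy(store[i], s) for i in range(len(store))] for s in scales]


def decompose_single_read(store, scales):
    # Outer loop over frames: each frame is read once, then reused for all terms.
    out = [[0.0] * len(store) for _ in scales]
    for i in range(len(store)):
        frame = store[i]
        for j, s in enumerate(scales):
            out[j][i] = toy_energy(frame, s)
    return out


frames = [[0.1 * k, 0.2 * k, 0.3 * k] for k in range(10)]
scales = [0.5, 1.0, 2.0]
slow_store = CountingFrameStore(frames)
fast_store = CountingFrameStore(frames)
slow = decompose_repeated_reads(slow_store, scales)
fast = decompose_single_read(fast_store, scales)
print(slow_store.reads, fast_store.reads)
```

With 10 frames and 3 terms, the first variant performs one read per (term, frame) pair while the second performs one read per frame, and both return the same energies. The saving grows with the number of terms and with the cost of each read, which is consistent with the effect being largest where frames are stored one per chunk.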

Benchmarks

Cuda Arch 8.6 Nvidia A10

Vacuum

HREX now gets faster with more windows :rofl: and shows about a 20% speedup.

image

Speed up relative to master

image

Solvent

image

Speed up relative to master

Up to 5%

image

Complex

image

Speed up relative to master

image

mcwitt commented 2 months ago

Are the ns/day figures strongly dependent on the number of frames we run? (I think this would be the case for changes that only affect the final analysis stage?) It might be clearer to instead report the amount of time spent on the final analysis?

Edit: scratch that, this makes more sense than I thought at first, because the time to do the analysis presumably also scales linearly with the number of frames.
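The reasoning in the edit above can be checked with a small sketch. The function name and all numbers below are made up for illustration and are not measurements from this PR. If both the simulation time and the analysis time grow linearly with the number of frames, the frame count cancels out of the ns/day figure.

```python
import math

SECONDS_PER_DAY = 86400.0


def ns_per_day(n_frames, ns_per_frame, md_s_per_frame, analysis_s_per_frame):
    # Wall time and simulated time both scale linearly with n_frames,
    # so n_frames cancels in the ratio.
    wall_s = n_frames * (md_s_per_frame + analysis_s_per_frame)
    simulated_ns = n_frames * ns_per_frame
    return simulated_ns / wall_s * SECONDS_PER_DAY


# Hypothetical per-frame costs: 1 s of MD and 0.25 s of analysis per frame.
short_run = ns_per_day(100, 0.001, 1.0, 0.25)
long_run = ns_per_day(10_000, 0.001, 1.0, 0.25)
print(math.isclose(short_run, long_run))
```

Under that linear-scaling assumption, a reduction in per-frame analysis time shows up as the same relative ns/day improvement regardless of run length. A fixed, frame-independent analysis cost would break this and make the figures depend on the number of frames.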

badisa commented 2 months ago

Updated the description to detail all of the benchmarks with the relative speedups from this change. The improvement is most substantial for vacuum simulations, but there is also a couple of percent improvement for HREX in solvent/complex.

https://github.com/proteneer/timemachine/pull/1323#issue-2336861819