proteneer / timemachine

Differentiate all the things!
Other
138 stars 17 forks source link

Add lambda schedule tuned to Bisection #1347

Closed badisa closed 3 months ago

badisa commented 3 months ago

For vacuum it makes very little different to minimize more states than required (2 * (num_windows - 2)), but in solvent/complex it can get expensive. Seen in 30k atom systems that minimizing (local minimization) a new window takes 20-30 seconds which is ~15 minutes or so (for num windows = 48) spent minimizing states that aren't actually used.

Master PR
vacuum_100_benchmarks vacuum_100_benchmarks_bisection
solvent_100_benchmarks solvent_100_benchmarks_bisection
complex_100_benchmarks complex_100_benchmarks_bisection

The performance improvement with this change is about 10% (vacuum) to 20% (solvent/complex). Though it is worth noting that the performance improvement is related to the number of windows needed and what the maximum number of windows are. The performance gain will be O(1) and not scale with the number of frames, the benchmarks are done here with 100 frames to match the bisection stage of HREX/Bisection.

Note that the bisection timings after this change are noisier since there are still new initial states that have to be minimized depending on schedule generated bisection. This should be strictly faster than the previous implementation (where num windows are even).

mcwitt commented 3 months ago

we could also change the equality to np.allclose

I'd be in favor of switching to np.allclose on lambda for determining whether two states are equivalent. This seems safe as long as the parameter interpolation is continuous, which it should be.

The one difference is that this will always fill in the 'smallest' gap, which will go from the start of the array and bias towards having more evenly spaced windows. Which is why the existing code.

I'm not sure I understand the motivation here. If we assume that the parameter interpolation is close to optimal, then bisection will achieve equal spacing when the number of windows is $2^n + 1$ for some integer $n$. But for numbers of windows in between, the order in which we bisect the pairs is arbitrary, determined by details of the transformation and/or sampling noise. From this perspective, it seems simplest to me to choose for the size of the lambda grid the smallest $2^n + 1$ larger than the number of desired windows, or the largest $2^n + 1$ less than the number of desired windows and randomly bisect some pairs to make up the difference.

maxentile commented 3 months ago

I'm not sure I understand the motivation here. If we assume that the parameter interpolation is close to optimal, then bisection will achieve equal spacing when the number of windows is 2^n + 1 for some integer n. But for numbers of windows in between, the order in which we bisect the pairs is arbitrary, determined by details of the transformation and/or sampling noise.

Similar reservation

badisa commented 3 months ago

Closing this, seems to cause issues in an edge in hif2a (complex leg is off by 3kcal from without this change) and until I understand why exactly going to close.

badisa commented 3 months ago

After running the hif2a edge (15 -> 30) five times on https://github.com/proteneer/timemachine/commit/30e2a506f8fdda10a64af91aea40a1f7d6678774 vs 350ae7ac9f820dfaf9479a9533b5efab40fb8969 the large run to run variance was not a result of this PR, rather just got unlucky.

Still tracking down a way to identify what exactly is triggering this variance, but don't think it needs to delay this PR.

The five runs of the edge (complex) image