schrodinger / coordgenlibs

Schrodinger-developed 2D Coordinate Generation
BSD 3-Clause "New" or "Revised" License

Stop force field minimization when energy improvement slows down #95

Closed rachelnwalker closed 3 years ago

rachelnwalker commented 3 years ago

When tracking the energy after each force field minimization step, we noticed that the energy tends to 'converge' long before reaching 1000 iterations:

[Figure: all_mols]

For many of these molecules, the energy still decreases slightly every few iterations after converging, although the energy levels tend to oscillate. I started looking into how small changes in energy (~10-50) affect the coordinates, and they do not seem to make a large visual difference. For example, this is the energy after each iteration for one molecule:

[Image: Screen Shot 2021-04-28 at 2 24 11 PM]

And here is a short animation that shows how the coordinates change on each iteration: https://user-images.githubusercontent.com/39069546/116486757-33c8b980-a843-11eb-860f-dd467d8b7705.mov

The energy drops by around 50 between iterations 200 and 1000; however, most of the significant movements occur in the first 200 iterations.

With this commit, the minimization stops once the net energy decrease over the last ITERATION_HISTORY_SIZE iterations is less than MAX_NET_ENERGY_CHANGE. I currently have these values set to 100 and 20, respectively. This change caused 99% of the minimizations in the first graph to stop early (97% stop before hitting 300 iterations).

Here are a few examples of the early stopping. The grey shows where the energy would go if the minimization continued:

[Image: Screen Shot 2021-04-28 at 4 31 31 PM]
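
The stopping rule described above can be sketched in a few lines. This is a minimal Python illustration, not the C++ code in this PR: `minimize_with_early_stop` and the `step` callable are hypothetical names, and only the two constants and their values (100 and 20) come from the description. The sketch assumes "net energy decrease" means the energy from ITERATION_HISTORY_SIZE iterations ago minus the current energy.

```python
from collections import deque
import math

ITERATION_HISTORY_SIZE = 100   # window length, value from the PR description
MAX_NET_ENERGY_CHANGE = 20.0   # net decrease below which we stop, value from the PR description
MAX_ITERATIONS = 1000          # iteration cap mentioned in the PR description


def minimize_with_early_stop(step,
                             max_iterations=MAX_ITERATIONS,
                             history_size=ITERATION_HISTORY_SIZE,
                             max_net_change=MAX_NET_ENERGY_CHANGE):
    """Call `step()` until the energy stalls or the iteration cap is hit.

    `step` is a hypothetical callable that performs one minimization step
    and returns the energy after it. Returns (iterations_run, last_energy).
    """
    # Keep history_size + 1 energies so the first and last entries are
    # exactly history_size iterations apart.
    history = deque(maxlen=history_size + 1)
    energy = None
    for iteration in range(1, max_iterations + 1):
        energy = step()
        history.append(energy)
        if len(history) == history.maxlen:
            # Net change over the window; oscillations inside the window
            # are ignored because only the endpoints are compared.
            net_decrease = history[0] - history[-1]
            if net_decrease < max_net_change:
                return iteration, energy
    return max_iterations, energy


def make_decaying_energy(start=1000.0, rate=50.0):
    """Toy stand-in for a force field: energy decays exponentially."""
    state = {"i": 0}

    def step():
        state["i"] += 1
        return start * math.exp(-state["i"] / rate)

    return step


if __name__ == "__main__":
    iterations, final_energy = minimize_with_early_stop(make_decaying_energy())
    print(f"stopped after {iterations} iterations, energy {final_energy:.3f}")
```

Comparing only the endpoints of the window keeps the check cheap and makes it insensitive to the oscillation noted earlier, at the cost of never stopping before ITERATION_HISTORY_SIZE iterations have run.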
Finally, this change does give a substantial performance improvement (but is still nowhere near RDKit's coordinate generator). I ran rdDepictor.Compute2DCoords(mol) on 4184 molecules:

| | Total time elapsed | Minimizations that took >0.1s |
|---|---|---|
| Before these changes | 24.001s | 13 |
| After these changes | 10.949s | 2 |
| RDKit's native coordinate generator | 0.877s | 0 |
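
For anyone who wants to collect the same two columns, a timing harness along these lines would work. It is a sketch under assumptions, not the script used for the numbers above: `time_calls` is a hypothetical helper, and the commented RDKit usage assumes CoordGen is selected with `rdDepictor.SetPreferCoordGen(True)`, which the thread does not state.

```python
import time


def time_calls(func, items, slow_threshold=0.1):
    """Call `func(item)` for each item and time every call.

    Returns (total_seconds, slow_count), where slow_count is the number
    of calls that took longer than `slow_threshold` seconds.
    """
    total = 0.0
    slow_count = 0
    for item in items:
        start = time.perf_counter()
        func(item)
        elapsed = time.perf_counter() - start
        total += elapsed
        if elapsed > slow_threshold:
            slow_count += 1
    return total, slow_count


# Hypothetical usage with RDKit (assumes RDKit is installed and built
# with CoordGen support, and that `mols` is a list of RDKit molecules):
#
#   from rdkit.Chem import rdDepictor
#   rdDepictor.SetPreferCoordGen(True)   # use CoordGen instead of the native generator
#   total, slow = time_calls(rdDepictor.Compute2DCoords, mols)
#   print(f"{total:.3f}s total, {slow} calls over 0.1s")
```

Timing each call separately, rather than the whole loop, is what makes the slow-call count possible.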

These are my main questions now:

  1. Do we think this is a good approach?
  2. Do the current ITERATION_HISTORY_SIZE and MAX_NET_ENERGY_CHANGE values seem reasonable, or should we change them? I chose these arbitrarily; it may make sense to look at a smaller history any time after ~150-200 iterations.

d-b-w commented 3 years ago

@rachelnwalker - Looks like the Windows build failure is real. Probably something like a missing static cast (Windows is pickier about that than the other platforms). Lemme know if you want to talk about where to find the logs.

rachelnwalker commented 3 years ago

@d-b-w just fixed the windows build :)