openmm / spice-dataset

A collection of QM data for training potential functions
MIT License

Water clusters #70

Closed by peastman 12 months ago

peastman commented 1 year ago

For this subset, I simulated a box of AMOEBA water for 10 ns. Every 10 ps, I saved the positions of the 30 water molecules closest to the center of the box. This produces 1000 clusters of 30 molecules that should be representative of bulk water at room temperature.
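A minimal sketch of the selection step described above, for readers who want to see it concretely. It is not the script used for this subset. It assumes an orthorhombic periodic box, uses each molecule's oxygen position to measure distance from the box center, and the function name `closest_molecules` and the random toy coordinates are hypothetical.

```python
import numpy as np

def closest_molecules(oxygen_positions, box_lengths, n_keep=30):
    """Return indices of the n_keep molecules whose oxygen is nearest the box center.

    oxygen_positions: (n_molecules, 3) array, one row per water oxygen (nm).
    box_lengths: (3,) edge lengths of an orthorhombic box (nm). Assumption:
    the real simulation box may not be orthorhombic.
    """
    box_lengths = np.asarray(box_lengths, dtype=float)
    center = box_lengths / 2.0
    # Minimum-image displacement from the center, so molecules that drifted
    # across a periodic boundary are still measured correctly.
    delta = np.asarray(oxygen_positions, dtype=float) - center
    delta -= box_lengths * np.round(delta / box_lengths)
    distances = np.linalg.norm(delta, axis=1)
    # argsort orders the molecules from nearest to farthest.
    return np.argsort(distances)[:n_keep]

# Schedule implied by the description: 10 ns total, one snapshot every 10 ps.
n_snapshots = int(round(10_000 / 10))  # both values in ps

# Toy usage: random coordinates stand in for one real trajectory frame.
rng = np.random.default_rng(0)
box = np.array([3.0, 3.0, 3.0])
oxygens = rng.uniform(0.0, 3.0, size=(900, 3))
cluster = closest_molecules(oxygens, box)
```

Selecting by oxygen position keeps each molecule whole; the hydrogens of every selected molecule would be saved along with it.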

jchodera commented 1 year ago

Is 1000 configurations sufficient for a cluster of 30 molecules? Presumably this would have to be combined with information on smaller clusters to uniquely determine a potential energy function for water. Do we plan on adding clusters of smaller sizes as well?

Also, if the AMOEBA potential energy surface differs substantially from the QM level of theory we use, would it be useful to run a very small, fixed number of gradient descent steps on each configuration when generating data, so that we also sample snapshots closer to equilibrium for the QM potential? I know this was not something that our initial QCFractal workflows supported, but we may have the opportunity to revise these now.

peastman commented 1 year ago

I believe this should be plenty. Consider that it includes 30,000 individual water conformations and an even larger number of interacting water-water pairs.
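The counts can be checked with a few lines of arithmetic. This sketch counts every distinct pair of molecules within a cluster; how many of those pairs are meaningfully "interacting" depends on a distance cutoff that the thread does not specify.

```python
from math import comb

n_clusters = 1000
molecules_per_cluster = 30

# Each cluster contributes one conformation per molecule.
total_conformations = n_clusters * molecules_per_cluster

# Distinct unordered pairs within one cluster: C(30, 2).
pairs_per_cluster = comb(molecules_per_cluster, 2)
total_pairs = n_clusters * pairs_per_cluster

print(total_conformations)  # 30000
print(pairs_per_cluster)    # 435
print(total_pairs)          # 435000
```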

I discussed appropriate models with @leeping before running this, including the possibility of running some dynamics with a QM method for each cluster (not gradient descent—we want a thermal distribution, not local minima). His advice was that it would be very expensive and wouldn't necessarily give a better distribution.

jchodera commented 1 year ago

> we want a thermal distribution, not local minima

Perturbing the potential will generally have the effect of shifting the configuration up the walls of the energy landscape. A few steps of gradient descent simply aim to ensure that the region around thermal energy is adequately represented should this occur. I agree that thermalized dynamics is too expensive.

For example, sampling 1000 snapshots from a flexible TIP3P model and then switching to SPC/E would likely result in a very poor representation of thermalized SPC/E configurations.
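The effect can be illustrated with a one-dimensional toy, which is only a sketch and not a water model. Samples are drawn from the Boltzmann distribution of one harmonic potential ("model A") and then scored on a second harmonic potential ("model B") with a different minimum and stiffness. All parameters, names, and the step size are arbitrary assumptions chosen for illustration.

```python
import numpy as np

KT = 2.494  # thermal energy near 300 K, in kJ/mol

# Hypothetical parameters for the two harmonic "models".
K_A, X0_A = 3000.0, 0.096   # model used for sampling
K_B, X0_B = 6000.0, 0.110   # model used for scoring

def energy_b(x):
    """Energy of coordinate x on model B."""
    return 0.5 * K_B * (x - X0_B) ** 2

def grad_b(x):
    """Gradient of the model B energy with respect to x."""
    return K_B * (x - X0_B)

def few_steps_of_descent(x, n_steps=3, step_size=2e-5):
    """Apply a small, fixed number of gradient descent steps on model B."""
    x = np.array(x, dtype=float)
    for _ in range(n_steps):
        x -= step_size * grad_b(x)
    return x

# A harmonic potential has a Gaussian Boltzmann distribution
# with standard deviation sqrt(kT / k).
rng = np.random.default_rng(0)
samples_a = rng.normal(X0_A, np.sqrt(KT / K_A), size=1000)

mean_before = energy_b(samples_a).mean()
mean_after = energy_b(few_steps_of_descent(samples_a)).mean()

print(f"thermal average on B would be about {0.5 * KT:.2f}")
print(f"mean energy on B before descent: {mean_before:.2f}")
print(f"mean energy on B after descent:  {mean_after:.2f}")
```

In this toy the samples from A sit higher on B than a thermal sample of B would, and a few descent steps lower the average energy. The steps only shrink each displacement toward the minimum, so they do not reproduce B's thermal distribution, and too many steps would collapse the samples onto the minimum.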

jchodera commented 12 months ago

@peastman: I just realized the snapshot generation script omitted a barostat, which could result in highly atypical densities that are unrepresentative of bulk water. Would it be possible to regenerate this with a MonteCarloBarostat at 1 atm?
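One quick way to see whether a fixed-volume box is at a sensible density is to compute it from the molecule count and box volume. This is a minimal sketch; the 900 molecules and 27 nm^3 volume are hypothetical values, not the ones from the actual simulation.

```python
AVOGADRO = 6.02214076e23      # molecules per mole
WATER_MOLAR_MASS = 18.01528   # g/mol

def water_density_g_per_cm3(n_molecules, box_volume_nm3):
    """Mass density of a box of water, given its volume in nm^3."""
    mass_g = n_molecules * WATER_MOLAR_MASS / AVOGADRO
    volume_cm3 = box_volume_nm3 * 1e-21  # 1 nm^3 = 1e-21 cm^3
    return mass_g / volume_cm3

# Hypothetical box: 900 molecules in a 3 nm cube.
density = water_density_g_per_cm3(900, 27.0)
print(f"{density:.3f} g/cm^3")
```

Liquid water at room temperature and 1 atm has a density of about 0.997 g/cm^3, so a value far from that at fixed volume would suggest the box was built at the wrong size. A barostat removes the concern by letting the volume relax to the model's own equilibrium density.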

peastman commented 12 months ago

Sure, I'll do that.