openmm / spice-dataset

A collection of QM data for training potential functions
MIT License

What quantities to compute #7

Closed peastman closed 2 years ago

peastman commented 3 years ago

What quantities do we want to compute and include in the dataset? Energies and forces are of course essential, but there are other things we could also include. A good principle is that if it's cheap to compute something, and if it might potentially be useful to someone, we might as well include it. Here is a list of quantities that Psi4 can compute: https://psicode.org/psi4manual/master/oeprop.html. Here are some to consider.

raimis commented 3 years ago

We should save the converged wavefunction.

In the past there were problems saving the wavefunction with Psi4, but hopefully in the latest release it is fixed.

giadefa commented 3 years ago

Saving the wavefunction seems like a good idea, but is it feasible in terms of storage?

On Wed, Sep 22, 2021 at 4:11 PM Raimondas Galvelis @.***> wrote:

We should save the converged wavefunction.

  • If we have the wavefunction, we can relatively cheaply compute any additional electronic properties.
  • If we decide to recompute the dataset with a higher-accuracy method, the current wavefunction could be used as an initial guess to reduce the computational cost of the higher-accuracy method.

In the past there were problems saving the wavefunction with Psi4, but hopefully in the latest release it is fixed.

raimis commented 3 years ago

Computed benzene with wB97X-D/def2-TZVPPD:

import psi4

psi4.set_memory('32 GB')

benzene = psi4.geometry("""
  H      1.2194     -0.1652      2.1600
  C      0.6825     -0.0924      1.2087
  C     -0.7075     -0.0352      1.1973
  H     -1.2644     -0.0630      2.1393
  C     -1.3898      0.0572     -0.0114
  H     -2.4836      0.1021     -0.0204
  C     -0.6824      0.0925     -1.2088
  H     -1.2194      0.1652     -2.1599
  C      0.7075      0.0352     -1.1973
  H      1.2641      0.0628     -2.1395
  C      1.3899     -0.0572      0.0114
  H      2.4836     -0.1022      0.0205
""")

energy, wfn = psi4.energy('wB97X-D/def2-TZVPPD', molecule=benzene, return_wfn=True)

wfn.to_file('benzene')

The wavefunction size is 8.1 MB.

peastman commented 3 years ago

I don't know for sure what QCArchive can handle, but I suspect that won't be practical. For a molecule that size, the coordinates and forces together take 288 bytes. Adding in a few other values and some metadata brings it up to around 1 KB. Storing the wavefunction increases the storage requirements by 3-4 orders of magnitude!
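The arithmetic behind that estimate can be checked with a short sketch. The 1 KB record size and the 8.1 MB wavefunction size are the figures quoted in this thread; the use of 32-bit floats is an assumption, chosen because it reproduces the 288-byte figure.

```python
import math

# Back-of-the-envelope storage estimate for one benzene conformation.
N_ATOMS = 12           # benzene: 6 C + 6 H
BYTES_PER_FLOAT = 4    # assumption: 32-bit floats

# Three Cartesian components per atom, stored once for coordinates and once for forces.
coords_and_forces = N_ATOMS * 3 * 2 * BYTES_PER_FLOAT

record_bytes = 1_000         # ~1 KB per conformation (figure quoted in the thread)
wavefunction_bytes = 8.1e6   # 8.1 MB (figure quoted in the thread, taking 1 MB = 10**6 bytes)

# Orders of magnitude by which the wavefunction exceeds the rest of the record.
orders_vs_record = math.log10(wavefunction_bytes / record_bytes)
orders_vs_raw = math.log10(wavefunction_bytes / coords_and_forces)

print(f"coordinates + forces: {coords_and_forces} bytes")
print(f"wavefunction vs. ~1 KB record: 10^{orders_vs_record:.1f}")
print(f"wavefunction vs. coordinates + forces: 10^{orders_vs_raw:.1f}")
```

Under these assumptions the ratio lands between three and five orders of magnitude, depending on whether the comparison is against the full record or only the coordinates and forces.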

jchodera commented 3 years ago

@jthorton and @pavankum will have to chime in on which properties are supported by QCEngine/QCFractal/QCArchive and can reasonably be captured.

pavankum commented 3 years ago

Instead of the full wavefunction file, we can save the orbital coefficients and eigenvalues, which are sufficient for most properties and for reconstructing the wavefunction. A "crude" example of restarting from orbital coefficients:

import psi4
import numpy as np

psi4.set_memory('32 GB')

benzene = psi4.geometry("""
  H      1.2194     -0.1652      2.1600
  C      0.6825     -0.0924      1.2087
  C     -0.7075     -0.0352      1.1973
  H     -1.2644     -0.0630      2.1393
  C     -1.3898      0.0572     -0.0114
  H     -2.4836      0.1021     -0.0204
  C     -0.6824      0.0925     -1.2088
  H     -1.2194      0.1652     -2.1599
  C      0.7075      0.0352     -1.1973
  H      1.2641      0.0628     -2.1395
  C      1.3899     -0.0572      0.0114
  H      2.4836     -0.1022      0.0205
""")

energy, wfn = psi4.energy('wB97X-D/def2-TZVPPD', molecule=benzene, return_wfn=True)

alpha_orb_coeffs = wfn.Ca().np
eigen_vals = wfn.epsilon_a().np
nalpha = wfn.nalpha()

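print("a and b densities same: ", wfn.same_a_b_dens())
print("a and b orbs same: ", wfn.same_a_b_orbs())

# Rebuild the alpha density matrix from the occupied orbital coefficients
Density = np.dot(alpha_orb_coeffs[:, :nalpha], alpha_orb_coeffs[:, :nalpha].T)
# Compare with a tolerance; exact floating-point equality is too strict
print(np.allclose(Density, wfn.Da().np))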

# Changing orbitals to orbitals read from file (here, stored in variables)
psi4.core.clean()

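new_scf, new_wfn = psi4.energy('hf/def2-tzvppd', molecule=benzene, return_wfn=True)
# HF orbitals generally differ from the DFT ones; compare with a tolerance
print(np.allclose(new_wfn.Ca().np, wfn.Ca().np))

# alpha and beta orbitals are the same for this closed-shell molecule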
new_wfn.Ca().np[:] = alpha_orb_coeffs
new_wfn.epsilon_a().np[:] = eigen_vals

new_wfn.Cb().np[:] = alpha_orb_coeffs
new_wfn.epsilon_b().np[:] = eigen_vals

# writing to the scratch file that psi4 reads if scf_guess was set to READ
my_file=new_wfn.get_scratch_filename(180) + '.npy'
new_wfn.to_file(my_file)

psi4.set_options({'guess': 'read'})
energy = psi4.energy('wb97x-d/def2-TZVPPD', molecule=benzene)

Maybe @jthorton has a more polished way to construct a new wfn object instead of overwriting the orbital coefficients of another energy calculation. In any case, those orbitals and eigenvalues would be on the order of tens of kilobytes.

Some properties we would be interested in are Wiberg/Mayer bond indices and dipole and quadrupole moments (already listed above). ESPs can be built from the orbital coefficients after we reconstruct the wavefunction.
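As a rough illustration of the storage side of this idea, here is a minimal NumPy-only sketch that writes coefficients and eigenvalues to a compressed archive, reads them back, and rebuilds the density matrix. It does not call Psi4: the arrays are random stand-ins for `wfn.Ca().np` and `wfn.epsilon_a().np`, the basis size of 60 is hypothetical, the archive keys are made up for this example, and the idempotency check assumes an orthonormal basis (overlap matrix equal to the identity), which is not true of a real atomic-orbital basis.

```python
import io

import numpy as np

rng = np.random.default_rng(0)
n_basis = 60   # hypothetical basis size, far smaller than def2-TZVPPD for benzene
n_occ = 21     # benzene has 42 electrons, so 21 doubly occupied orbitals

# Stand-in for wfn.Ca().np: a matrix with orthonormal columns (assumes S = I).
C, _ = np.linalg.qr(rng.standard_normal((n_basis, n_basis)))
# Stand-in for wfn.epsilon_a().np: sorted orbital energies.
eps = np.sort(rng.standard_normal(n_basis))

# Write the arrays to a compressed archive (in memory here; a file path also works).
buffer = io.BytesIO()
np.savez_compressed(buffer, Ca=C, epsilon_a=eps, nalpha=n_occ)
stored_bytes = buffer.getbuffer().nbytes

# Read the archive back and rebuild the alpha density matrix D = C_occ C_occ^T.
buffer.seek(0)
data = np.load(buffer)
C_loaded = data["Ca"]
n_loaded = int(data["nalpha"])
C_occ = C_loaded[:, :n_loaded]
D = C_occ @ C_occ.T

print(f"archive size: {stored_bytes} bytes")
print("round trip exact:", np.array_equal(C_loaded, C))
print("density idempotent (S = I):", np.allclose(D @ D, D))
```

Storing only the occupied columns of the coefficient matrix would shrink the archive further, at the cost of losing the virtual orbitals.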

giadefa commented 3 years ago

This seems like a good compromise.

peastman commented 3 years ago

Saving the coefficients isn't a substitute for also computing and storing useful quantities. Even if it only took 1 second to recompute them for each conformation, it would still take weeks for the entire dataset. How about including the following?

  • DIPOLE
  • QUADRUPOLE
  • WIBERG_LOWDIN_INDICES
  • MAYER_INDICES
  • MBIS_CHARGES
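A minimal sketch of how these could be requested from a converged wavefunction, assuming Psi4's `oeprop` interface. The helper name `compute_properties` and the constant `REQUESTED_PROPERTIES` are hypothetical, and the names under which Psi4 stores the results may differ between versions, so the sketch returns every stored variable instead of looking up specific keys.

```python
# Property keywords proposed above, passed to Psi4's oeprop as-is.
REQUESTED_PROPERTIES = [
    "DIPOLE",
    "QUADRUPOLE",
    "WIBERG_LOWDIN_INDICES",
    "MAYER_INDICES",
    "MBIS_CHARGES",
]


def compute_properties(wfn):
    """Run oeprop on a converged wavefunction and return its stored variables.

    Hypothetical helper: `wfn` is expected to come from
    psi4.energy(..., return_wfn=True).
    """
    import psi4  # imported lazily so the list above is usable without Psi4

    psi4.oeprop(wfn, *REQUESTED_PROPERTIES)
    # Return everything Psi4 attached to the wavefunction; inspect the keys to
    # see how each property is named in the installed version.
    return wfn.variables()
```

With the benzene example from earlier in the thread, usage would be `energy, wfn = psi4.energy('wB97X-D/def2-TZVPPD', molecule=benzene, return_wfn=True)` followed by `results = compute_properties(wfn)`.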

peastman commented 3 years ago

Psi4 also supports Distributed Multipole Analysis, which is another way of computing atomic charges and multipoles. I don't know how it compares to MBIS.

peastman commented 2 years ago

Closing since version 1 is now released.