openmm / spice-dataset

A collection of QM data for training potential functions
MIT License

unit system? #33

Closed: yuanqing-wang closed this issue 2 years ago

yuanqing-wang commented 2 years ago

What are the units for the energies and coordinates? Some documentation would be nice, and we could also annotate the units in the HDF5 file.

raimis commented 2 years ago

Atomic units, i.e. positions -- Bohr, energy -- Hartree, gradients -- Hartree/Bohr
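Because the stored values are in atomic units, users typically convert them on read. The sketch below assumes CODATA 2018 constants, and the helper names are hypothetical (they are not part of the SPICE downloader). The gradient case is where the order of the units matters: a gradient is dE/dx, so it carries energy over length.

```python
# Conversion factors from atomic units (CODATA 2018 values, assumed here).
BOHR_TO_ANGSTROM = 0.529177210903          # 1 bohr in angstrom
HARTREE_TO_KCAL_PER_MOL = 627.5094740631   # 1 hartree in kcal/mol


def positions_to_angstrom(positions_bohr):
    """Convert a list of [x, y, z] positions from bohr to angstrom."""
    return [[c * BOHR_TO_ANGSTROM for c in xyz] for xyz in positions_bohr]


def energy_to_kcal_per_mol(energy_hartree):
    """Convert an energy from hartree to kcal/mol."""
    return energy_hartree * HARTREE_TO_KCAL_PER_MOL


def gradient_to_kcal_per_mol_angstrom(gradient_hartree_per_bohr):
    """Convert [x, y, z] gradients from hartree/bohr to (kcal/mol)/angstrom.

    Energy is in the numerator and length in the denominator, so the energy
    factor is multiplied in and the length factor is divided out.
    """
    factor = HARTREE_TO_KCAL_PER_MOL / BOHR_TO_ANGSTROM
    return [[g * factor for g in xyz] for xyz in gradient_hartree_per_bohr]
```

The combined gradient factor works out to roughly 1185.8 (kcal/mol)/Å per hartree/bohr. Forces, if needed, are the negative of the gradient.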

jchodera commented 2 years ago

Can we use the HDF5 variable attributes to annotate the units, so that the HDF5 file is self-documenting in terms of what units the various properties are in?

peastman commented 2 years ago

The units are documented at https://github.com/openmm/spice-dataset/tree/main/downloader. I also still need to write the main README, and I'll include the information there as well.

jchodera commented 2 years ago

Could we also include the units in the HDF5 file in addition to adding them to the README? This is really what HDF5 variable attributes are designed for.
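With h5py, every group and dataset exposes a dict-like `.attrs` mapping, so making the file self-documenting comes down to one attribute write per dataset. This is a minimal sketch. The function name, the `"units"` attribute key, and the dataset names are hypothetical choices and do not describe what the SPICE downloader actually does.

```python
def annotate_units(group, units_by_name):
    """Write a "units" attribute onto each dataset in `group` listed in `units_by_name`.

    `group` only needs to support `name in group` and
    `group[name].attrs[key] = value`. An h5py Group opened in "r+" or "a"
    mode does. Datasets missing from the group are skipped. Returns the
    names that were annotated, in the order of `units_by_name`.
    """
    annotated = []
    for name, unit in units_by_name.items():
        if name in group:
            group[name].attrs["units"] = unit
            annotated.append(name)
    return annotated
```

You would call this once per molecule group, for example `annotate_units(mol_group, {"conformations": "bohr", "dft_total_energy": "hartree"})`. A reader can then recover the unit with `dataset.attrs["units"]` without consulting the README.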

peastman commented 2 years ago

I don't think qcportal reports what the units are for a field, so we would need to figure them all out and hardcode them in the script. Here is the full list of fields you can request:

2-BODY DISPERSION CORRECTION ENERGY
CURRENT DIPOLE X
CURRENT DIPOLE Y
CURRENT DIPOLE Z
CURRENT ENERGY
CURRENT REFERENCE ENERGY
DFT FUNCTIONAL TOTAL ENERGY
DFT TOTAL ENERGY
DFT VV10 ENERGY
DFT XC ENERGY
DISPERSION CORRECTION ENERGY
GRID ELECTRONS ALPHA
GRID ELECTRONS BETA
GRID ELECTRONS TOTAL
NUCLEAR REPULSION ENERGY
ONE-ELECTRON ENERGY
PCM POLARIZATION ENERGY
PE ENERGY
SCF DIPOLE X
SCF DIPOLE Y
SCF DIPOLE Z
SCF ITERATION ENERGY
SCF ITERATIONS
SCF QUADRUPOLE XX
SCF QUADRUPOLE XY
SCF QUADRUPOLE XZ
SCF QUADRUPOLE YY
SCF QUADRUPOLE YZ
SCF QUADRUPOLE ZZ
SCF TOTAL ENERGY
TWO-ELECTRON ENERGY
WB97M-D3BJ DISPERSION CORRECTION ENERGY
XC GRID RADIAL POINTS
XC GRID SPHERICAL POINTS
XC GRID TOTAL POINTS
-D GRADIENT
2-BODY DISPERSION CORRECTION GRADIENT
CURRENT DIPOLE
CURRENT GRADIENT
DFT TOTAL GRADIENT
DISPERSION CORRECTION GRADIENT
MAYER INDICES
MBIS CHARGES
MBIS DIPOLES
MBIS OCTUPOLES
MBIS QUADRUPOLES
MBIS RADIAL MOMENTS <R^2>
MBIS RADIAL MOMENTS <R^3>
MBIS RADIAL MOMENTS <R^4>
MBIS VALENCE WIDTHS
SCF DIPOLE
SCF QUADRUPOLE
SCF TOTAL GRADIENT
WB97M-D3BJ DISPERSION CORRECTION GRADIENT
WIBERG LOWDIN INDICES
MAYER_INDICES
WIBERG_LOWDIN_INDICES

However, we probably don't need to worry about all of them. Maybe we could add units only for the ones listed in the default config file? Those are:

DFT TOTAL ENERGY
DFT TOTAL GRADIENT
MBIS CHARGES
MBIS DIPOLES
MBIS QUADRUPOLES
MBIS OCTUPOLES
SCF DIPOLE
SCF QUADRUPOLE
WIBERG LOWDIN INDICES
MAYER INDICES
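If the units are hardcoded in the download script, a small lookup table could cover the ten default-config fields above. This is a sketch only. The unit strings assume the Psi4/QCSchema atomic-unit conventions discussed later in this thread and should be checked against the Psi4 PSI variables glossary. The names `FIELD_UNITS`, `units_for`, and `dataset_name`, and the lowercase-underscore dataset naming, are all hypothetical.

```python
# Hypothetical unit table for the default-config fields, assuming atomic units
# (e = elementary charge, bohr = a0). Verify against the Psi4 glossary.
FIELD_UNITS = {
    "DFT TOTAL ENERGY": "hartree",
    "DFT TOTAL GRADIENT": "hartree/bohr",
    "MBIS CHARGES": "elementary_charge",
    "MBIS DIPOLES": "elementary_charge*bohr",
    "MBIS QUADRUPOLES": "elementary_charge*bohr**2",
    "MBIS OCTUPOLES": "elementary_charge*bohr**3",
    "SCF DIPOLE": "elementary_charge*bohr",
    "SCF QUADRUPOLE": "elementary_charge*bohr**2",
    "WIBERG LOWDIN INDICES": "dimensionless",  # bond orders carry no unit
    "MAYER INDICES": "dimensionless",
}


def dataset_name(field):
    """Map a QCVariable name to a hypothetical HDF5 dataset name."""
    return field.lower().replace(" ", "_")


def units_for(field):
    """Return the unit string for `field`, failing loudly if none is recorded."""
    try:
        return FIELD_UNITS[field]
    except KeyError:
        raise KeyError(f"no unit recorded for {field!r}; add it to FIELD_UNITS") from None
```

Raising on unknown fields means that someone who adds a field to the config without recording its unit gets an error instead of a silently unannotated dataset.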

loriab commented 2 years ago

FWIW, QCSchema policy is to always store properties in atomic units, even the odd ones like vibrational frequencies. pint is set up in qcelemental and attached to Datum objects. The idea is that consumers of the schema data could wrap a property in a Datum, call .to_units("everyday_non_au_unit") on it, and get workaday values, while the harvested values stay in AU.

The list of QCVariables posted just above is roughly Psi4's domain (though we try to cross-list with QCSchema AtomicResult.properties wherever possible). Units for those are documented here: https://psicode.org/psi4manual/master/glossary_psivariables.html . We're trying to move to entirely atomic units, but that's still a work in progress. For example, SCF DIPOLE X was in Debye, but it has been retired in favor of the vector SCF DIPOLE in e a0.
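Values produced under the older convention can still be folded into the newer one: a legacy SCF DIPOLE X/Y/Z triple in Debye can be combined into a single vector in e·a0. This is a minimal sketch, assuming a conversion factor derived from CODATA 2018 (1 e·a0 ≈ 2.541746 D). The function name is hypothetical and is not a Psi4 or QCElemental API.

```python
# 1 e*a0 (atomic unit of electric dipole moment) expressed in debye.
# Derived from CODATA 2018: 8.4783536255e-30 C*m / 3.33564095198e-30 C*m per debye.
DEBYE_PER_E_BOHR = 2.541746473


def legacy_dipole_to_au(x_debye, y_debye, z_debye):
    """Combine scalar dipole components in debye into one [x, y, z] vector in e*a0."""
    return [c / DEBYE_PER_E_BOHR for c in (x_debye, y_debye, z_debye)]
```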

peastman commented 2 years ago

Thanks! That's exactly the information we needed.

jchodera commented 1 year ago

Lori: Thanks!

Do you happen to know the QCElemental or pint compatible unit names for these?