openmm / spice-dataset

A collection of QM data for training potential functions
MIT License

Missing data for mbis_charges/dipoles #116

Open DeNeutoy opened 5 hours ago

DeNeutoy commented 5 hours ago

Hi there, great dataset!

When I was ingesting it, I noticed that there is some missing dipole/charge data. Is this expected?

Specifically, the group at index 40119 ("54X VAL"), where mbis_charges and mbis_dipoles are both empty:

In [1]: import h5py
In [2]: x = h5py.File("./SPICE-2.0.1.hdf5", "r")
In [6]: for i, a in enumerate(x.values()):
   ...:     if i == 40119:
   ...:         break
   ...:

In [7]: a
Out[7]: <HDF5 group "/54X VAL" (15 members)>

In [8]: a.keys()
Out[8]: <KeysViewHDF5 ['atomic_numbers', 'conformations', 'dft_total_energy', 'dft_total_gradient', 'formation_energy', 'mayer_indices', 'mbis_charges', 'mbis_dipoles', 'mbis_octupoles', 'mbis_quadrupoles', 'scf_dipole', 'scf_quadrupole', 'smiles', 'subset', 'wiberg_lowdin_indices']>

In [9]: a["mbis_dipoles"]
Out[9]: <HDF5 dataset "mbis_dipoles": shape (0,), type "<f4">

In [10]: a["mbis_charges"]
Out[10]: <HDF5 dataset "mbis_charges": shape (0,), type "<f4">

In [11]: a["atomic_numbers"]
Out[11]: <HDF5 dataset "atomic_numbers": shape (49,), type "<i2">

peastman commented 2 hours ago

Duplicate of #48. MBIS failed to converge for some conformations, so we reran those calculations without it.
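
For anyone ingesting the file, the check in the session above can be generalized into a scan that lists every group whose MBIS datasets are absent or empty, so those entries can be skipped or masked. This is only a sketch under some assumptions: the names `groups_missing_data`, `scan_spice_file`, and `MBIS_KEYS` are hypothetical and not part of the dataset's tooling, the key list is taken from the `a.keys()` output above, and every top-level entry in the file is assumed to be a group like "54X VAL".

```python
# Dataset names copied from the a.keys() output in this thread.
MBIS_KEYS = ("mbis_charges", "mbis_dipoles", "mbis_quadrupoles", "mbis_octupoles")


def groups_missing_data(groups, keys=MBIS_KEYS):
    """Yield (group_name, missing_keys) for each group lacking usable data.

    `groups` is any mapping of name -> group, where a group maps dataset
    names to objects exposing a `.shape` tuple (h5py datasets do).
    A dataset counts as missing if it is absent or has a zero-length axis.
    """
    for name, group in groups.items():
        missing = []
        for key in keys:
            if key not in group:
                missing.append(key)
                continue
            shape = group[key].shape
            # shape (0,) is what the session above shows for the failed entries
            if shape is None or 0 in shape:
                missing.append(key)
        if missing:
            yield name, missing


def scan_spice_file(path, keys=MBIS_KEYS):
    """Open an HDF5 file read-only and collect the groups with missing data."""
    import h5py  # imported here so the helper above works without h5py

    with h5py.File(path, "r") as f:
        return list(groups_missing_data(f, keys))
```

Calling `scan_spice_file("./SPICE-2.0.1.hdf5")` should then return a list of `(group_name, missing_keys)` pairs. Based on the session above, "54X VAL" would be expected to appear with at least `mbis_charges` and `mbis_dipoles` listed.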