openmm / spice-dataset

A collection of QM data for training potential functions
MIT License

Large DFT forces and strange geometries #105

Open jharrymoore opened 1 month ago

jharrymoore commented 1 month ago

Hi,

Whilst inspecting some of the new subsets that were added in version 2, I came across some configurations where the hydrogens appear to have been ripped off their heavy atoms, and the forces from DFT are extremely high. I have attached some examples that appear when filtering the amino acid-ligand subset by max force. My understanding was that some of these high-force configurations were present in the original dataset due to the Psi4 bug, but I was not expecting them to be present in the more recently computed values.

spice_2_amino_acid_ligand_high_dft_forces.tar.gz

peastman commented 1 month ago

Where did you get these from? Anything with a force >1 was stripped out when we generated the HDF5 file for the dataset.

jharrymoore commented 1 month ago

These came from the latest SPICE v2.0.1 HDF5 file on Zenodo; positions and arrays were extracted to XYZ.

peastman commented 1 month ago

Can you provide the group names and conformation indices so I can look them up?

peastman commented 1 month ago

> Just confirming, by group names do you mean the SPICE subset they belong to?

I mean the name of the top-level group within the HDF5 file, so I can look them up in the file.

jharrymoore commented 1 month ago

Attached is a set I extracted from the amino acid-ligand set with a force norm greater than 30 eV/Å. Looking at the configs, many of the geometries seem reasonable; however, it appears that certain heavy atoms are being replaced with hydrogens.

image

high_force_configs_aa_ligand.txt

peastman commented 1 month ago

That makes sense. I assumed your file had the same units as the original dataset. The cutoff we applied to forces is 1 hartree/bohr, which is 51.4 eV/Å. Anything less than that is expected to still be present.
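For reference, here is a minimal sketch of that unit conversion. The constant and function names are made up for illustration, and the hartree and bohr values are the CODATA 2018 ones, not anything taken from the dataset scripts.

```python
# Conversion between hartree/bohr and eV/Å.
# Assumption: CODATA 2018 values for the hartree (in eV) and the bohr (in Å).
HARTREE_IN_EV = 27.211386245988
BOHR_IN_ANGSTROM = 0.529177210903

# 1 hartree/bohr expressed in eV/Å (about 51.4).
HARTREE_PER_BOHR_IN_EV_PER_ANGSTROM = HARTREE_IN_EV / BOHR_IN_ANGSTROM


def ev_per_angstrom_to_hartree_per_bohr(value):
    """Convert a force or gradient magnitude from eV/Å to hartree/bohr."""
    return value / HARTREE_PER_BOHR_IN_EV_PER_ANGSTROM


def hartree_per_bohr_to_ev_per_angstrom(value):
    """Convert a force or gradient magnitude from hartree/bohr to eV/Å."""
    return value * HARTREE_PER_BOHR_IN_EV_PER_ANGSTROM


# The 30 eV/Å threshold used above, expressed in the dataset's units.
threshold_hartree_per_bohr = ev_per_angstrom_to_hartree_per_bohr(30.0)
```

So a 30 eV/Å threshold is roughly 0.58 hartree/bohr, which is below the 1 hartree/bohr cutoff and explains why those conformations are still in the file.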

I looked through a few of the molecules you listed and didn't see any detached hydrogens like that. But I did see some mangled-looking molecules, like this distorted ring in XEN HIS.

image

You might choose to apply a lower cutoff to forces to get rid of things like this. Strictly speaking they're still correct: the DFT calculation was run correctly for the given conformations. But you might decide you don't want to train on conformations that are that unrealistic.
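A stricter cutoff could be applied along these lines. This is only a sketch under stated assumptions: the helper names are hypothetical, the gradients are assumed to be in hartree/bohr with shape (conformations, atoms, 3), and the criterion used here (largest per-atom gradient norm) is one possible choice, not necessarily the one used when the HDF5 file was generated. It uses plain Python lists so it runs without extra packages.

```python
import math

# Assumption: CODATA 2018 values; 1 hartree/bohr is about 51.4 eV/Å.
HARTREE_PER_BOHR_TO_EV_PER_ANGSTROM = 27.211386245988 / 0.529177210903


def max_atomic_gradient_norm(gradient):
    """Largest per-atom gradient norm for one conformation.

    `gradient` is a sequence of [gx, gy, gz] rows, one per atom. The force is
    the negative gradient, so the norms are identical.
    """
    return max(math.sqrt(gx * gx + gy * gy + gz * gz) for gx, gy, gz in gradient)


def conformations_below_cutoff(gradients, cutoff_ev_per_angstrom):
    """Return indices of conformations whose largest atomic force is within the cutoff.

    `gradients` is in hartree/bohr; the cutoff is given in eV/Å and converted.
    """
    cutoff = cutoff_ev_per_angstrom / HARTREE_PER_BOHR_TO_EV_PER_ANGSTROM
    return [
        index
        for index, gradient in enumerate(gradients)
        if max_atomic_gradient_norm(gradient) <= cutoff
    ]


# Toy data: two conformations of a two-atom molecule (hartree/bohr).
example_gradients = [
    [[0.01, 0.0, 0.0], [-0.01, 0.0, 0.0]],  # relaxed geometry, small forces
    [[0.8, 0.0, 0.0], [-0.8, 0.0, 0.0]],    # distorted geometry, large forces
]

kept = conformations_below_cutoff(example_gradients, cutoff_ev_per_angstrom=30.0)
```

With the toy data, only the first conformation survives a 30 eV/Å cutoff; the second (about 41 eV/Å) would still pass the original 1 hartree/bohr filter.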

tamaswells commented 1 day ago

This molecule is also strange: pubchem_id=135091982

image