Open jharrymoore opened 4 months ago
Where did you get these from? Anything with a force >1 was stripped out when we generated the HDF5 file for the dataset.
These came from the latest SPICE v2.0.1 HDF5 file on Zenodo; positions and arrays were extracted to xyz.
Can you provide the group names and conformation indices so I can look them up?
Just confirming: by group names, do you mean the SPICE subset they belong to?
I mean the name of the top-level group within the HDF5 file, so I can look them up in the file.
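For reference, a rough sketch of how the group names and conformation indices could be listed. It assumes each top-level group holds a per-conformation gradient array under the key `dft_total_gradient` in hartree/bohr; the key name, the function name, and the threshold are assumptions for illustration, not something confirmed in this thread. The function only needs a mapping of group names to groups, so it is shown here with plain nested lists.

```python
import math


def find_high_force_conformations(groups, threshold, key="dft_total_gradient"):
    """Return (group_name, conformation_index, max_norm) for every conformation
    whose largest per-atom gradient norm exceeds `threshold`.

    `groups` is any mapping of group name -> group, where group[key] iterates
    over conformations, each an iterable of per-atom (x, y, z) components.
    The key name is an assumption about the file layout.
    """
    hits = []
    for name, group in groups.items():
        for index, gradient in enumerate(group[key]):
            # Largest per-atom norm in this conformation. The gradient is the
            # negative of the force, so the norms are identical.
            largest = max(
                math.sqrt(sum(float(c) ** 2 for c in atom)) for atom in gradient
            )
            if largest > threshold:
                hits.append((name, index, largest))
    return hits


# Tiny synthetic stand-in for the file: one group, two conformations, one atom each.
example = {
    "example group": {
        "dft_total_gradient": [
            [[0.0, 0.0, 0.1]],  # conformation 0, small gradient
            [[0.0, 0.6, 0.0]],  # conformation 1, large gradient
        ]
    }
}
for name, index, norm in find_high_force_conformations(example, threshold=0.5834):
    print(name, index, norm)
```

With the real file, the same function could be passed an open `h5py.File(path, "r")` object, since it also maps group names to groups; `path` here stands for wherever the downloaded HDF5 file lives.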
Attached is a set I extracted from the amino acid-ligand set with a force norm greater than 30 eV/Å. Looking at the configs, many of the geometries seem reasonable; however, it appears that certain heavy atoms are being replaced with hydrogens.
That makes sense. I assumed your file had the same units as the original dataset. The cutoff we applied to forces is 1 hartree/bohr, which is 51.4 eV/Å. Anything less than that is expected to still be present.
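The conversion can be checked with a short sketch. The constants are CODATA 2018 values; the function names are illustrative and not part of any SPICE tooling.

```python
# CODATA 2018 conversion constants.
HARTREE_IN_EV = 27.211386245988
BOHR_IN_ANGSTROM = 0.529177210903

# One hartree/bohr expressed in eV/Å (about 51.42).
HARTREE_PER_BOHR_IN_EV_PER_ANGSTROM = HARTREE_IN_EV / BOHR_IN_ANGSTROM


def hartree_per_bohr_to_ev_per_angstrom(value):
    """Convert a force or gradient magnitude from hartree/bohr to eV/Å."""
    return value * HARTREE_PER_BOHR_IN_EV_PER_ANGSTROM


def ev_per_angstrom_to_hartree_per_bohr(value):
    """Convert a force or gradient magnitude from eV/Å to hartree/bohr."""
    return value / HARTREE_PER_BOHR_IN_EV_PER_ANGSTROM


# The dataset cutoff of 1 hartree/bohr, expressed in eV/Å.
print(round(hartree_per_bohr_to_ev_per_angstrom(1.0), 1))
# The 30 eV/Å filter from the report, expressed in hartree/bohr.
print(round(ev_per_angstrom_to_hartree_per_bohr(30.0), 3))
```

So a 30 eV/Å filter sits well below the 1 hartree/bohr cutoff, which is why those conformations survive in the file.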
I looked through a few of the molecules you listed and didn't see any detached hydrogens like that. But I did see some mangled-looking molecules, like this distorted ring in XEN HIS.
You might choose to apply a lower cutoff to forces to get rid of things like this. Strictly speaking they're still correct: the DFT calculation was run correctly for the given conformations. But you might decide you don't want to train on conformations that are that unrealistic.
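A minimal sketch of such a filter, assuming the criterion is the largest per-atom force norm in each conformation and that forces have already been converted to eV/Å. Both the criterion and the 10 eV/Å value below are illustrative choices, not a recommendation from the dataset authors.

```python
import math


def max_force_norm(forces):
    """Largest per-atom force norm in one conformation.

    `forces` is a sequence of (fx, fy, fz) tuples, one per atom.
    """
    return max(math.sqrt(fx * fx + fy * fy + fz * fz) for fx, fy, fz in forces)


def indices_below_cutoff(conformation_forces, cutoff):
    """Return indices of conformations whose largest force norm is <= cutoff."""
    return [
        index
        for index, forces in enumerate(conformation_forces)
        if max_force_norm(forces) <= cutoff
    ]


# Two synthetic conformations of a two-atom system, forces in eV/Å.
conformations = [
    [(0.5, 0.0, 0.0), (-0.5, 0.0, 0.0)],    # relaxed-looking
    [(30.0, 40.0, 0.0), (-30.0, -40.0, 0.0)],  # norm of 50 eV/Å on each atom
]
kept = indices_below_cutoff(conformations, cutoff=10.0)
print(kept)
```

The returned indices can then be used to select the matching positions and energies, so the arrays stay aligned after filtering.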
This molecule is also strange: pubchem_id=135091982
Hi,
Whilst inspecting some of the new subsets that were added in version 2, I came across some configurations where the hydrogens appear to have been ripped off their heavy atoms, and the forces from DFT are extremely high. I have attached some examples that appear when filtering the amino acid-ligand subset by max force. My understanding was that some of these high-force configurations were present in the original dataset due to the Psi4 bug, but I was not expecting them to be present in the more recently computed values.
spice_2_amino_acid_ligand_high_dft_forces.tar.gz