openforcefield / openff-interchange

A project (and object) for storing, manipulating, and converting molecular mechanics data.
https://docs.openforcefield.org/projects/interchange
MIT License
69 stars 22 forks source link

Issue with getting positions of atoms when more than 99999 atoms in the system #794

Closed taidbui closed 1 year ago

taidbui commented 1 year ago

Description

When there are more than 99999 atoms in a system, the pdb output from packmol will have the second column containing Non-integer serial number. This leads to an issue with extracting positions from the pdb file later on using rdkit.Chem.rdmolfiles.MolFromPDBFile.

Reproduction

from openff.toolkit.topology import Molecule
from openff.toolkit.typing.engines.smirnoff import ForceField
from openff.units import unit
from openff.interchange.components.interchange import Interchange
from openff.interchange.components.mdconfig import MDConfig
from openff.interchange.components._packmol import pack_box

smiles_list = ['CCO']
molecules = [Molecule.from_smiles(smiles, allow_undefined_stereo=True) for smiles in smiles_list]
box_shape = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
topology = pack_box(
    molecules=molecules,
    number_of_copies=[11112],
    box_shape=box_shape,
    solute=None,
    center_solute=True,
    tolerance=2.0 * unit.angstrom,
    mass_density=0.7 * unit.grams / unit.milliliters,
    working_directory='./',
    retain_working_files=True)
topology.to_file('./system.pdb')
print('Done')
# Load a forcefield
forcefield = ForceField('openff-2.1.0.offxml')

# Convert the OpenMM System to an Interchange object
interchange = Interchange.from_smirnoff(force_field=forcefield, topology=topology)
interchange_mdconfig = MDConfig.from_interchange(interchange)
interchange_mdconfig.periodic = True

Output


---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
Cell In[14], line 14
     12 print('Begin packing molecules...')
     13 box_shape = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
---> 14 topology = pack_box(
     15     molecules=molecules,
     16     number_of_copies=[11112],
     17     box_shape=box_shape,
     18     solute=None,
     19     center_solute=True,
     20     tolerance=2.0 * unit.angstrom,
     21     mass_density=0.7 * unit.grams / unit.milliliters,
     22     working_directory='[./](https://vscode-remote+ssh-002dremote-002bbp-005fcompute.vscode-resource.vscode-cdn.net/lustre03/vol0/butm98/projects/PSS_Gel_formation_degradation/simulations/)',
     23     retain_working_files=True)
     24 topology.to_file('[./system.pdb](https://vscode-remote+ssh-002dremote-002bbp-005fcompute.vscode-resource.vscode-cdn.net/lustre03/vol0/butm98/projects/PSS_Gel_formation_degradation/simulations/system.pdb)')
     25 print('Done')

File /hpcdata/butm98/pyhpc/conda/x86_64/envs/openff/lib/python3.10/site-packages/openff/utilities/utilities.py:80, in requires_package..inner_decorator..wrapper(*args, **kwargs)
     77 except Exception as e:
     78     raise e
---> 80 return function(*args, **kwargs)

File /hpcdata/butm98/pyhpc/conda/x86_64/envs/openff/lib/python3.10/site-packages/openff/interchange/components/_packmol.py:718, in pack_box(molecules, number_of_copies, solute, tolerance, box_vectors, mass_density, box_shape, center_solute, working_directory, retain_working_files)
    710     # Load the coordinates from the PDB file with RDKit (because its already
...
    720 # TODO: This currently does not run if we encountered an error in the
    721 # context manager
    722 if temporary_directory and not retain_working_files:

AttributeError: 'NoneType' object has no attribute 'GetConformers'

Software versions

mattwthompson commented 1 year ago

Nice find.

Since that particular line is just used to load the coordinates, I wonder if we could use another tool like MDTraj in an except: block.

This diff appears to fix your MRE:


diff --git a/openff/interchange/components/_packmol.py b/openff/interchange/components/_packmol.py
index 578b6594..0fea45ad 100644
--- a/openff/interchange/components/_packmol.py
+++ b/openff/interchange/components/_packmol.py
@@ -707,15 +707,21 @@ def pack_box(
         if not packmol_succeeded:
             raise PACKMOLRuntimeError(result)

-        # Load the coordinates from the PDB file with RDKit (because its already
-        # a dependency)
-        rdmol = rdkit.Chem.rdmolfiles.MolFromPDBFile(
-            output_file_path,
-            sanitize=False,
-            removeHs=False,
-            proximityBonding=False,
-        )
-        positions = rdmol.GetConformers()[0].GetPositions()
+        try:
+            # Load the coordinates from the PDB file with RDKit (because its already
+            # a dependency)
+            rdmol = rdkit.Chem.rdmolfiles.MolFromPDBFile(
+                output_file_path,
+                sanitize=False,
+                removeHs=False,
+                proximityBonding=False,
+            )
+
+            positions = rdmol.GetConformers()[0].GetPositions()
+        except AttributeError:
+            import mdtraj
+
+            positions = mdtraj.load(output_file_path).xyz[0] * 0.1

     # TODO: This currently does not run if we encountered an error in the
     # context manager
taidbui commented 1 year ago

Great, thanks. This works.

mattwthompson commented 1 year ago

Thanks, this should land in the next release (probably 0.3.14).