openforcefield / openff-toolkit

The Open Forcefield Toolkit provides implementations of the SMIRNOFF format, parameterization engine, and other tools. Documentation available at http://open-forcefield-toolkit.readthedocs.io
http://openforcefield.org
MIT License

UnassignedChemistryInPDBError: Can't make topology from a small molecule that isn't ethanol or cyclohexane. #1682

Open joelaforet opened 1 year ago

joelaforet commented 1 year ago

Hello!

I would like to parametrize a system of small molecules that I constructed using PackMol. I generated my Molecule objects from SMILES according to the tutorial. I then used PackMol to generate a PDB file of many different molecules arranged in a box (attached as a .txt file, since GitHub won't let me upload a .pdb). When I try to generate the topology from the PDB file, I get the error shown under "Output".

To Reproduce

Python code to reproduce:

from openff.toolkit import ForceField, Molecule, Topology

pdb_file_path = 'box_Sorafenib_CholicAcid_100_5.pdb'
sorafenib = Molecule.from_smiles("CNC(=O)C1=NC=CC(=C1)OC2=CC=C(C=C2)NC(=O)NC3=CC(=C(C=C3)Cl)C(F)(F)F")
cholic_acid = Molecule.from_smiles("C[C@H](CCC(=O)O)[C@H]1CC[C@@H]2[C@@]1([C@H](C[C@H]3[C@H]2[C@@H](C[C@H]4[C@@]3(CC[C@H](C4)O)C)O)O)C")

# Create the OpenFF Topology from the PDB file
topology = Topology.from_pdb(
    pdb_file_path,
    unique_molecules=[sorafenib, cholic_acid],
)
# *THE ERROR IS RAISED BY THE from_pdb CALL ABOVE*
forcefield = ForceField("openff-2.1.0.offxml")
interchange = forcefield.create_interchange(topology)

Attached file: box_Sorafenib_CholicAcid_100_5.txt

Output


UnassignedChemistryInPDBError             Traceback (most recent call last)
Cell In[84], line 4
      1 from openff.toolkit import ForceField, Topology
      3 # Create the OpenFF Topology from an PDB file
----> 4 topology = Topology.from_pdb(
      5     pdb_file_path,
      6     unique_molecules=[sorafenib, cholic_acid],
      7 )
      9 # Load the OpenFF "Sage" force field.
     10 forcefield = ForceField("openff-2.1.0.offxml")

File ~/miniconda3/envs/simulations/lib/python3.10/site-packages/openff/utilities/utilities.py:80, in requires_package.<locals>.inner_decorator.<locals>.wrapper(*args, **kwargs)
     77 except Exception as e:
     78     raise e
---> 80 return function(*args, **kwargs)

File ~/miniconda3/envs/simulations/lib/python3.10/site-packages/openff/toolkit/topology/topology.py:1758, in Topology.from_pdb(cls, file_path, unique_molecules, toolkit_registry, _custom_substructures, _additional_substructures)
   1752 substructure_dictionary["ADDITIONAL_SUBSTRUCTURE_OVERLAP"] = {}
   1754 coords_angstrom = np.array(
   1755     [[*vec3.value_in_unit(openmm_unit.angstrom)] for vec3 in pdb.getPositions()]
   1756 )
-> 1758 topology = toolkit_registry.call(
   1759     "_polymer_openmm_pdbfile_to_offtop",
   1760     cls,
   1761     pdb,
   1762     substructure_dictionary,
   1763     coords_angstrom,
   1764     _custom_substructures,
   1765 )
   1767 for off_atom, atom in zip([*topology.atoms], pdb.topology.atoms()):
   1768     off_atom.metadata["residue_name"] = atom.residue.name

File ~/miniconda3/envs/simulations/lib/python3.10/site-packages/openff/toolkit/utils/toolkit_registry.py:356, in ToolkitRegistry.call(self, method_name, raise_exception_types, *args, **kwargs)
    354             for exception_type in raise_exception_types:
    355                 if isinstance(e, exception_type):
--> 356                     raise e
    357             errors.append((toolkit, e))
    359 # No toolkit was found to provide the requested capability
    360 # TODO: Can we help developers by providing a check for typos in expected method names?

File ~/miniconda3/envs/simulations/lib/python3.10/site-packages/openff/toolkit/utils/toolkit_registry.py:352, in ToolkitRegistry.call(self, method_name, raise_exception_types, *args, **kwargs)
    350 method = getattr(toolkit, method_name)
    351 try:
--> 352     return method(*args, **kwargs)
    353 except Exception as e:
    354     for exception_type in raise_exception_types:

File ~/miniconda3/envs/simulations/lib/python3.10/site-packages/openff/toolkit/utils/rdkit_wrapper.py:321, in RDKitToolkitWrapper._polymer_openmm_pdbfile_to_offtop(self, topology_class, pdbfile, substructure_dictionary, coords_angstrom, _custom_substructures)
    314 custom_substructure_dictionary = self._prepare_custom_substructures(
    315     _custom_substructures
    316 )
    317 substructure_dictionary.update(
    318     custom_substructure_dictionary
    319 )  # concats both dicts, unique keys are enforced in previous function
--> 321 rdkit_mol = self._polymer_openmm_topology_to_rdmol(
    322     omm_top, substructure_dictionary
    323 )
    325 rdmol_conformer = Chem.Conformer()
    326 for atom_idx in range(rdkit_mol.GetNumAtoms()):

File ~/miniconda3/envs/simulations/lib/python3.10/site-packages/openff/toolkit/utils/rdkit_wrapper.py:818, in RDKitToolkitWrapper._polymer_openmm_topology_to_rdmol(self, omm_top, substructure_library)
    807             symbols = sorted(
    808                 [
    809                     SYMBOLS[atom.GetAtomicNum()]
   (...)
    812                 ]
    813             )
    814             resname_to_symbols_and_atomnames[resname].append(
    815                 (symbols, atom_names)
    816             )
--> 818     raise UnassignedChemistryInPDBError(
    819         substructure_library=resname_to_symbols_and_atomnames,
    820         omm_top=omm_top,
    821         unassigned_atoms=unassigned_atoms,
    822         unassigned_bonds=unassigned_bonds,
    823         matches=matches,
    824     )
    826 # set some properties to later remember what matches were made
    827 for atom in mol.GetAtoms():

UnassignedChemistryInPDBError: Some bonds or atoms in the input could not be identified.

Hint: The following residue names with unassigned atoms were not found in the substructure library. While the OpenFF Toolkit identifies residues by matching chemical substructures rather than by residue name, it currently only supports the 20 'canonical' amino acids.
    ZDC
    ZSO

Hint: The following residues were assigned names that do not match the residue name in the input, or could not be assigned residue names at all. This may indicate that atoms are missing from the input or some other error. The OpenFF Toolkit requires all atoms, including hydrogens, to be explicit in the input to avoid ambiguities in protonation state or bond order:
    Input residue A:ZDC#0001 contains atoms matching substructures {'No match', 'PEPTIDE_BOND', 'NME'}
    Input residue A:ZDC#0002 contains atoms matching substructures {'No match', 'PEPTIDE_BOND', 'NME'}
    Input residue A:ZDC#0003 contains atoms matching substructures {'No match', 'PEPTIDE_BOND', 'NME'}
    Input residue A:ZDC#0004 contains atoms matching substructures {'No match', 'PEPTIDE_BOND', 'NME'}
    Input residue A:ZDC#0005 contains atoms matching substructures {'No match', 'PEPTIDE_BOND', 'NME'}
    Input residue A:ZDC#0006 contains atoms matching substructures {'No match', 'PEPTIDE_BOND', 'NME'}
    Input residue A:ZDC#0007 contains atoms matching substructures {'No match', 'PEPTIDE_BOND', 'NME'}
    Input residue A:ZDC#0008 contains atoms matching substructures {'No match', 'PEPTIDE_BOND', 'NME'}
    Input residue A:ZDC#0009 contains atoms matching substructures {'No match', 'PEPTIDE_BOND', 'NME'}
    Input residue A:ZDC#0010 contains atoms matching substructures {'No match', 'PEPTIDE_BOND', 'NME'}
    Input residue A:ZDC#0011 contains atoms matching substructures {'No match', 'PEPTIDE_BOND', 'NME'}
    Input residue A:ZDC#0012 contains atoms matching substructures {'No match', 'PEPTIDE_BOND', 'NME'}
    Input residue A:ZDC#0013 contains atoms matching substructures {'No match', 'PEPTIDE_BOND', 'NME'}
    Input residue A:ZDC#0014 contains atoms matching substructures {'No match', 'PEPTIDE_BOND', 'NME'}
    Input residue A:ZDC#0015 contains atoms matching substructures {'No match', 'PEPTIDE_BOND', 'NME'}
    Input residue A:ZDC#0016 contains atoms matching substructures {'No match', 'PEPTIDE_BOND', 'NME'}
    Input residue A:ZDC#0017 contains atoms matching substructures {'No match', 'PEPTIDE_BOND', 'NME'}
    Input residue A:ZDC#0018 contains atoms matching substructures {'No match', 'PEPTIDE_BOND', 'NME'}
    Input residue A:ZDC#0019 contains atoms matching substructures {'No match', 'PEPTIDE_BOND', 'NME'}
    Input residue A:ZDC#0020 contains atoms matching substructures {'No match', 'PEPTIDE_BOND', 'NME'}
    Input residue A:ZDC#0021 contains atoms matching substructures {'No match', 'PEPTIDE_BOND', 'NME'}
    Input residue A:ZDC#0022 contains atoms matching substructures {'No match', 'PEPTIDE_BOND', 'NME'}
    Input residue A:ZDC#0023 contains atoms matching substructures {'No match', 'PEPTIDE_BOND', 'NME'}
    Input residue A:ZDC#0024 contains atoms matching substructures {'No match', 'PEPTIDE_BOND', 'NME'}
    Input residue A:ZDC#0025 contains atoms matching substructures {'No match', 'PEPTIDE_BOND', 'NME'}
    Input residue A:ZDC#0026 contains atoms matching substructures {'No match', 'PEPTIDE_BOND', 'NME'}
    Input residue A:ZDC#0027 contains atoms matching substructures {'No match', 'PEPTIDE_BOND', 'NME'}
    Input residue A:ZDC#0028 contains atoms matching substructures {'No match', 'PEPTIDE_BOND', 'NME'}
    Input residue A:ZDC#0029 contains atoms matching substructures {'No match', 'PEPTIDE_BOND', 'NME'}
    Input residue A:ZDC#0030 contains atoms matching substructures {'No match', 'PEPTIDE_BOND', 'NME'}
    Input residue A:ZDC#0031 contains atoms matching substructures {'No match', 'PEPTIDE_BOND', 'NME'}
    Input residue A:ZDC#0032 contains atoms matching substructures {'No match', 'PEPTIDE_BOND', 'NME'}
    Input residue A:ZDC#0033 contains atoms matching substructures {'No match', 'PEPTIDE_BOND', 'NME'}
    Input residue A:ZDC#0034 contains atoms matching substructures {'No match', 'PEPTIDE_BOND', 'NME'}
    Input residue A:ZDC#0035 contains atoms matching substructures {'No match', 'PEPTIDE_BOND', 'NME'}
    Input residue A:ZDC#0036 contains atoms matching substructures {'No match', 'PEPTIDE_BOND', 'NME'}
    Input residue A:ZDC#0037 contains atoms matching substructures {'No match', 'PEPTIDE_BOND', 'NME'}
    Input residue A:ZDC#0038 contains atoms matching substructures {'No match', 'PEPTIDE_BOND', 'NME'}
    Input residue A:ZDC#0039 contains atoms matching substructures {'No match', 'PEPTIDE_BOND', 'NME'}
    Input residue A:ZDC#0040 contains atoms matching substructures {'No match', 'PEPTIDE_BOND', 'NME'}
    Input residue A:ZDC#0041 contains atoms matching substructures {'No match', 'PEPTIDE_BOND', 'NME'}
    Input residue A:ZDC#0042 contains atoms matching substructures {'No match', 'PEPTIDE_BOND', 'NME'}
    Input residue A:ZDC#0043 contains atoms matching substructures {'No match', 'PEPTIDE_BOND', 'NME'}
    Input residue A:ZDC#0044 contains atoms matching substructures {'No match', 'PEPTIDE_BOND', 'NME'}
    Input residue A:ZDC#0045 contains atoms matching substructures {'No match', 'PEPTIDE_BOND', 'NME'}
    Input residue A:ZDC#0046 contains atoms matching substructures {'No match', 'PEPTIDE_BOND', 'NME'}
    Input residue A:ZDC#0047 contains atoms matching substructures {'No match', 'PEPTIDE_BOND', 'NME'}
    Input residue A:ZDC#0048 contains atoms matching substructures {'No match', 'PEPTIDE_BOND', 'NME'}
    Input residue A:ZDC#0049 contains atoms matching substructures {'No match', 'PEPTIDE_BOND', 'NME'}
    Input residue A:ZDC#0050 contains atoms matching substructures {'No match', 'PEPTIDE_BOND', 'NME'}
    Input residue A:ZDC#0051 contains atoms matching substructures {'No match', 'PEPTIDE_BOND', 'NME'}
    Input residue A:ZDC#0052 contains atoms matching substructures {'No match', 'PEPTIDE_BOND', 'NME'}
    Input residue A:ZDC#0053 contains atoms matching substructures {'No match', 'PEPTIDE_BOND', 'NME'}
    Input residue A:ZDC#0054 contains atoms matching substructures {'No match', 'PEPTIDE_BOND', 'NME'}
    Input residue A:ZDC#0055 contains atoms matching substructures {'No match', 'PEPTIDE_BOND', 'NME'}
    Input residue A:ZDC#0056 contains atoms matching substructures {'No match', 'PEPTIDE_BOND', 'NME'}
    Input residue A:ZDC#0057 contains atoms matching substructures {'No match', 'PEPTIDE_BOND', 'NME'}
    Input residue A:ZDC#0058 contains atoms matching substructures {'No match', 'PEPTIDE_BOND', 'NME'}
    Input residue A:ZDC#0059 contains atoms matching substructures {'No match', 'PEPTIDE_BOND', 'NME'}
    Input residue A:ZDC#0060 contains atoms matching substructures {'No match', 'PEPTIDE_BOND', 'NME'}
    Input residue A:ZDC#0061 contains atoms matching substructures {'No match', 'PEPTIDE_BOND', 'NME'}
    Input residue A:ZDC#0062 contains atoms matching substructures {'No match', 'PEPTIDE_BOND', 'NME'}
    Input residue A:ZDC#0063 contains atoms matching substructures {'No match', 'PEPTIDE_BOND', 'NME'}
    Input residue A:ZDC#0064 contains atoms matching substructures {'No match', 'PEPTIDE_BOND', 'NME'}
    Input residue A:ZDC#0065 contains atoms matching substructures {'No match', 'PEPTIDE_BOND', 'NME'}
    Input residue A:ZDC#0066 contains atoms matching substructures {'No match', 'PEPTIDE_BOND', 'NME'}
    Input residue A:ZDC#0067 contains atoms matching substructures {'No match', 'PEPTIDE_BOND', 'NME'}
    Input residue A:ZDC#0068 contains atoms matching substructures {'No match', 'PEPTIDE_BOND', 'NME'}
    Input residue A:ZDC#0069 contains atoms matching substructures {'No match', 'PEPTIDE_BOND', 'NME'}
    Input residue A:ZDC#0070 contains atoms matching substructures {'No match', 'PEPTIDE_BOND', 'NME'}
    Input residue A:ZDC#0071 contains atoms matching substructures {'No match', 'PEPTIDE_BOND', 'NME'}
    Input residue A:ZDC#0072 contains atoms matching substructures {'No match', 'PEPTIDE_BOND', 'NME'}
    Input residue A:ZDC#0073 contains atoms matching substructures {'No match', 'PEPTIDE_BOND', 'NME'}
    Input residue A:ZDC#0074 contains atoms matching substructures {'No match', 'PEPTIDE_BOND', 'NME'}
    Input residue A:ZDC#0075 contains atoms matching substructures {'No match', 'PEPTIDE_BOND', 'NME'}
    Input residue A:ZDC#0076 contains atoms matching substructures {'No match', 'PEPTIDE_BOND', 'NME'}
    Input residue A:ZDC#0077 contains atoms matching substructures {'No match', 'PEPTIDE_BOND', 'NME'}
    Input residue A:ZDC#0078 contains atoms matching substructures {'No match', 'PEPTIDE_BOND', 'NME'}
    Input residue A:ZDC#0079 contains atoms matching substructures {'No match', 'PEPTIDE_BOND', 'NME'}
    Input residue A:ZDC#0080 contains atoms matching substructures {'No match', 'PEPTIDE_BOND', 'NME'}
    Input residue A:ZDC#0081 contains atoms matching substructures {'No match', 'PEPTIDE_BOND', 'NME'}
    Input residue A:ZDC#0082 contains atoms matching substructures {'No match', 'PEPTIDE_BOND', 'NME'}
    Input residue A:ZDC#0083 contains atoms matching substructures {'No match', 'PEPTIDE_BOND', 'NME'}
    Input residue A:ZDC#0084 contains atoms matching substructures {'No match', 'PEPTIDE_BOND', 'NME'}
    Input residue A:ZDC#0085 contains atoms matching substructures {'No match', 'PEPTIDE_BOND', 'NME'}
    Input residue A:ZDC#0086 contains atoms matching substructures {'No match', 'PEPTIDE_BOND', 'NME'}
    Input residue A:ZDC#0087 contains atoms matching substructures {'No match', 'PEPTIDE_BOND', 'NME'}
    Input residue A:ZDC#0088 contains atoms matching substructures {'No match', 'PEPTIDE_BOND', 'NME'}
    Input residue A:ZDC#0089 contains atoms matching substructures {'No match', 'PEPTIDE_BOND', 'NME'}
    Input residue A:ZDC#0090 contains atoms matching substructures {'No match', 'PEPTIDE_BOND', 'NME'}
    Input residue A:ZDC#0091 contains atoms matching substructures {'No match', 'PEPTIDE_BOND', 'NME'}
    Input residue A:ZDC#0092 contains atoms matching substructures {'No match', 'PEPTIDE_BOND', 'NME'}
    Input residue A:ZDC#0093 contains atoms matching substructures {'No match', 'PEPTIDE_BOND', 'NME'}
    Input residue A:ZDC#0094 contains atoms matching substructures {'No match', 'PEPTIDE_BOND', 'NME'}
    Input residue A:ZDC#0095 contains atoms matching substructures {'No match', 'PEPTIDE_BOND', 'NME'}
    Input residue A:ZDC#0096 contains atoms matching substructures {'No match', 'PEPTIDE_BOND', 'NME'}
    Input residue A:ZDC#0097 contains atoms matching substructures {'No match', 'PEPTIDE_BOND', 'NME'}
    Input residue A:ZDC#0098 contains atoms matching substructures {'No match', 'PEPTIDE_BOND', 'NME'}
    Input residue A:ZDC#0099 contains atoms matching substructures {'No match', 'PEPTIDE_BOND', 'NME'}
    Input residue A:ZDC#0100 contains atoms matching substructures {'No match', 'PEPTIDE_BOND', 'NME'}
    Input residue B:ZSO#0001 contains atoms matching substructures {'No match'}
    Input residue B:ZSO#0002 contains atoms matching substructures {'No match'}
    Input residue B:ZSO#0003 contains atoms matching substructures {'No match'}
    Input residue B:ZSO#0004 contains atoms matching substructures {'No match'}
    Input residue B:ZSO#0005 contains atoms matching substructures {'No match'}

Error: The following 4345 atoms exist in the input but could not be assigned chemical information from the substructure library:
    Atom     1 (C20) in residue A:ZDC#0001
    Atom     3 (C10) in residue A:ZDC#0001
    Atom     4 (C11) in residue A:ZDC#0001
    Atom     5 (C12) in residue A:ZDC#0001
    Atom     6 (C13) in residue A:ZDC#0001
    Atom     7 (C14) in residue A:ZDC#0001
    Atom     8 (C15) in residue A:ZDC#0001
    Atom     9 (C16) in residue A:ZDC#0001
    Atom    10 (C17) in residue A:ZDC#0001
    Atom    11 (C18) in residue A:ZDC#0001
    Atom    12 (C19) in residue A:ZDC#0001
    Atom    13 (C2) in residue A:ZDC#0001
    Atom    14 (C3) in residue A:ZDC#0001
    Atom    15 (C4) in residue A:ZDC#0001
    Atom    16 (C5) in residue A:ZDC#0001
    Atom    17 (C6) in residue A:ZDC#0001
    Atom    18 (C7) in residue A:ZDC#0001
    Atom    19 (C8) in residue A:ZDC#0001
    Atom    20 (C9) in residue A:ZDC#0001
    Atom    21 (Cl) in residue A:ZDC#0001
    Atom    22 (F) in residue A:ZDC#0001
    Atom    23 (F1) in residue A:ZDC#0001
    Atom    24 (F2) in residue A:ZDC#0001
    Atom    25 (H10) in residue A:ZDC#0001
    Atom    26 (H11) in residue A:ZDC#0001
    Atom    27 (H12) in residue A:ZDC#0001
    Atom    28 (H13) in residue A:ZDC#0001
    Atom    29 (H14) in residue A:ZDC#0001
    Atom    30 (H15) in residue A:ZDC#0001
    Atom    32 (H5) in residue A:ZDC#0001
    Atom    33 (H6) in residue A:ZDC#0001
    Atom    34 (H7) in residue A:ZDC#0001
    Atom    35 (H8) in residue A:ZDC#0001
    Atom    36 (H9) in residue A:ZDC#0001
    Atom    38 (N1) in residue A:ZDC#0001
    Atom    39 (N2) in residue A:ZDC#0001
    Atom    40 (N3) in residue A:ZDC#0001
    Atom    42 (O1) in residue A:ZDC#0001
    Atom    43 (O2) in residue A:ZDC#0001
    Atom    47 (H4) in residue A:ZDC#0001
    Atom    49 (C20) in residue A:ZDC#0002
    Atom    51 (C10) in residue A:ZDC#0002
    Atom    52 (C11) in residue A:ZDC#0002
    Atom    53 (C12) in residue A:ZDC#0002
    Atom    54 (C13) in residue A:ZDC#0002
    Atom    55 (C14) in residue A:ZDC#0002
    Atom    56 (C15) in residue A:ZDC#0002
    Atom    57 (C16) in residue A:ZDC#0002
    Atom    58 (C17) in residue A:ZDC#0002
    Atom    59 (C18) in residue A:ZDC#0002
    Atom    60 (C19) in residue A:ZDC#0002
    Atom    61 (C2) in residue A:ZDC#0002
    Atom    62 (C3) in residue A:ZDC#0002
    Atom    63 (C4) in residue A:ZDC#0002
    Atom    64 (C5) in residue A:ZDC#0002
    Atom    65 (C6) in residue A:ZDC#0002
    Atom    66 (C7) in residue A:ZDC#0002
    Atom    67 (C8) in residue A:ZDC#0002
    Atom    68 (C9) in residue A:ZDC#0002
    Atom    69 (Cl) in residue A:ZDC#0002
    Atom    70 (F) in residue A:ZDC#0002
    Atom    71 (F1) in residue A:ZDC#0002
    Atom    72 (F2) in residue A:ZDC#0002
    Atom    73 (H10) in residue A:ZDC#0002
    Atom    74 (H11) in residue A:ZDC#0002
    Atom    75 (H12) in residue A:ZDC#0002
    Atom    76 (H13) in residue A:ZDC#0002
    Atom    77 (H14) in residue A:ZDC#0002
    Atom    78 (H15) in residue A:ZDC#0002
    Atom    80 (H5) in residue A:ZDC#0002
    Atom    81 (H6) in residue A:ZDC#0002
    Atom    82 (H7) in residue A:ZDC#0002
    Atom    83 (H8) in residue A:ZDC#0002
    Atom    84 (H9) in residue A:ZDC#0002
    Atom    86 (N1) in residue A:ZDC#0002
    Atom    87 (N2) in residue A:ZDC#0002
    Atom    88 (N3) in residue A:ZDC#0002
    Atom    90 (O1) in residue A:ZDC#0002
    Atom    91 (O2) in residue A:ZDC#0002
    Atom    95 (H4) in residue A:ZDC#0002
    Atom    97 (C20) in residue A:ZDC#0003
    Atom    99 (C10) in residue A:ZDC#0003
    Atom   100 (C11) in residue A:ZDC#0003
    Atom   101 (C12) in residue A:ZDC#0003
    Atom   102 (C13) in residue A:ZDC#0003
    Atom   103 (C14) in residue A:ZDC#0003
    Atom   104 (C15) in residue A:ZDC#0003
    Atom   105 (C16) in residue A:ZDC#0003
    Atom   106 (C17) in residue A:ZDC#0003
    Atom   107 (C18) in residue A:ZDC#0003
    Atom   108 (C19) in residue A:ZDC#0003
    Atom   109 (C2) in residue A:ZDC#0003
    Atom   110 (C3) in residue A:ZDC#0003
    Atom   111 (C4) in residue A:ZDC#0003
    Atom   112 (C5) in residue A:ZDC#0003
    Atom   113 (C6) in residue A:ZDC#0003

.... Repeats for all atoms in system


j-wags commented 1 year ago

Hi @joelaforet,

It looks like the element labels in the final column are a little mixed up (we need those to be accurate so that the atoms can be matched to the chemical graph during loading). The problems are a mix of atom types being substituted in (ca instead of C) and, in some of the ZSO molecules, the wrong element being present (some Hs are listed as Cs).

To illustrate the issue, I reduced your box to one molecule of each type and still got the error. Then I went through and manually relabeled the elements based on the atom names, and the box loads successfully. See the following two attached files for reference.

box_fixed_atom_types.txt box_minimal.txt

So in a pinch, you can make a script to relabel the elements after packmol runs, but maybe there's something earlier in your pipeline that's mangling the element column.

Could you let me know if this solves your problem?
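The relabeling script suggested above could look something like the following sketch. It is an illustration only and not part of the OpenFF Toolkit: the helper names `guess_element` and `relabel_element_column` are hypothetical, and the heuristic assumes that each atom name starts with its element symbol followed by digits (as in `C20`, `Cl`, `H10` from the error output). A carbon named `CL1` would be misread as chlorine, so the result should be checked by eye.

```python
import re

# Two-letter element symbols this sketch recognizes (assumption: extend as needed).
TWO_LETTER_ELEMENTS = {"Cl", "Br"}


def guess_element(atom_name: str) -> str:
    """Guess an element symbol from an atom name such as 'C20', 'Cl', or 'H10'."""
    letters = re.search(r"[A-Za-z]+", atom_name)
    if letters is None:
        raise ValueError(f"cannot guess an element from atom name {atom_name!r}")
    prefix = letters.group(0).capitalize()
    if prefix[:2] in TWO_LETTER_ELEMENTS:
        return prefix[:2]
    return prefix[0]


def relabel_element_column(pdb_line: str) -> str:
    """Rewrite columns 77-78 of an ATOM/HETATM record; leave other lines alone."""
    if not pdb_line.startswith(("ATOM", "HETATM")):
        return pdb_line
    newline = "\n" if pdb_line.endswith("\n") else ""
    body = pdb_line.rstrip("\n")
    element = guess_element(body[12:16])  # atom name lives in columns 13-16
    # Pad to 76 characters, write the right-justified element, keep any charge field.
    return body[:76].ljust(76) + element.rjust(2) + body[78:] + newline
```

Applying `relabel_element_column` to every line of the Packmol output and writing the result to a new file leaves `CONECT` and other records untouched.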

j-wags commented 1 year ago

(Also, thanks for the excellent issue report and reproducing example)

joelaforet commented 1 year ago

Hi @j-wags,

Thanks for your reply! I was wondering which atom types I should be using when working with OpenFF. Is the system smart enough to recognize different formats like GAFF2 or Sybyl? I am still confused about what exactly needs to be in the .pdb file that we feed into the topology generator.

joelaforet commented 1 year ago

I renamed the atoms in my molecule PDB files according to your adjustments, but now the cell that creates the interchange takes a very long time to run. It appears to be getting stuck in the create_interchange step. Is this process supposed to take a long time? I've attached my new files to this message. Thanks! box_Sorafenib_CholicAcid_1_1.txt CholicAcid.txt Sorafenib.txt

j-wags commented 1 year ago

"what atom types I should be using when trying to work with OpenFF?"

OpenFF doesn't use atom types. An OpenFF Molecule is defined by atoms (with element, formal charge, and stereochemistry) and bonds (with bond order and stereochemistry). Atom types aren't needed, and parameter assignment is performed directly on the chemical graph.

When loading PDB files, OpenFF sticks closely to the PDB spec, so the final column should contain element symbols, not atom types from any scheme. While we hold on to the atom name, residue name, residue number, insertion code, and chain for each atom loaded from PDB, those don't have any effect on the parameter assignment.
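To see what a given file actually carries in those fields, a small stand-alone parser can help. The sketch below is not OpenFF code; `parse_atom_record` is a hypothetical helper that slices an ATOM/HETATM record at the fixed column positions defined by the PDB format (atom name in columns 13-16, residue name in 18-20, chain in 22, residue number in 23-26, insertion code in 27, element in 77-78). The example record is made up for illustration.

```python
def parse_atom_record(line: str) -> dict:
    """Split one ATOM/HETATM record into the fixed-width fields mentioned above."""
    if not line.startswith(("ATOM", "HETATM")):
        raise ValueError("not an ATOM or HETATM record")
    line = line.rstrip("\n").ljust(80)  # pad so short lines can still be sliced
    return {
        "atom_name": line[12:16].strip(),
        "residue_name": line[17:20].strip(),
        "chain_id": line[21].strip(),
        "residue_number": int(line[22:26]),
        "insertion_code": line[26].strip(),
        "element": line[76:78].strip(),
    }


# A made-up record, built with a format string so the columns line up exactly.
example = "{:<6}{:>5} {:<4} {:>3} {:1}{:>4}    {:8.3f}{:8.3f}{:8.3f}{:6.2f}{:6.2f}{:>12}".format(
    "HETATM", 21, "Cl", "ZDC", "A", 1, 10.0, 20.0, 30.0, 1.0, 0.0, "Cl"
)
print(parse_atom_record(example))
```

Running this over a file and printing the distinct values of `element` is a quick way to spot entries such as `ca` that are atom types rather than element symbols.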

j-wags commented 1 year ago

And the create_interchange step took about 4 minutes to run for me. About 90% of this time was probably the AM1BCC charge assignment for the molecules, using Antechamber. This will only run once for each unique molecule in the topology, so the runtime should be similar for the larger topology containing many copies of the same two molecules.
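The scaling argument above can be illustrated with a toy sketch. This is not how OpenFF implements charge assignment; it only mimics the "once per unique molecule" behaviour with a cached stand-in function. The name `expensive_charges` and the string labels are hypothetical.

```python
from functools import lru_cache

calls = []  # records every time the slow stand-in actually runs


@lru_cache(maxsize=None)
def expensive_charges(molecule_label: str) -> str:
    """Stand-in for a slow per-molecule charge calculation (hypothetical)."""
    calls.append(molecule_label)
    return f"charges for {molecule_label}"


# A box shaped like the one in this thread: 100 copies of one molecule, 5 of another.
box = ["sorafenib"] * 100 + ["cholic_acid"] * 5
results = [expensive_charges(label) for label in box]
print(len(results), "molecules,", len(calls), "expensive calculations")
```

For the 100 + 5 box, the stand-in runs only twice, which is why the cost is expected to depend on the number of unique molecules rather than the number of copies.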

joelaforet commented 1 year ago

Thanks for the great explanation, that all makes a lot of sense! I am also using PackMol to make my system, and it looks like OpenFF needs the CONECT records inside the PDB file to run properly. Also, as a stress test, I tried generating a topology for a system with 200 molecules in it, but my kernel crashed. Do you have any advice on what may be causing this and how to remedy it? I've attached the PDB file to this message.

Thank you! box_Sorafenib_CholicAcid_100_5.zip
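Since the CONECT records came up, here is a minimal sketch for checking that a Packmol output still contains them before handing the file to `Topology.from_pdb`. It is a stand-alone illustration with a hypothetical helper name (`conect_bonds`), and it assumes the standard fixed-width layout in which each CONECT record holds up to five 5-character atom serial numbers starting at column 7.

```python
def conect_bonds(pdb_lines):
    """Collect bonded atom-serial pairs from the CONECT records of a PDB file."""
    bonds = set()
    for line in pdb_lines:
        if not line.startswith("CONECT"):
            continue
        # Five fixed-width serial fields: columns 7-11, 12-16, 17-21, 22-26, 27-31.
        fields = [line[i:i + 5].strip() for i in range(6, 31, 5)]
        serials = [int(field) for field in fields if field]
        if not serials:
            continue
        first, partners = serials[0], serials[1:]
        for partner in partners:
            # Store each bond once, regardless of which atom lists it.
            bonds.add((min(first, partner), max(first, partner)))
    return bonds


sample = ["CONECT    1    2    3\n", "CONECT    2    1\n", "END\n"]
print(sorted(conect_bonds(sample)))
```

An empty result for a file of small molecules suggests the bonding information was lost somewhere in the pipeline.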

mattwthompson commented 11 months ago

I can't reproduce a crash with that file:


In [3]: sorafenib = Molecule.from_smiles(
   ...:     "CNC(=O)C1=NC=CC(=C1)OC2=CC=C(C=C2)NC(=O)NC3=CC(=C(C=C3)Cl)C(F)(F)F"
   ...: )
   ...: cholic_acid = Molecule.from_smiles(
   ...:     "C[C@H](CCC(=O)O)[C@H]1CC[C@@H]2[C@@]1([C@H](C[C@H]3[C@H]2[C@@H](C[C@H]4[C@@]3(CC[C@H](C4)O)C)O)O)C"
   ...: )
   ...:

In [4]: Topology.from_pdb(
   ...:     "../../Downloads/box_Sorafenib_CholicAcid_100_5.pdb",
   ...:     unique_molecules=[sorafenib, cholic_acid],
   ...: )
Out[4]: <openff.toolkit.topology.topology.Topology at 0x15aae1a50>