openforcefield / openff-qcsubmit

Automated tools for submitting molecules to QCFractal
https://openff-qcsubmit.readthedocs.io/en/latest/index.html
MIT License
26 stars 4 forks source link

Dihedral constraints for non-driven dihedrals in TorsionDrives #167

Closed chapincavender closed 2 years ago

chapincavender commented 2 years ago

Dihedral constraints are currently supported for Optimization Datasets but not for TorsionDrive Datasets. For flexible molecules with many coupled torsional degrees of freedom, it is useful to constrain non-driven dihedrals that are coupled to driven dihedrals during TorsionDrives. For example, in dipeptides, the backbone dihedrals phi and psi are coupled to each other and also to sidechain dihedrals.

Below is a mol2 file for tryptophan dipeptide from the OpenFF port of Amber ff14SB, AllDipeptides/MainChain/TRP/TRP.mol2 from AllDipeptides.tar.gz.

@<TRIPOS>MOLECULE
default_name
   36    37     1     0     0
SMALL
No Charge or Current Charge

@<TRIPOS>ATOM
      1 H1          25.9870    25.3430    25.0680 H          1 ACE       0.112300
      2 CH3         26.0370    26.1350    24.3220 C.3        1 ACE      -0.366200
      3 H2          25.1870    26.8040    24.4430 H          1 ACE       0.112300
      4 H3          26.0280    25.6980    23.3250 H          1 ACE       0.112300
      5 C           27.3180    26.9120    24.5180 C.2        1 ACE       0.597200
      6 O           28.0910    26.6070    25.4170 O.2        1 ACE      -0.567900
      7 N           27.5380    27.9200    23.6770 N.am       2 TRP      -0.415700
      8 H           26.8400    28.1350    22.9800 H          2 TRP       0.271900
      9 CA          28.7120    28.7970    23.7200 C.3        2 TRP      -0.027500
     10 HA          29.0220    28.9150    24.7590 H          2 TRP       0.112300
     11 CB          29.8680    28.1440    22.9460 C.3        2 TRP      -0.005000
     12 HB2         30.1300    27.2030    23.4330 H          2 TRP       0.033900
     13 HB3         29.5220    27.9090    21.9380 H          2 TRP       0.033900
     14 CG          31.1120    28.9730    22.8400 C.2        2 TRP      -0.141500
     15 CD1         31.6210    29.4850    21.6970 C.2        2 TRP      -0.163800
     16 HD1         31.1940    29.3380    20.7110 H          2 TRP       0.206200
     17 NE1         32.7460    30.2330    21.9840 N.pl3      2 TRP      -0.341800
     18 HE1         33.2940    30.7070    21.2810 H          2 TRP       0.341200
     19 CE2         33.0100    30.2540    23.3370 C.ar       2 TRP       0.138000
     20 CZ2         33.9970    30.8810    24.1110 C.ar       2 TRP      -0.260100
     21 HZ2         34.7550    31.4910    23.6420 H          2 TRP       0.157200
     22 CH2         33.9870    30.7030    25.5050 C.ar       2 TRP      -0.113400
     23 HH2         34.7410    31.1780    26.1180 H          2 TRP       0.141700
     24 CZ3         32.9980    29.9000    26.1020 C.ar       2 TRP      -0.197200
     25 HZ3         32.9990    29.7590    27.1740 H          2 TRP       0.144700
     26 CE3         32.0120    29.2730    25.3140 C.ar       2 TRP      -0.238700
     27 HE3         31.2660    28.6480    25.7840 H          2 TRP       0.170000
     28 CD2         31.9910    29.4370    23.9120 C.ar       2 TRP       0.124300
     29 C           28.3810    30.1950    23.1750 C.2        2 TRP       0.597300
     30 O           27.4560    30.3460    22.3780 O.2        2 TRP      -0.567900
     31 N           29.1370    31.2150    23.6030 N.am       3 NME      -0.415700
     32 H           29.9110    30.9990    24.2150 H          3 NME       0.271900
     33 CH3         28.9560    32.6030    23.1870 C.3        3 NME      -0.149000
     34 HH31        27.9150    32.8980    23.3320 H          3 NME       0.097600
     35 HH32        29.6020    33.2580    23.7730 H          3 NME       0.097600
     36 HH33        29.2040    32.7030    22.1290 H          3 NME       0.097600
@<TRIPOS>BOND
     1     5     6 2   
     2     5     7 am  
     3     2     3 1   
     4     2     4 1   
     5     2     5 1   
     6     1     2 1   
     7    29    30 2   
     8    29    31 am  
     9    26    27 1   
    10    26    28 ar  
    11    24    25 1   
    12    24    26 ar  
    13    22    23 1   
    14    22    24 ar  
    15    20    21 1   
    16    20    22 ar  
    17    19    20 ar  
    18    19    28 ar  
    19    17    18 1   
    20    17    19 1   
    21    15    16 1   
    22    15    17 1   
    23    14    15 2   
    24    14    28 1   
    25    11    12 1   
    26    11    13 1   
    27    11    14 1   
    28     9    10 1   
    29     9    11 1   
    30     9    29 1   
    31     7     8 1   
    32     7     9 1   
    33    33    34 1   
    34    33    35 1   
    35    33    36 1   
    36    31    32 1   
    37    31    33 1   
@<TRIPOS>SUBSTRUCTURE
     1 ACE         1 TEMP              0 ****  ****    0 ROOT

The goal is to create a 2-D TorsionDrive dataset driving phi and psi while constraining the sidechain dihedrals chi1 and chi2. The SMARTS strings and dihedral indices for these dihedrals are described below.

from openff.toolkit.topology import Molecule

# The fourth atom in chi2 is not hydrogen and not bonded to 3 ring atoms to select CA-CB-CG-CD1 for TRP
dihedral_smarts = {
    'phi': '[#6X4]-[#6X3:1](=O)-[#7X3:2]-[#6X4:3]-[#6X3:4](=O)-[#7X3]-[#6X4]',
    'psi': '[#6X4]-[#6X3](=O)-[#7X3:1]-[#6X4:2]-[#6X3:3](=O)-[#7X3:4]-[#6X4]',
    'chi1': '[#6X4]-[#6X3](=O)-[#7X3:1]-[#6X4:2](-[#6X3](=O)-[#7X3]-[#6X4])-[#6X4:3]-[!#1:4]',
    'chi2': '[#6X4]-[#6X3](=O)-[#7X3H1]-[#6X4:1](-[#6X3](=O)-[#7X3]-[#6X4])-[#6X4:2]-[#6:3]~[!#1!R3:4]'
}

trp_mol = Molecule.from_file(os.path.join('AllDipeptides', 'MainChain', 'TRP', 'TRP.mol2'))
for dihedral, smarts in dihedral_smarts.items():
    print(dihedral)
    for i, indices in enumerate(trp_mol.chemical_environment_matches(smarts)):
        print(i, indices)
phi 0 (4, 6, 8, 28)
psi 0 (6, 8, 28, 30)
chi1 0 (6, 8, 10, 13)
chi2 0 (8, 10, 13, 14)

Edited to fix variable name in python code

jthorton commented 2 years ago

@chapincavender thanks for writing this up and the example.

I have had a look at implementing this and while straightforward for QCSubmit this is complicated by QCFractal as it does not currently accept the constraint keywords which need to be passed to geometric.

For an optimization dataset, we can pass the constraints as keywords here which are combined with the general geometric optimisation settings, for torsiondrives however the keyword schema is strict and does not provide anywhere for us to pass the constraints this would need to be changed first before we are able to do this. The logic in the torsiondrive service may also need to be updated to ensure the constraints are correctly passed on to each spawned optimisation. cc @dotsdl @bennybp

chapincavender commented 2 years ago

Thanks @jthorton. So the conclusion is that we need to get MolSSI involved to implement and test this, and that might require a few weeks of time to coordinate that effort.

dotsdl commented 2 years ago

Thanks @jthorton for identifying a clear solution to this issue. I have committed to implementation. Aiming for resolution quickly.

jthorton commented 2 years ago

@chapincavender I have also extended the torsiondrive datasets entries to have the constraint adding API you can see an example of how to do this in the test here. Let me know if you need anything else for this.