Open j-wags opened 4 years ago
I've added a reproducing example to the repo. My first attempt was to try converting the PRO.mol2
file to SDF, to see if openbabel could fix it:
(off-dev) jeffreywagner@JW-MBP$ cat reproduce.py
from openforcefield.topology import Molecule
import os
os.system('obabel -imol2 /Users/jeffreywagner/projects/OpenForceField/openforcefield/examples/temp/2020-05-21-amber-ff-porting/tests/issue_2_c_term_charge/CTerminal/PRO/PRO.mol2 -osdf -O /Users/jeffreywagner/projects/OpenForceField/openforcefield/examples/temp/2020-05-21-amber-ff-porting/tests/issue_2_c_term_charge/CTerminal/PRO/PRO.sdf')
mol = Molecule.from_file('/Users/jeffreywagner/projects/OpenForceField/openforcefield/examples/temp/2020-05-21-amber-ff-porting/tests/issue_2_c_term_charge/CTerminal/PRO/PRO.sdf')
print('sdf smiles', mol.to_smiles())
mol = Molecule.from_file('/Users/jeffreywagner/projects/OpenForceField/openforcefield/examples/temp/2020-05-21-amber-ff-porting/tests/issue_2_c_term_charge/CTerminal/PRO/PRO.mol2')
print('mol2 smiles', mol.to_smiles())
(off-dev) jeffreywagner@JW-MBP$ python reproduce.py
1 molecule converted
sdf smiles [H][C@]1(C(C(C(N1C(=O)C([H])([H])[H])([H])[H])([H])[H])([H])[H])[C]([O-])[O-]
mol2 smiles [H][C@]1(C(C(C(N1C(=O)C([H])([H])[H])([H])[H])([H])[H])([H])[H])[C-]([O-])[O-]
Somehow, converting to SDF removes the negative charge from the carbon, without affecting the oxygens. This should be an impossible kekule structure, which is confusing.
Looking closely at the PRO.mol2
file, I'd expect the C(O)(O)
motif to either have one double bond, or two aromatic bonds. But here, it just has two single bonds:
(off-dev) jeffreywagner@JW-MBP$ cat CTerminal/PRO/PRO.mol2
@<TRIPOS>MOLECULE
default_name
21 21 1 0 0
SMALL
No Charge or Current Charge
@<TRIPOS>ATOM
1 H1 26.0920 24.8880 24.0010 H 1 ACE 0.112300
...
18 HA 29.4170 28.2960 23.1440 H 2 CPR -0.567900
19 C 29.5880 28.3690 25.2780 C.2 2 CPR -0.567900
20 O 28.9360 28.2350 26.3400 O.co2 2 CPR -0.567900
21 OXT 30.8350 28.4020 25.1910 O.co2 2 CPR -0.567900
@<TRIPOS>BOND
...
19 8 11 1
20 7 8 1
21 7 17 1
@<TRIPOS>SUBSTRUCTURE
1 ACE 1 TEMP 0 **** **** 0 ROOT
If I change the last few bonds to type am
, I get the following.
(off-dev) jeffreywagner@JW-MBP$ cat CTerminal/PRO/PRO.mol2
...
19 8 11 1
20 7 8 am
21 7 17 am
@<TRIPOS>SUBSTRUCTURE
...
(off-dev) jeffreywagner@JW-MBP$ python reproduce.py 1 molecule converted sdf smiles [H][C@]1(C(C(C(N1C(=O)C([H])([H])[H])([H])[H])([H])[H])([H])[H])C[O-] mol2 smiles [H][C@]1(C(C(C(N1C(=O)C([H])([H])[H])([H])[H])([H])[H])([H])[H])C-[O-]
which is exactly the same as before. ~The same happens if I change them to order 1
and 2
explicitly.~ (Note from the future: This is incorrect, and setting either of the bonds to order 2 DOES end up solving this issue)
I've also tried changing the OE reading function to both MOL2
and MOL2H
modes explicitly, with no luck
This seems to be a consistent problem with carboxylate groups. For example, loading MainChain/ALA_ASP/ALA_ASP.mol2
[1] (with capped termini) yields the same -3 charge:
from openeye import oechem
ifs = oechem.oemolistream('/XXX/MainChain/ALA_ASP/ALA_ASP.mol2')
ifs.SetFormat(oechem.OEFormat_MOL2)
mols = list()
oemol = oechem.OEMol()
while oechem.OEReadMolecule(ifs, oemol):
oechem.OEPerceiveChiral(oemol)
oechem.OEAssignAromaticFlags(oemol, oechem.OEAroModel_MDL)
smiles_options = oechem.OESMILESFlag_Canonical | oechem.OESMILESFlag_Isotopes | oechem.OESMILESFlag_RGroups
smiles_options |= oechem.OESMILESFlag_AtomStereo | oechem.OESMILESFlag_BondStereo
smiles_options |= oechem.OESMILESFlag_Hydrogens
oechem.OE3DToInternalStereo(oemol)
print(oechem.OECreateSmiString(oemol, smiles_options))
[H][C@@](C(=O)N([H])[C@]([H])(C(=O)N([H])C([H])([H])[H])C([H])([H])[C-]([O-])[O-])(C([H])([H])[H])N([H])C(=O)C([H])([H])[H]
[1]
(off-dev) jeffreywagner@JW-MBP$ cat MainChain/ALA_ASP/ALA_ASP.mol2
@<TRIPOS>MOLECULE
default_name
34 33 1 0 0
SMALL
No Charge or Current Charge
@<TRIPOS>ATOM
1 H1 25.8770 25.2200 24.4630 H 1 ACE 0.112300
2 CH3 25.9500 26.2290 24.0600 C.3 1 ACE -0.366200
3 H2 25.3070 26.8990 24.6280 H 1 ACE 0.112300
4 H3 25.6560 26.2300 23.0130 H 1 ACE 0.112300
5 C 27.3820 26.6990 24.1750 C.2 1 ACE 0.597200
6 O 28.2290 25.9620 24.6620 O.2 1 ACE -0.567900
7 N 27.6380 27.9240 23.7250 N.am 2 ALA -0.415700
8 H 26.8750 28.4830 23.3720 H 2 ALA 0.271900
9 CA 28.9320 28.6040 23.7970 C.3 2 ALA 0.033700
10 HA 29.4200 28.3430 24.7380 H 2 ALA 0.082300
11 CB 29.8160 28.1330 22.6330 C.3 2 ALA -0.182500
12 HB1 29.3430 28.3840 21.6820 H 2 ALA 0.060300
13 HB2 30.7890 28.6220 22.6870 H 2 ALA 0.060300
14 HB3 29.9610 27.0530 22.6900 H 2 ALA 0.060300
15 C 28.7340 30.1320 23.7680 C.2 2 ALA 0.597300
16 O 27.6840 30.6090 23.3400 O.2 2 ALA -0.567900
17 N 29.7520 30.8790 24.1980 N.am 3 ASP -0.516300
18 H 30.6100 30.4250 24.4750 H 3 ASP 0.293600
19 CA 29.7940 32.3470 24.2250 C.3 3 ASP 0.038100
20 HA 29.2090 32.7390 23.3910 H 3 ASP 0.088000
21 CB 29.1610 32.8520 25.5390 C.3 3 ASP -0.030300
22 HB2 28.2090 32.3390 25.6890 H 3 ASP -0.012200
23 HB3 29.8170 32.5870 26.3710 H 3 ASP -0.012200
24 CG 28.8850 34.3630 25.5750 C.2 3 ASP 0.799400
25 OD1 28.9520 35.0080 24.5030 O.co2 3 ASP -0.801400
26 OD2 28.6260 34.8650 26.6890 O.co2 3 ASP -0.801400
27 C 31.2540 32.8290 24.0580 C.2 3 ASP 0.536600
28 O 32.1890 32.0250 24.1430 O.2 3 ASP -0.581900
29 N 31.4590 34.1240 23.8040 N.am 4 NME -0.415700
30 H 30.6300 34.7210 23.8310 H 4 NME 0.271900
31 CH3 32.7660 34.7490 23.6330 C.3 4 NME -0.149000
32 HH31 33.3710 34.1520 22.9490 H 4 NME 0.097600
33 HH32 32.6430 35.7550 23.2310 H 4 NME 0.097600
34 HH33 33.2690 34.8040 24.6000 H 4 NME 0.097600
@<TRIPOS>BOND
1 5 6 2
2 5 7 am
3 2 3 1
4 2 4 1
5 2 5 1
6 1 2 1
7 15 16 2
8 15 17 am
9 11 12 1
10 11 13 1
11 11 14 1
12 9 10 1
13 9 11 1
14 9 15 1
15 7 8 1
16 7 9 1
17 27 28 2
18 27 29 am
19 24 25 1
20 24 26 1
21 21 22 1
22 21 23 1
23 21 24 1
24 19 20 1
25 19 21 1
26 19 27 1
27 17 18 1
28 17 19 1
29 31 32 1
30 31 33 1
31 31 34 1
32 29 30 1
33 29 31 1
@<TRIPOS>SUBSTRUCTURE
1 ACE 1 TEMP 0 **** **** 0 ROOT
Ah, I've retraced my steps and found that my previous attempt to change bond orders was incorrect -- I didn't modify the correct bonds (instead, it looks like I made some random C-H bonds aromatic).
So, taking a single proline with an uncapped C terminal [1], I can change the bond entry section from:
6 1 2 1
7 19 20 1
8 19 21 1
9 17 18 1
to
6 1 2 1
7 19 20 2
8 19 21 1
9 17 18 1
which yields the correct behavior:
(off-dev) jeffreywagner@JW-MBP$ python reproduce.py
1 molecule converted
sdf smiles [H][C@]1(C(C(C(N1C(=O)C([H])([H])[H])([H])[H])([H])[H])([H])[H])C(=O)[O]
mol2 smiles [H][C@]1(C(C(C(N1C(=O)C([H])([H])[H])([H])[H])([H])[H])([H])[H])C(=O)[O-]
(Now the sdf smiles gets mangled somehow, but that could be an OpenBabel issue, so I'm not going to focus on it here)
[1]
(off-dev) jeffreywagner@JW-MBP$ cat CTerminal/PRO/PRO.mol2
@<TRIPOS>MOLECULE
default_name
21 21 1 0 0
SMALL
No Charge or Current Charge
@<TRIPOS>ATOM
1 H1 26.0920 24.8880 24.0010 H 1 ACE 0.112300
2 CH3 25.9970 25.9720 24.0070 C.3 1 ACE -0.366200
3 H2 25.6220 26.2960 24.9760 H 1 ACE 0.112300
4 H3 25.3290 26.2900 23.2110 H 1 ACE 0.112300
5 C 27.3720 26.5810 23.7870 C.2 1 ACE 0.597200
6 O 28.2780 25.8270 23.4420 O.2 1 ACE -0.567900
7 N 27.5140 27.9110 23.9390 N.am 2 CPR -0.567900
8 CD 26.4410 28.8820 24.0610 C.3 2 CPR -0.567900
9 HD2 26.2330 29.0550 25.1180 H 2 CPR -0.567900
10 HD3 25.5380 28.5690 23.5390 H 2 CPR -0.567900
11 CG 26.9990 30.1490 23.4240 C.3 2 CPR -0.567900
12 HG2 26.4950 31.0430 23.7910 H 2 CPR -0.567900
13 HG3 26.9210 30.0770 22.3380 H 2 CPR -0.567900
14 CB 28.4690 30.1130 23.8390 C.3 2 CPR -0.567900
15 HB2 28.5800 30.6140 24.8030 H 2 CPR -0.567900
16 HB3 29.1070 30.5910 23.0950 H 2 CPR -0.567900
17 CA 28.8000 28.6180 23.9820 C.3 2 CPR -0.567900
18 HA 29.4170 28.2960 23.1440 H 2 CPR -0.567900
19 C 29.5880 28.3690 25.2780 C.2 2 CPR -0.567900
20 O 28.9360 28.2350 26.3400 O.co2 2 CPR -0.567900
21 OXT 30.8350 28.4020 25.1910 O.co2 2 CPR -0.567900
@<TRIPOS>BOND
1 5 6 2
2 5 7 am
3 2 3 1
4 2 4 1
5 2 5 1
6 1 2 1
7 19 20 2
8 19 21 1
9 17 18 1
10 17 19 1
11 14 15 1
12 14 16 1
13 14 17 1
14 11 12 1
15 11 13 1
16 11 14 1
17 8 9 1
18 8 10 1
19 8 11 1
20 7 8 1
21 7 17 1
@<TRIPOS>SUBSTRUCTURE
1 ACE 1 TEMP 0 **** **** 0 ROOT
So at this point, I don't think that the tleap-outputted mol2 file is spec-compliant (since it makes sp2 carbons with three single bonds). Now the question is "how do we get AmberTools to output correct bond orders?".
I see a little history on the AMBER mailing list for this question
bondtype -i test.mol2 -j full -f mol2 -o bondtypes
(where test.mol2 comes directly from tleap) makes a file called bondtypes
which is kinda PDB-like, and identifies the carboxylate bonds as order "9", which probably means aromatic. I wish this would make a fixed mol2 file, but it doesn't.ar
(not am
/amide, like I initially tried because I'm a dummy). This leads to the correct behavior. SaveMol2
function, which is whether to write sybyl vs. amber atom types. The TRIPOS mol2 specification document can be found here: http://chemyang.ccnu.edu.cn/ccb/server/AIMMS/mol2.pdf
I'm going to reopen this issue, but remove the blocking tag. It would be good to push this upstream into the AMBER stack.
Yes, this has been an issue for eons.
I think nitros might have a problem too? Vague recollection.
I've started a thread on the Amber Developers' listserv, we'll see what happens.
Dave
On Tue, May 26, 2020 at 8:14 PM David L. Mobley notifications@github.com wrote:
Yes, this has been an issue for eons.
I think nitros might have a problem too? Vague recollection.
— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://github.com/openforcefield/amber-ff-porting/issues/2#issuecomment-634346905, or unsubscribe https://github.com/notifications/unsubscribe-auth/AEDWD67TUD5W6GUDTCLT6H3RTRLOBANCNFSM4NHHRFNA .
Good to know. Thanks!
For now I'm going to leave the bond order fix in this repo, but if the tleap --> mol2 --> OFFTK pathway starts getting more traffic, then we can add the fix into OFFTK (we already check for AMBER atom types when reading mol2, so this wouldn't be unprecedented).
During SMARTS creation, the C termini seem to have a bond perception problem, yielding things like
[H][C@:3]1([C:2]([C:1](C(N1)([H])[H])([H])[H])([H])[H])[C-]([O-])[O-]
This Issue is to figure out why the final three atoms all have negative charges, while an uncapped C terminal should only have one