openforcefield / amber-ff-porting

Scratch space for porting amber FFs into SMIRNOFF format

Malformed tripeptide mol2s #28

Open j-wags opened 3 years ago

j-wags commented 3 years ago

A few of the tripeptide mol2 files are malformed. The backbone parameter file ends up with odd parameters whose SMIRKS contain multiple charged [H+] atoms, which indicates that a few mol2 files aren't being loaded correctly.

...
<Bond smirks="[H+][N:2]([C:1](=O)[C])[N]" length="1.335 * angstrom" k="980.0 * angstrom**-2 * mole**-1 * kilocalorie" id="A14SB-CTerminal_PRO_THR-C_N"></Bond>
<Bond smirks="[H+][H+:1]([C])[C@-:2]([N])[N+]" length="1.335 * angstrom" k="980.0 * angstrom**-2 * mole**-1 * kilocalorie" id="A14SB-NTerminal_HID_MET-C_N"></Bond>
<Bond smirks="[H][C-:1]([H+])[H+:2]([C])[C]" length="1.335 * angstrom" k="980.0 * angstrom**-2 * mole**-1 * kilocalorie" id="A14SB-NTerminal_LEU_MET-C_N"></Bond>
<Bond smirks="[H][C-:2]([H+])[H+:1]([H+])[C]" length="1.335 * angstrom" k="980.0 * angstrom**-2 * mole**-1 * kilocalorie" id="A14SB-NTerminal_MET_TYR-C_N"></Bond>
<Bond smirks="[H][H+:1]([H+])[C-:2]([H+])[C]" length="1.335 * angstrom" k="980.0 * angstrom**-2 * mole**-1 * kilocalorie" id="A14SB-NTerminal_THR_GLU-C_N"></Bond>
...
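One way to turn the excerpt above into a list of structures to inspect is to scan the parameter file for SMIRKS that contain a charged hydrogen and read the structure name back out of each `id`. The sketch below is only illustrative: the `<Bonds>` wrapper, the `suspect_structures` helper, and the assumed `id` layout (`<prefix>-<Position>_<RES1>_<RES2>-<atom types>`) are assumptions inferred from the entries shown here, not part of the porting scripts.

```python
import re
import xml.etree.ElementTree as ET

# Two entries copied from the excerpt above, wrapped in a hypothetical root
# element so that the fragment parses on its own.
SAMPLE = """<Bonds>
<Bond smirks="[H+][N:2]([C:1](=O)[C])[N]" length="1.335 * angstrom" k="980.0 * angstrom**-2 * mole**-1 * kilocalorie" id="A14SB-CTerminal_PRO_THR-C_N"></Bond>
<Bond smirks="[H][H+:1]([H+])[C-:2]([H+])[C]" length="1.335 * angstrom" k="980.0 * angstrom**-2 * mole**-1 * kilocalorie" id="A14SB-NTerminal_THR_GLU-C_N"></Bond>
</Bonds>"""

# Matches [H+] as well as tagged forms such as [H+:1].
CHARGED_H = re.compile(r"\[H\+(?::\d+)?\]")

# Assumed id layout: <prefix>-<Position>_<RES1>_<RES2>-<atom types>.
ID_PATTERN = re.compile(r"^[^-]+-([A-Za-z]+)_([A-Z0-9]+)_([A-Z0-9]+)-")


def suspect_structures(xml_text):
    """Return sorted 'Position/RES1_RES2' names whose SMIRKS contain [H+]."""
    root = ET.fromstring(xml_text)
    found = set()
    for elem in root.iter():
        smirks = elem.get("smirks", "")
        if not CHARGED_H.search(smirks):
            continue
        match = ID_PATTERN.match(elem.get("id", ""))
        if match:
            position, res1, res2 = match.groups()
            found.add(f"{position}/{res1}_{res2}")
    return sorted(found)


print(suspect_structures(SAMPLE))
# ['CTerminal/PRO_THR', 'NTerminal/THR_GLU']
```

A charged hydrogen should never appear in a peptide SMIRKS, so any hit is a reasonable signal that the source mol2 needs a closer look.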

I suspect that the following structures have the same problem:

I'll show NTerminal/THR_GLU (malformed) and NTerminal/ALA_GLU (correct) for comparison.

Comparing the text of the two files, it seems that atoms in the cap and first residue get shuffled in with each other. I can't tell whether the atoms are valid but out of order, or whether the bonding is nonsense as a result.
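A quick way to see the shuffling is to group atom records by their coordinates and report any position that is claimed by more than one atom. The sketch below uses a few records copied from the atom section of NTerminal/THR_GLU/THR_GLU.mol2; the helper names `parse_atoms` and `duplicated_positions` are hypothetical, and the whitespace-separated field layout (id, name, x, y, z, type, residue number, residue name, charge) is assumed from the file text.

```python
from collections import defaultdict

# A few atom records copied from NTerminal/THR_GLU/THR_GLU.mol2.
ATOM_LINES = """\
1 N 27.8470 25.4320 24.0360 N.4 1 NTH 0.000000
2 N1 27.8470 25.4320 24.0360 N.1 1 MOL 0.000000
3 H1 28.4890 24.9920 23.3890 H 1 NTH 0.000000
4 H4 28.4890 24.9920 23.3890 H 1 MOL 0.000000
33 N 30.3470 28.0040 24.2570 N.1 2 GLU -0.516300
"""


def parse_atoms(text):
    """Parse whitespace-separated mol2 atom records into dictionaries."""
    atoms = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 8:
            continue  # skip blank or incomplete lines
        atoms.append({
            "id": int(fields[0]),
            "name": fields[1],
            # Keep coordinates as text so identical positions compare exactly.
            "xyz": tuple(fields[2:5]),
            "resname": fields[7],
        })
    return atoms


def duplicated_positions(atoms):
    """Return groups of atoms that share exactly the same coordinates."""
    by_xyz = defaultdict(list)
    for atom in atoms:
        by_xyz[atom["xyz"]].append(atom)
    return [group for group in by_xyz.values() if len(group) > 1]


for group in duplicated_positions(parse_atoms(ATOM_LINES)):
    print([(a["id"], a["name"], a["resname"]) for a in group])
```

In the full THR_GLU file, every first-residue atom appears twice at the same position, once labelled `NTH` and once labelled `MOL`. Dropping the 16 `MOL` copies leaves 37 atoms, and the bond table only references indices up to 37, so the bonds may have been written against the de-duplicated atom list.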

NTerminal/THR_GLU (malformed)

Loads into PyMOL like this:

[Screenshot: NTerminal/THR_GLU loaded in PyMOL]

With the body text below

NTerminal/THR_GLU/THR_GLU.mol2

```
@MOLECULE
default_name
53 36 1 0 0
SMALL
No Charge or Current Charge

@ATOM
1 N 27.8470 25.4320 24.0360 N.4 1 NTH 0.000000
2 N1 27.8470 25.4320 24.0360 N.1 1 MOL 0.000000
3 H1 28.4890 24.9920 23.3890 H 1 NTH 0.000000
4 H4 28.4890 24.9920 23.3890 H 1 MOL 0.000000
5 H2 26.8950 25.3340 23.7180 H 1 NTH 0.000000
6 H5 26.8950 25.3340 23.7180 H 1 MOL 0.000000
7 H3 27.9510 24.9760 24.9360 H 1 NTH 0.000000
8 H6 27.9510 24.9760 24.9360 H 1 MOL 0.000000
9 CA 28.2080 26.8560 24.2110 C.3 1 NTH 0.000000
10 C1 28.2080 26.8560 24.2110 C.1 1 MOL 0.000000
11 HA 28.1040 27.3830 23.2650 H 1 NTH 0.000000
12 H7 28.1040 27.3830 23.2650 H 1 MOL 0.000000
13 CB 27.3380 27.5440 25.2600 C.1 1 NTH 0.000000
14 C2 27.3380 27.5440 25.2600 C.1 1 MOL 0.000000
15 HB 27.8270 28.4520 25.6090 H 1 NTH 0.000000
16 H8 27.8270 28.4520 25.6090 H 1 MOL 0.000000
17 CG2 25.9660 27.9010 24.6980 C.2 1 NTH 0.000000
18 C3 25.9660 27.9010 24.6980 C.1 1 MOL 0.000000
19 HG21 25.4190 27.0040 24.4070 H 1 NTH 0.000000
20 H9 25.4190 27.0040 24.4070 H 1 MOL 0.000000
21 HG22 25.3910 28.4330 25.4580 H 1 NTH 0.000000
22 H10 25.3910 28.4330 25.4580 H 1 MOL 0.000000
23 HG23 26.0800 28.5560 23.8340 H 1 NTH 0.000000
24 H11 26.0800 28.5560 23.8340 H 1 MOL 0.000000
25 OG1 27.1420 26.6650 26.3380 O.2 1 NTH 0.000000
26 O1 27.1420 26.6650 26.3380 O.2 1 MOL 0.000000
27 HG1 26.9480 27.1920 27.1240 H 1 NTH 0.000000
28 H12 26.9480 27.1920 27.1240 H 1 MOL 0.000000
29 C 29.6610 26.9220 24.6130 C.1 1 NTH 0.000000
30 C4 29.6610 26.9220 24.6130 C.2 1 MOL 0.000000
31 O 30.1690 25.9460 25.1480 O.2 1 NTH 0.000000
32 O2 30.1690 25.9460 25.1480 ANY 1 MOL 0.000000
33 N 30.3470 28.0040 24.2570 N.1 2 GLU -0.516300
34 H 29.8870 28.8340 23.9080 H 2 GLU 0.293600
35 CA 31.8020 28.0030 24.0670 C.1 2 GLU 0.039700
36 HA 32.2800 27.4160 24.8520 H 2 GLU 0.110500
37 CB 32.0620 27.3270 22.7110 C.1 2 GLU 0.056000
38 HB2 31.5160 26.3820 22.7180 H 2 GLU -0.017300
39 HB3 31.6440 27.9420 21.9120 H 2 GLU -0.017300
40 CG 33.5310 27.0160 22.4000 ANY 2 GLU 0.013600
41 HG2 34.0220 27.9190 22.0300 H 2 GLU -0.042500
42 HG3 34.0250 26.7160 23.3280 H 2 GLU -0.042500
43 CD 33.6600 25.8820 21.3690 ANY 2 GLU 0.805400
44 OE1 32.6790 25.6450 20.6240 ANY 2 GLU -0.818800
45 OE2 34.7350 25.2470 21.3650 ANY 2 GLU -0.818800
46 C 32.3570 29.4380 24.1220 ANY 2 GLU 0.536600
47 O 31.5770 30.3910 24.0710 ANY 2 GLU -0.581900
48 N 33.6820 29.5940 24.2430 ANY 3 NME -0.415700
49 H 34.2620 28.7660 24.2270 H 3 NME 0.271900
50 CH3 34.3520 30.8910 24.3010 ANY 3 NME -0.149000
51 HH31 33.7910 31.5740 24.9420 H 3 NME 0.097600
52 HH32 35.3610 30.7770 24.6980 H 3 NME 0.097600
53 HH33 34.4080 31.3200 23.2990 H 3 NME 0.097600
@BOND
1 15 16 1
2 15 17 1
3 13 14 1
4 9 10 1
5 9 11 1
6 9 12 1
7 7 8 1
8 7 9 1
9 7 13 1
10 5 6 1
11 5 7 1
12 5 15 1
13 1 2 1
14 1 3 1
15 1 4 1
16 1 5 1
17 30 31 1
18 30 32 1
19 27 28 1
20 27 29 1
21 24 25 1
22 24 26 1
23 24 27 1
24 21 22 1
25 21 23 1
26 21 24 1
27 19 20 1
28 19 21 1
29 19 30 1
30 17 18 1
31 17 19 1
32 34 35 1
33 34 36 1
34 34 37 1
35 32 33 1
36 32 34 1
@SUBSTRUCTURE
1 NTH 1 TEMP 0 **** **** 0 ROOT
```

NTerminal/ALA_GLU (good)

[Screenshot: NTerminal/ALA_GLU loaded in PyMOL]

With the body text below

NTerminal/ALA_GLU/ALA_GLU.mol2

```
@MOLECULE
default_name
33 32 1 0 0
SMALL
No Charge or Current Charge

@ATOM
1 N 28.8020 25.1660 23.6420 N.4 1 NAL 0.000000
2 H1 29.5730 25.0820 22.9750 H 1 NAL 0.000000
3 H2 27.9900 24.7490 23.2190 H 1 NAL 0.000000
4 H3 29.1210 24.6710 24.4630 H 1 NAL 0.000000
5 CA 28.6260 26.6110 23.9150 C.3 1 NAL 0.000000
6 HA 28.4830 27.1150 22.9590 H 1 NAL 0.000000
7 CB 27.4210 26.9260 24.8130 C.3 1 NAL 0.000000
8 HB1 27.5140 26.4150 25.7770 H 1 NAL 0.000000
9 HB2 27.3830 28.0050 24.9860 H 1 NAL 0.000000
10 HB3 26.4950 26.6220 24.3160 H 1 NAL 0.000000
11 C 29.9290 27.1360 24.5040 C.2 1 NAL 0.000000
12 O 30.4430 26.5450 25.4480 O.2 1 NAL 0.000000
13 N 30.4860 28.1920 23.9060 N.am 2 GLU -0.516300
14 H 29.9550 28.6980 23.2130 H 2 GLU 0.293600
15 CA 31.9520 28.3680 23.8230 C.3 2 GLU 0.039700
16 HA 32.4350 27.6850 24.5240 H 2 GLU 0.110500
17 CB 32.4350 28.0080 22.3960 C.3 2 GLU 0.056000
18 HB2 31.6980 28.3430 21.6630 H 2 GLU -0.017300
19 HB3 33.3620 28.5430 22.1840 H 2 GLU -0.017300
20 CG 32.7260 26.5070 22.2180 C.3 2 GLU 0.013600
21 HG2 33.1510 26.3470 21.2290 H 2 GLU -0.042500
22 HG3 33.4700 26.2030 22.9580 H 2 GLU -0.042500
23 CD 31.4620 25.6520 22.3090 C.2 2 GLU 0.805400
24 OE1 30.4290 26.0510 21.7200 O.co2 2 GLU -0.818800
25 OE2 31.4040 24.6610 23.0660 O.co2 2 GLU -0.818800
26 C 32.4220 29.7820 24.1980 C.2 2 GLU 0.536600
27 O 31.6460 30.7270 24.2110 O.2 2 GLU -0.581900
28 N 33.7340 29.9150 24.4640 N.am 3 NME -0.415700
29 H 34.3090 29.0890 24.4280 H 3 NME 0.271900
30 CH3 34.3850 31.1990 24.6600 C.3 3 NME -0.149000
31 HH31 33.8010 31.8080 25.3510 H 3 NME 0.097600
32 HH32 35.3920 31.0660 25.0630 H 3 NME 0.097600
33 HH33 34.4480 31.7250 23.7040 H 3 NME 0.097600
@BOND
1 11 12 2
2 11 13 am
3 7 8 1
4 7 9 1
5 7 10 1
6 5 6 1
7 5 7 1
8 5 11 1
9 1 2 1
10 1 3 1
11 1 4 1
12 1 5 1
13 26 27 2
14 26 28 am
15 23 24 1
16 23 25 1
17 20 21 1
18 20 22 1
19 20 23 1
20 17 18 1
21 17 19 1
22 17 20 1
23 15 16 1
24 15 17 1
25 15 26 1
26 13 14 1
27 13 15 1
28 30 31 1
29 30 32 1
30 30 33 1
31 28 29 1
32 28 30 1
@SUBSTRUCTURE
1 NAL 1 TEMP 0 **** **** 0 ROOT
```

j-wags commented 3 years ago

Update: This does indeed seem to be a bug. Without control of the underlying machinery, however, it will be hard to fix these cases. We're going to prune these structures and move forward, assuming that the parameters we would harvest from them are covered by other structures.

Details:

David Cerutti 11:58 From what I could tell, the mol2 writer has a problem with determining bond patterns; order is another issue. The coordinates of the mol2 files are all in the right places, same as the corresponding PDBs I generated, but the bonds from atom to atom are all off, hence the H+ ions everywhere. But we need Amber to give us a mol2; we cannot bypass this with some other software, right? Our duct-tape fix was incorporating Amber in this way to get around other issues. If the problem is confined to particular residues, then perhaps we can prune them and keep going, with other residues providing coverage?

Jeffrey Wagner 13:56 Thanks for looking into this. We do need a mol2, but I agree that it's possible that these particular structures aren't essential to the FF port. We can prune them for now and revisit this if we start seeing issues in energy validation for these.
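
The "coordinates are right, bonds are off" diagnosis can be checked mechanically before pruning: measure the distance between the two atoms of every bond record and flag pairs that are implausibly close or far apart. The sketch below is a rough filter under stated assumptions, not a validated tool. The helper names and the 0.8 to 2.0 angstrom window are hypothetical choices, and the sample holds only a few records copied from NTerminal/THR_GLU/THR_GLU.mol2. Section headers are accepted both as pasted in this thread (`@ATOM`) and in the standard Tripos form (`@<TRIPOS>ATOM`).

```python
import math

# A few atom and bond records copied from NTerminal/THR_GLU/THR_GLU.mol2.
SAMPLE = """\
@ATOM
1 N 27.8470 25.4320 24.0360 N.4 1 NTH 0.000000
2 N1 27.8470 25.4320 24.0360 N.1 1 MOL 0.000000
3 H1 28.4890 24.9920 23.3890 H 1 NTH 0.000000
7 H3 27.9510 24.9760 24.9360 H 1 NTH 0.000000
13 CB 27.3380 27.5440 25.2600 C.1 1 NTH 0.000000
@BOND
9 7 13 1
13 1 2 1
14 1 3 1
"""


def read_sections(text):
    """Split mol2 text into {section name: list of whitespace-split records}."""
    sections = {}
    current = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith("@"):
            # Accept both "@ATOM" and "@<TRIPOS>ATOM" style headers.
            current = line[1:].replace("<TRIPOS>", "")
            sections[current] = []
        elif current is not None:
            sections[current].append(line.split())
    return sections


def suspicious_bonds(text, min_len=0.8, max_len=2.0):
    """Return (bond id, atom name, atom name, length) for implausible bonds."""
    sections = read_sections(text)
    xyz, names = {}, {}
    for f in sections.get("ATOM", []):
        xyz[int(f[0])] = tuple(float(v) for v in f[2:5])
        names[int(f[0])] = f[1]
    flagged = []
    for f in sections.get("BOND", []):
        bond_id, a, b = int(f[0]), int(f[1]), int(f[2])
        length = math.dist(xyz[a], xyz[b])  # requires Python 3.8+
        if not (min_len <= length <= max_len):
            flagged.append((bond_id, names[a], names[b], round(length, 2)))
    return flagged


for bond in suspicious_bonds(SAMPLE):
    print(bond)
```

On this sample the check flags the zero-length N to N1 bond and the H3 to CB bond at roughly 2.66 angstrom, while the N to H1 bond at about 1.01 angstrom passes. Any structure with flagged bonds would be a candidate for the prune list.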