Closed j-wags closed 8 months ago
Okay so something else I need to kick this off to generate some files, how were files like this:
MainChain/CYM/CYM.mol2
generated?
While we added CYM to the list of residues, it looks like we also need to generate some input files:
FileNotFoundError: [Errno 2] No such file or directory: 'MainChain/CYM/CYM.mol2'
It looks like GenerateTripeptides.sh
and GenerateDipeptides.sh
are used to create some files, ~so I will probally have to add CYM
here as well https://github.com/openforcefield/amber-ff-porting/blob/main/GenerateDipeptides.sh#L3-L7~
EDIT: Was looking at the wrong branch, this PR does that.
I'll now piviot to setting up CI, but if there is some low hanging fruit for me to run locally, I can do that as well.
That's right - IIRC GenerateDi/Tripeptides
makes the structures that are needed. In earlier versions there was one step where @dscerutti used a development branch of mdgx to make some structures but I think that's been replaced by calls to tools in stable releases. Also, I think we have to switch the pmemd
calls in that file to sander
.
Forgot to set OE_LICENSE: ${{ github.workspace }}/oe_license.txt
oops
Cool, now we are seeing the same error in CI that I have locally. It looks like the next step is to figure out either how to generate MainChain/CYM/CYM.mol2
or where to find it.
EDIT: I will see if GenerateDi/Tripeptides
creates it.
Okay new error:
Error opening unit 30: File "CYM.inpcrd" is missing or unreadable
mv: cannot stat 'mincrd': No such file or directory
GenerateDipeptides.sh: line 91: CYM.inpcrd: No such file or directory
/home/mmh/micromamba/envs/amber-ff-porting/bin/wrapped_progs/antechamber: Fatal Error!
Cannot open file (CYM.mol2) with mode (r).
No such file or directory
It looks like it exists:
$ ll MainChain/CYM/
-rw-rw-r-- 1 mmh mmh 457 Sep 28 12:21 ac.out
-rw-rw-r-- 1 mmh mmh 2.5K Sep 28 12:21 ANTECHAMBER_AC.AC
-rw-rw-r-- 1 mmh mmh 2.5K Sep 28 12:21 ANTECHAMBER_AC.AC0
-rw-rw-r-- 1 mmh mmh 2.5K Sep 28 12:21 ANTECHAMBER_BOND_TYPE.AC
-rw-rw-r-- 1 mmh mmh 2.5K Sep 28 12:21 ANTECHAMBER_BOND_TYPE.AC0
-rw-rw-r-- 1 mmh mmh 4.3K Sep 28 12:21 ATOMTYPE.INF
-rw-rw-r-- 1 mmh mmh 887 Sep 28 12:21 CYM.inpcrd
-rw-rw-r-- 1 mmh mmh 2.5K Sep 28 12:21 CYM.mol2
-rw-rw-r-- 1 mmh mmh 1.9K Sep 28 12:21 CYM.pdb
-rw-rw-r-- 1 mmh mmh 20K Sep 28 12:21 CYM.prmtop
-rw-rw-r-- 1 mmh mmh 12K Sep 28 12:21 leap.log
-rw-rw-r-- 1 mmh mmh 384 Sep 28 12:21 mdinfo
-rw-rw-r-- 1 mmh mmh 15K Sep 28 12:21 min.out
-rw-rw-r-- 1 mmh mmh 107 Sep 28 12:21 tleap2.in
-rw-rw-r-- 1 mmh mmh 1.4K Sep 28 12:21 tleap2.out
-rw-rw-r-- 1 mmh mmh 130 Sep 28 12:21 tleap.in
-rw-rw-r-- 1 mmh mmh 1.9K Sep 28 12:21 tleap.out
Will keep investigating
That error went away locally when I added set -euo pipefail
which makes no sense and also didn't show up on CI, so, so far so good!
Quick question for @mattwthompson RE: https://github.com/openforcefield/amber-ff-porting/releases/tag/0.0.3
Where did the ff14sb_0.0.2.offxml
and ff14sb_off_impropers_0.0.2.offxml
come from?
Running this pipeline, I get reduced.offxml
and reduced_off_impropers.offxml
as outputs, those are the files I should remove the stero from, right? (as well as fix the bond headers)
Where did the ff14sb_0.0.2.offxml and ff14sb_off_impropers_0.0.2.offxml come from?
Just renamed versions of reduced.offxml
Running this pipeline, I get reduced.offxml and reduced_off_impropers.offxml as outputs, those are the files I should remove the stero from, right? (as well as fix the bond headers)
Yup!
@j-wags for test_parameterize_tripeptides.py
and test_parameterize_dipeptides.py
the output looks like
MainChain/HID_HIP/HID_HIP
HarmonicBondForce 6.133906364440918 kJ/mol
HarmonicAngleForce 177.40892028808594 kJ/mol
PeriodicTorsionForce 96.68742370605469 kJ/mol
NonbondedForce -47.276123046875 kJ/mol
232.9541015625 kJ/mol
[H]C1=C(N(C(=N1)[H])[H])C([H])([H])[C@@]([H])(C(=O)N([H])[C@]([H])(C(=O)N([H])C([H])([H])[H])C([H])([H])C2=C(N(C(=[N+]2[H])[H])[H])[H])N([H])C(=O)C([H])([H])[H]
PeriodicTorsionForce 96.68801879882812 kJ/mol
NonbondedForce -47.2762451171875 kJ/mol
HarmonicBondForce 6.133906364440918 kJ/mol
HarmonicAngleForce 177.40921020507812 kJ/mol
232.954833984375 kJ/mol
187 amber dihedrals (15 impropers) and 187 off dihedrals
I skimmed through the code and while it does a lot of comparisons between the system, I wasn't sure if the output of these files has to be carefully inspected or there was some tool to plot things. Basically (to some degree of confidence) if the script runs without error, does that mean it is good? Or do we need to use another tool to make sure that the FF passes the benchmark.
As I recall, I had to check for energy discrepancies in the output using a script that I don't appear to have included here, and I can't find on my old computer. Sorry - That one probably needs to be rewritten.
No worries! What does that analysis look like?
NTerminal/PHE_GLN/PHE_GLN
HarmonicBondForce 3.907933235168457 kJ/mol
HarmonicAngleForce 12.979035377502441 kJ/mol
PeriodicTorsionForce 74.42961883544922 kJ/mol
NonbondedForce -325.94696044921875 kJ/mol
-234.63037109375 kJ/mol
[H]c1c(c(c(c(c1[H])[H])C([H])([H])[C@@]([H])(C(=O)N([H])[C@]([H])(C(=O)N([H])C([H])([H])[H])C([H])([H])C([H])([H])C(=O)N([H])[H])[N+]([H])([H])[H])[H])[H]
PeriodicTorsionForce 74.43064880371094 kJ/mol
NonbondedForce -325.9497375488281 kJ/mol
HarmonicBondForce 3.907932758331299 kJ/mol
HarmonicAngleForce 12.97904109954834 kJ/mol
-234.63214111328125 kJ/mol
169 amber dihedrals (12 impropers) and 169 off dihedrals
For a given block, do we just need to make sure the per energy term is within 1 kJ/mol? I am thinking ahead here and say I've got a magic wand and everything works when we add CYM
, and now we have this big file, what metric do we use to say "good enough"?
I think that flagging differences over 0.1 kJ/mol would be handy - And actually, we DID add a script that does that - SortResults.py
. There's a previous issue showing examples of its output here: https://github.com/openforcefield/amber-ff-porting/issues/23#issuecomment-685001962
It looks like we should expect it to highlight a few structures - mostly ASN and GLN. So that can be a control for our current outputs as well.
Awesome, I will add it to the pipeline. Looks like it takes ~5h30min to run on CI, that will likely go up since I noticed that the benchmark script is missing CYM
so it will take longer because of combinatorics.
Does CYM
belong in the MainChain
resname
list? or in the else:
block? Or both?
https://github.com/openforcefield/amber-ff-porting/blob/main/test_parameterize_dipeptides.py#L38-L52
Okay so for reasons errors get a bit obfuscated. I do want to fix that since that will make CI more functional, but while I work on it, the current real issue we have with adding CYM
is generating the N-Terminal and C-Terminal:
==> NTerminal/CYM/leap.log
> x = sequence { NCYM NME }
/home/mmh/micromamba/envs/amber-ff-porting/bin/teLeap: Fatal Error!
sequence: Illegal UNIT at position #1.
Exiting LEaP: Errors = 1; Warnings = 0; Notes = 0.
==> CTerminal/CYM/leap.log
> x = sequence { ACE CCYM }
Sequence: ACE
/home/mmh/micromamba/envs/amber-ff-porting/bin/teLeap: Fatal Error!
sequence: Illegal UNIT named: CCYM
Exiting LEaP: Errors = 1; Warnings = 0; Notes = 0.
So unless there is some special way to handle CYM (like we do with CYX), I suspect that CYM doesn't belong in the TRMRES
list.
Okay I will add some | tee tleap.out
instead of > tleap.out
, it will make the output more verbose when running Generate{Tr,Di}peptides.sh
but when we just adding a residue, I am not sure how else to do it.
@pbuslaev the tl;dr here is we can't build CYM with an N terminal.
To reproduce the error on CI:
Make a file called tleap.in
containing:
source leaprc.protein.ff14SB
x = sequence { NCYM NME }
set x box { 48.0 48.0 48.0 }
saveAmberParm x CYM.prmtop CYM.inpcrd
quit
Then run tleap -f tleap.in
Let me know if you have any questions.
@mikemhenry Thanks for setting up the CI. I checked amino acids in terminal groups in amber .lib
files and indeed CYM
is not part of those. So we should not add CYM
to TRMRES
Awesome! The CI can take awhile depending where it fails. I've got this all running locally as well so ping me if you hit an error that isn't easy to troubleshoot on the CI
I am getting this output for ARG_HIE
. Should I add this tripeptide to the list of malformed?
CTerminal/ARG_HIE/ARG_HIE
['CTerminal', 'ARG_HIE', 'ARG_HIE']
['HarmonicBondForce', '15.277278379009807', 'kJ/mol']
['HarmonicAngleForce', '116.9349552029777', 'kJ/mol']
['PeriodicTorsionForce', '103.24938218984182', 'kJ/mol']
['NonbondedForce', '-1167.615240217328', 'kJ/mol']
['-932.1536244454986', 'kJ/mol']
['[H]C1=C(N=C([N+]1[H])[H])C([H])([H])[C@@]([H])(C(=O)[O-])N([H])C(=O)[C@]([H])(C([H])([H])C([H])([H])C([H])([H])[N+](=C(N([H])[H])N([H])[H])[H])N([H])C(=O)C([H])([H])[H]']
['Partial', 'charge', 'sum', '(-1.6463341800942999e-09', 'e)', 'for', 'molecule', "'default_name'", '(SMILES', '[H]C1=C(N=C([N+]1[H])[H])C([H])([H])[C@@]([H])(C(=O)[O-])N([H])C(=O)[C@]([H])(C([H])([H])C([H])([H])C([H])([H])[N+](=C(N([H])[H])N([H])[H])[H])N([H])C(=O)C([H])([H])[H]', 'does', 'not', 'equal', 'formal', 'charge', 'sum', '(1.0', 'e).', 'To', 'override', 'this', 'error,', 'provide', 'the', "'allow_nonintegral_charges=True'", 'keyword', 'to', 'ForceField.create_openmm_system']
@j-wags who is the right person to ping for questions like this? :point_up:
I'm the right person to ask.
That's a tiny charge difference and can be ignored - This should be the case for anything under about 1e-6 elementary charge. Feel free to downgrade that from an error to a warning - maybe wrap it in a try/except, and if it gets the NonIntegralChargeError
, have it rerun create_openmm_system
with allow_nonintegral_charges = True
That is, don't add that to the malformed list.
Ok, I made the condition for creating a system a bit softer, so now all the tests pass. Now I move back to the full testing.
BTW, do we really need to run all validation? Is not it enough to run validation only for new added residues? This approach can significantly speed up the workflow.
BTW, do we really need to run all validation? Is not it enough to run validation only for new added residues? This approach can significantly speed up the workflow.
I am not sure the right way to detect just the newly added ones since there are some bash loops that generate everything but skip over the ones that already exist. While it is annoying to wait that long, I think it would be better to waste the validation time and be extra sure.
Once CI passes for this all check, I will download the artifacts and then do the parameter dedupe. Thank you so much for the help @pbuslaev
@mikemhenry @j-wags, I wonder if you had time to progress with generation new version of amber porting? Would be great to test it!
@pbuslaev Sorry I just forgot about this, it looks like it passed so what I will do now is download the artifacts and then do the dedupe and post the results. I'll ping Jeff at that point to take a look at the FF and if it all looks good then we should be good to ship it.
Once quick thing I noticed is under the benchmark results, we only have main chain with CYM, no C or N terminal, and no CYM_other_residue. If that is expected then that is fine, but I am wondering if we are missing benchmarking CYM_other_residue. This is all we have under Main Chain:
2023-10-05T21:46:07.1420489Z CYM
2023-10-05T21:46:07.1421315Z Bond: 2.5134 2.5134 -0.0000
2023-10-05T21:46:07.1422153Z Angl: 9.6107 9.6106 0.0000
2023-10-05T21:46:07.1423014Z Dihe: 39.6740 39.6741 -0.0001
2023-10-05T21:46:07.1423889Z Nonb: -147.3224 -147.3225 0.0000
Once quick thing I noticed is under the benchmark results, we only have main chain with CYM, no C or N terminal, and no CYM_other_residue. If that is expected then that is fine, but I am wondering if we are missing benchmarking CYM_other_residue. This is all we have under Main Chain:
2023-10-05T21:46:07.1420489Z CYM 2023-10-05T21:46:07.1421315Z Bond: 2.5134 2.5134 -0.0000 2023-10-05T21:46:07.1422153Z Angl: 9.6107 9.6106 0.0000 2023-10-05T21:46:07.1423014Z Dihe: 39.6740 39.6741 -0.0001 2023-10-05T21:46:07.1423889Z Nonb: -147.3224 -147.3225 0.0000
If I understand correctly CYM_other_residue
entries should appear after test_parameterize_tripeptides.py
call. In the original version, neither CYS nor CYX were included into the list of residues to test, so I did not add CYM as well. We can add those, if you think it is needed. As for C- and N-terminals, tleap
for some reason can not generate NCYM and NCYX, thus, we can not add those.
I am running the dedupe and we will soon have the FF with CYM
.
I am not sure how to test CYM since we couldn't add C- and N- terminals with tleap
and it doesn't look like we test/compare CYM in a dimer or trimer, so I am just concerned that we don't have a great way to test the FF.
I will post the "production" version of the FF once dedupe finishes (and I edit the bond header)
Just wanted to give a quick update: Tripeptide tests are still running locally for me, so far so good.
I've attached the new FF files with CYM. @pbuslaev please give these a test and let me know if they work. I added .zip
to the end of the file names to trick the really annoying issue where github cares about the file name extension (but not the actual file type, go figure).
@j-wags it looks like the like there is only one LibraryCharge
with CYM
(so no things like CTerminal-CYM_CYM
so I think we should be okay. Not sure what else to spot check.
<LibraryCharge smirks="[H:4][C:3]([C:9]=[O:10])([C:5]([H:6])([H:7])[S-:8])[N:1][H:2]" charge1="-0.4157 * elementary_charge" charge2="0.27190000000000003 * elementary_charge" cha rge3="-0.0351 * elementary_charge" charge4="0.0508 * elementary_charge" charge5="-0.2413 * elementary_charge" charge6="0.1122 * elementary_charge" charge7="0.1122 * elementary_charge" c harge8="-0.8843999989024437 * elementary_charge" charge9="0.5973000005487781 * elementary_charge" charge10="-0.5679000016463344 * elementary_charge" id="MainChain-CYM_CYM"></LibraryCharge>
Also including some plots from the notebook as well:
(I added .zip
to trick github, it isn't really zipped so just remove the .zip
extension from the file name)
ff14sb_off_impropers.0.0.4rc1.offxml.zip ff14sb_0.0.4rc1.offxml.zip
Let me know if you need anything else. If everything looks good then we can create a new release.
EDIT: I had a typo in the file names, it is fixed now
@pbuslaev have you had a chance to test the ff to see if it works for you?
Hi @mikemhenry. Sorry, last week was a bit busy. I will try my best to test it next week.
@pbuslaev no worries!
I can confirm that now I can generate system with negatively charged CYS
Awesome! Do you have any way to test if you get sensible results?
As for now, I can confirm that assigned parameters are consistent with amber14sb parameters (I manually checked charges, LJ parameters, and some bonds/angles/torsions). Also, I generated a protein with CYM and amber14 with gromacs. Then I minimized the protein, loaded it into interchange with the new force field and computed energy with get_gromacs_energies . Additionally I computed energies in GROMACS with the original force field. These are the results:
Type |
Bond | Angle | Torsion | Nonbonded |
---|---|---|---|---|
Interchange OpenMM | 151.34 | 755.73 | 3640.02 | -4270.85 |
GROMACS | 151.34 | 755.73 | 3637.16 | -4270.95 |
@pbuslaev if you have the time, could you add detailed=True
to both get_openmm_energies
and get_gromacs_energies
and share the results?
To be clear this is for my own curiosities, I'm not trying to get in the way of this PR getting across the finish line. In your last comment, it strikes me that the proper and improper torsions are not split out. I'm not sure if Interchange does a good job of handling these more detailed energy reports, and until now I've mostly been testing them with contrived cases.
@mattwthompson get_gromacs_energies
resulted in the error
---------------------------------------------------------------------------
InvalidEnergyError Traceback (most recent call last)
Cell In[166], line 1
----> 1 a_energies = get_gromacs_energies(i1,mdp = "default", detailed=True)
2 a_energies
File /tmp_mnt/filer1/unix/pbuslaev/conda_env/openff_dev/lib/python3.10/site-packages/openff/interchange/drivers/gromacs.py:85, in get_gromacs_energies(interchange, mdp, round_positions, detailed)
57 def get_gromacs_energies(
58 interchange: Interchange,
59 mdp: str = "auto",
60 round_positions: int = 8,
61 detailed: bool = False,
62 ) -> EnergyReport:
63 """
64 Given an OpenFF Interchange object, return single-point energies as computed by GROMACS.
65
(...)
83
84 """
---> 85 return _process(
86 _get_gromacs_energies(
87 interchange=interchange,
88 mdp=mdp,
89 round_positions=round_positions,
90 ),
91 detailed=detailed,
92 )
File /tmp_mnt/filer1/unix/pbuslaev/conda_env/openff_dev/lib/python3.10/site-packages/openff/interchange/drivers/gromacs.py:256, in _process(energies, detailed)
254 """Process energies from GROMACS into a standardized format."""
255 if detailed:
--> 256 return EnergyReport(energies=energies)
258 return EnergyReport(
259 energies={
260 "Bond": energies.get("Bond", 0.0 * kj_mol),
(...)
266 },
267 )
File /tmp_mnt/filer1/unix/pbuslaev/conda_env/openff_dev/lib/python3.10/site-packages/pydantic/v1/main.py:339, in BaseModel.__init__(__pydantic_self__, **data)
333 """
334 Create a new model by parsing and validating input data from keyword arguments.
335
336 Raises ValidationError if the input data cannot be parsed to form a valid model.
337 """
338 # Uses something other than `self` the first arg to allow "self" as a settable attribute
--> 339 values, fields_set, validation_error = validate_model(__pydantic_self__.__class__, data)
340 if validation_error:
341 raise validation_error
File /tmp_mnt/filer1/unix/pbuslaev/conda_env/openff_dev/lib/python3.10/site-packages/pydantic/v1/main.py:1076, in validate_model(model, input_data, cls)
1073 if check_extra:
1074 names_used.add(field.name if using_name else field.alias)
-> 1076 v_, errors_ = field.validate(value, values, loc=field.alias, cls=cls_)
1077 if isinstance(errors_, ErrorWrapper):
1078 errors.append(errors_)
File /tmp_mnt/filer1/unix/pbuslaev/conda_env/openff_dev/lib/python3.10/site-packages/pydantic/v1/fields.py:898, in ModelField.validate(self, v, values, loc, cls)
895 v, errors = self._validate_sequence_like(v, values, loc, cls)
897 if not errors and self.post_validators:
--> 898 v, errors = self._apply_validators(v, values, loc, cls, self.post_validators)
899 return v, errors
File /tmp_mnt/filer1/unix/pbuslaev/conda_env/openff_dev/lib/python3.10/site-packages/pydantic/v1/fields.py:1157, in ModelField._apply_validators(self, v, values, loc, cls, validators)
1155 for validator in validators:
1156 try:
-> 1157 v = validator(cls, v, values, self, self.model_config)
1158 except (ValueError, TypeError, AssertionError) as exc:
1159 return v, ErrorWrapper(exc, loc)
File /tmp_mnt/filer1/unix/pbuslaev/conda_env/openff_dev/lib/python3.10/site-packages/pydantic/v1/class_validators.py:304, in _generic_validator_cls.<locals>.<lambda>(cls, v, values, field, config)
302 return lambda cls, v, values, field, config: validator(cls, v, values=values, field=field, config=config)
303 elif args == set():
--> 304 return lambda cls, v, values, field, config: validator(cls, v)
305 elif args == {'values'}:
306 return lambda cls, v, values, field, config: validator(cls, v, values=values)
File /tmp_mnt/filer1/unix/pbuslaev/conda_env/openff_dev/lib/python3.10/site-packages/openff/interchange/drivers/report.py:51, in EnergyReport.validate_energies(cls, v)
49 for key, val in v.items():
50 if key not in _KNOWN_ENERGY_TERMS:
---> 51 raise InvalidEnergyError(f"Energy type {key} not understood.")
52 if not isinstance(val, unit.Quantity):
53 v[key] = FloatQuantity.validate_type(val)
InvalidEnergyError: Energy type Proper Dih. not understood.
@pbuslaev This is looking great! I think will have to wait until @j-wags gets back from vacation to cut an official release (this is wild speculation, someone else may be able to do it but he volunteered). That should give people more time to stress test this FF anyway before it ends up in https://github.com/openforcefield/openff-forcefields
I can cut releases in Jeff's absence - in this case I could use a refresher on what the requirements are. I wasn't following the discussion too closely, but I have a vague memory of the energies (save impropers) of earlier ports matching Amber's implementation extremely closely, which also seems to be the case here.
This would be deployed to https://github.com/openforcefield/openff-amber-ff-ports - the result of this change would be ff14sb_0.0.4.offxml
, right?
I can cut releases in Jeff's absence - in this case I could use a refresher on what the requirements are. I wasn't following the discussion too closely, but I have a vague memory of the energies (save impropers) of earlier ports matching Amber's implementation extremely closely, which also seems to be the case here.
This would be deployed to https://github.com/openforcefield/openff-amber-ff-ports - the result of this change would be
ff14sb_0.0.4.offxml
, right?
I guess also ff14sb_0.0.4_imporpers.offxml
should be added
Right, I should have added a wildcard or something in that filename, I was just trying to get at it being version 0.0.4 of the existing line
Yes and the two FF files I attached in https://github.com/openforcefield/amber-ff-porting/pull/44#issuecomment-1768661516 are production ready (just need the file name to indicate it is version 0.0.4
as I removed stereo stuff @
and de-duped params (the only part of the pipeline now that isn't automated).
I did modify the notebooks slightly on my end, but the only change was the name(s) of the inputs and the outputs, so this PR has all the code changes baked in, so it should be
At least that is all I can think of.
Awesome, I think I can take it over from here. I'll give everything a once-over and then work through that checklist!
Sorry to leave at the worst possible time for this, and thanks everyone for carrying this forward. I agree with everything done here so far, and agree this is ready for release following Mike's bullet points. Thanks a billion, @pbuslaev @mikemhenry and @mattwthompson!
@mikemhenry are these files ultimately generated in CI or locally? If the latter, could you export your conda environment (like before)? Or, if you have permissions, just upload directly to the 0.0.4 release.
@mattwthompson I will export my conda env since I took them from CI, but then did run them locally for the de-dupe, so that will be nice for people to have. It looks like I have permissions to update the release directly, so I will do that.
Continues from #42, but with access to an OE license since it's on a branch.