openmm / openmmforcefields

CHARMM and AMBER forcefields for OpenMM (with small molecule support)
http://openmm.org

CHARMM: Handle patches #22

Closed jchodera closed 6 years ago

jchodera commented 7 years ago

Since the CHARMM forcefield only specifies patch/residue compatibility with (barely) human-readable comments (see https://github.com/choderalab/openmm-forcefields/pull/1#issuecomment-283505234), we plan to create a YAML file to direct which ffxml compatibility tags are to be inserted.

The initial version will cover

The syntax will look something like

charmm-patches.yaml:
---
residues:
   ALA:
      - NTER
      - CTER

patches:
   DISU:
      - 1:CYS
      - 2:CYS

In future iterations, we will expand this compatibility list.
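
A minimal sketch of how such a compatibility list might be turned into ffxml tags. It assumes the `<AllowPatch>` and `<ApplyToResidue>` elements of the OpenMM force field XML format; the `COMPAT` dictionary, the `add_compat_tags` helper, and the tiny `FFXML` document are hypothetical stand-ins (a real script would load the YAML file with a YAML parser).

```python
import xml.etree.ElementTree as ET

# Mirrors the charmm-patches.yaml example; hard-coded here so the sketch
# needs nothing beyond the standard library.
COMPAT = {
    "residues": {"ALA": ["NTER", "CTER"]},
    "patches": {"DISU": ["1:CYS", "2:CYS"]},
}


def add_compat_tags(root, compat):
    """Insert <AllowPatch> and <ApplyToResidue> children into an ffxml tree."""
    for residue in list(root.iter("Residue")):
        for patch_name in compat["residues"].get(residue.get("name"), []):
            ET.SubElement(residue, "AllowPatch", name=patch_name)
    for patch in list(root.iter("Patch")):
        for target in compat["patches"].get(patch.get("name"), []):
            ET.SubElement(patch, "ApplyToResidue", name=target)
    return root


# Hypothetical, heavily abridged force field file used only for illustration.
FFXML = """<ForceField>
  <Residues><Residue name="ALA"/><Residue name="CYS"/></Residues>
  <Patches><Patch name="DISU" residues="2"/></Patches>
</ForceField>"""

root = add_compat_tags(ET.fromstring(FFXML), COMPAT)
print(ET.tostring(root, encoding="unicode"))
```

Residues or patches that are not listed in the mapping are left untouched, so the list can grow incrementally.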

ChayaSt commented 7 years ago

@jchodera, I'm running into a problem when writing out the patches in the format that @peastman had documented here

This is what the patch looks like in CHARMM.

For the first two problems, we can probably devise a way to use the YAML file to look up the templates of the residues the patch will be applied to, and from those figure out which atoms are changed rather than added and which bonds to remove. That would mean using the YAML file in ParmEd, though. Maybe @swails can offer some insight?

jchodera commented 7 years ago

@peastman : We hadn't realized that this was going to be so problematic when the <Patch> spec was being defined.

The most straightforward solution here would be to modify the OpenMM <Patch> spec to support a change in behavior when the patch is applied:

We discussed the alternatives---engineering a separate pipeline or trying to massively overhaul ParmEd---and think that it will set us back a few weeks if we need to go that route. We need to fix the <Patch> spec anyway because the ability to specify partial charge changes was accidentally omitted in the first version of the spec...

peastman commented 7 years ago

Since we're only including a handful of patches (NTER, CTER, DISU, and is that it?), what about just writing them by hand? That should be trivial to do. If we want to support a large set of them in the future, we can worry about getting ParmEd to generate them at that point.

jchodera commented 7 years ago

@peastman: There are a few issues with manual generation of patches:

@ChayaSt and I feel the best thing to do is to make these changes to the <Patch> spec now rather than be trapped by a problematic spec later. I don't think there's anything using the <Patch> function currently, and I'd be happy to make the changes in a PR.

peastman commented 7 years ago

The current spec doesn't include the ability to modify charges

It actually does. I may have forgotten to mention that in the documentation. Sorry if I did! If you specify charge= on your <AddAtom> and <ChangeAtom> tags, that should work correctly.
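
As a rough illustration of the `charge=` attribute, the sketch below builds a `<Patch>` element with `<AddAtom>`, `<ChangeAtom>`, and `<RemoveAtom>` children. The `build_patch` helper is hypothetical, the atom names and charges are taken from the PRES CTER record quoted later in this thread, and the CHARMM type names are used only as placeholders for whatever type names the final ffxml file defines.

```python
import xml.etree.ElementTree as ET


def build_patch(name, added, changed, removed):
    """Build a <Patch> element; atoms are (name, type, charge) tuples."""
    patch = ET.Element("Patch", name=name)
    for tag, atoms in (("AddAtom", added), ("ChangeAtom", changed)):
        for atom_name, atom_type, charge in atoms:
            ET.SubElement(patch, tag, name=atom_name, type=atom_type,
                          charge=f"{charge:.2f}")
    for atom_name in removed:
        ET.SubElement(patch, "RemoveAtom", name=atom_name)
    return patch


# Which atoms are "added" and which are "changed" is decided by hand here.
cter = build_patch(
    "CTER",
    added=[("OT1", "OC", -0.67), ("OT2", "OC", -0.67)],
    changed=[("C", "CC", 0.34)],
    removed=["O"],
)
print(ET.tostring(cter, encoding="unicode"))
```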

internal and external bonds in the original template to atoms deleted by the patch would automatically be deleted

It already does that too.

patch-specified atoms would be added if they don't exist in the original residue template, or their types are modified if they already exist

I do see how this would simplify the process of automatically converting the CHARMM files, and let's discuss ways of handling it. But since we're only talking about three patches that can trivially be converted by hand, this isn't actually blocking anything, is it?

ChayaSt commented 7 years ago

I do see how this would simplify the process of automatically converting the CHARMM files, and let's discuss ways of handling it. But since we're only talking about three patches that can trivially be converted by hand, this isn't actually blocking anything, is it?

It's actually more than 3 since I'm including the common termini (total of 8), specific termini for GLY and PRO (another 4), protonations and deprotonations (6), disulfide (1), and all phosphorylation (6). This is a total of 25. A bit much to do by hand...

peastman commented 7 years ago

Ok, let's try to do something automated then.

Could the YAML file list the atoms that are being changed rather than added? I'm just trying to figure out a way to avoid removing error checking from OpenMM. Adding a new atom and changing an existing atom are fairly different operations, and making the user specify in the force field XML which one they're doing could catch a lot of user errors. I want to keep that protection if possible.

jchodera commented 7 years ago

We don't actually have any way to know exactly which atoms and bonds should be deleted since the charmm patches do not provide this information, so there is no fundamentally superior way to check for errors in this case. What if the error checking was skipped only for patches that have a "behavior=charmm" attribute?

ChayaSt commented 7 years ago

Could the YAML file list the atoms that are being changed rather than added?

That can be tough because the CHARMM template doesn't differentiate. All it gives you is a list of atoms from which some are changed and some are added.

peastman commented 7 years ago

I know, you'd have to add that information yourself. For example, the NTER patch just lists six atoms: N, HT1, HT2, HT3, CA, and HA. It doesn't say which are added and which are changed. But you know that HT1, HT2, and HT3 are being added while N, CA, and HA are being changed. That's why I suggested putting it in the yaml file.

To put it differently: CHARMM really ought to provide that information, and it's a flaw in their design that it doesn't! OpenMM's design is better. I'd rather keep it that way, even if it means having to fill in the missing information by hand at conversion time, rather than force OpenMM to adopt the same design flaws that CHARMM has.
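
The NTER example above can be sketched in a few lines: given the atom names a patch lists and the atom names of one target residue template, atoms already present are "changed" and the rest are "added". The helper name and the residue atom list are illustrative assumptions, not part of ParmEd or OpenMM, and the answer depends on which residue template is supplied.

```python
def classify_patch_atoms(patch_atoms, residue_atoms):
    """Split patch atoms into (added, changed) relative to one residue template."""
    present = set(residue_atoms)
    added = [name for name in patch_atoms if name not in present]
    changed = [name for name in patch_atoms if name in present]
    return added, changed


# The six atoms listed by the NTER patch, as quoted above.
NTER_ATOMS = ["N", "HT1", "HT2", "HT3", "CA", "HA"]

# Illustrative atom names for an alanine-like residue template (assumed).
RESIDUE_ATOMS = ["N", "HN", "CA", "HA", "CB", "HB1", "HB2", "HB3", "C", "O"]

added, changed = classify_patch_atoms(NTER_ATOMS, RESIDUE_ATOMS)
print("added:", added)
print("changed:", changed)
```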

jchodera commented 7 years ago

Except we don't know that for certain. We do not have that information available, and the only way to know what atoms are deleted is to discern this from a particular combination of patch with residue that should be allowed. We also believe there are cases where the atoms that are deleted will be different for different target residues.

I understand you have issues with how CHARMM represents this information, but I also understand that you would like to make the CHARMM forcefield available in OpenMM. In the end, there is always a compromise that must be made, and we're suggesting that mimicking the behavior of CHARMM is the right compromise here.

jchodera commented 7 years ago

To put it differently: CHARMM really ought to provide that information, and it's a flaw in their design that it doesn't! OpenMM's design is better. I'd rather keep it that way, even if it means having to fill in the missing information by hand at conversion time, rather than force OpenMM to adopt the same design flaws that CHARMM has.

Also, CHARMM clearly permits any of these combinations, meaning they are utterly valid in the CHARMM world. Who are we to say what is or isn't appropriate in the CHARMM world?

peastman commented 7 years ago

Except we don't know that for certain.

We do know that. Remember that the XML file has to explicitly specify what patches can be applied to what templates. They don't get combined in arbitrary ways, only in the ways we specifically enumerate.

A "patch" in OpenMM is something very different from a "patch" in CHARMM, even though we use the same name for both of them. In retrospect maybe we should have chosen a different name to avoid that confusion. A CHARMM patch is a set of rules for modifying residues. The user tells the program, "Apply this patch to this residue," and it makes the modifications. An OpenMM patch doesn't modify anything. The system already has whatever atoms and bonds it's going to have, and we don't change that. The patch is a rule for generating new templates for parameter assignment. The user doesn't tell it what patches to apply. Instead, it automatically figures out what patched template is needed to match a residue.

So even though the OpenMM "patches" are (mostly) generated from the information contained in CHARMM "patches", they're not the same thing. Sometimes we'll need to supply extra information. Sometimes there may not be a 1:1 correspondence between CHARMM patches and OpenMM patches. That's not a problem. But if we try to think of them as actually the same thing, that will lead us into confusion.

ChayaSt commented 7 years ago

We also believe there are cases where the atoms that are deleted will be different for different target residues.

An example of this is PRO. The N there is type N rather than NH1. While PRO has its own NTER and ACP patches, it doesn't have its own NNEU, where the N is changed to NH2 from NH1 for the other residues but from N for PRO.

peastman commented 7 years ago

That means we'll need to create one version of the patch for PRO, and a different one for all the other amino acids. Not a problem. The force field explicitly specifies what patches can be applied to what templates.

jchodera commented 7 years ago

Except we don't know that for certain. We do know that. Remember that the XML file has to explicitly specify what patches can be applied to what templates. They don't get combined in arbitrary ways, only in the ways we specifically enumerate.

No, we don't know that for CHARMM because the information is not provided.

What you suggest amounts to preenumerating all possible combinations of patches and residues, which is what the Patch tag was meant to avoid. I don't think this is a solution that will gracefully scale to the whole charmm forcefield. Someone may be able to hack it together manually with a few days of effort (we won't be doing that here), but we won't easily be able to extend this later.

The Patch scheme was implemented to make CHARMM work. There is an easy way to fix the implementation to make CHARMM work with minimal effort without compromising the OpenMM error checking for anything that isn't already permissible by CHARMM. I'm not sure why there is resistance to this.

We will have to circle back and schedule a discussion by Skype for Monday or Tuesday of next week to sort out remaining issues. Today is unfortunately packed.

peastman commented 7 years ago

What you suggest amounts to preenumerating all possible combinations of patches and residues, which is what the Patch tag was meant to avoid.

No, that is exactly what the patch tag requires. Please look at the spec (which you agreed to when it was implemented!) to see how it works. You're making claims that just aren't correct. You either have to include an <AllowPatch> inside the <Residue> tag, or else include an <ApplyToResidue> inside the <Patch> tag. Either way, the XML file explicitly enumerates every allowed combination of patch and template. If a combination isn't listed, it will never be used.

jchodera commented 7 years ago

@peastman : I think there's still a bit of confusion over what we're proposing to change. We can still provide and require this---it's necessary to prevent combinatorial explosion for OpenMM---but we are suggesting that the determination of which atoms are changed vs which atoms are added should be made automatically at the time of Patch application. If we don't do this, we will need to pre-enumerate all possible combinations and split out patches into multiple patches automatically before creating the Patch records in a manner that I don't think will scale.

You were worried that we are eliminating error checking by not explicitly specifying which atoms are added vs changed in the patch---information the CHARMM PRES doesn't provide---but @ChayaSt discovered CHARMM adds a different kind of sanity check: The net charge change. The PRES record provides the total charge change expected after application of the patch in the first line (such as -1.00 for the PRES CTER below). We could use this as a substitute sanity check.

PRES CTER        -1.00 ! standard C-terminus
GROUP                  ! use in generate statement
ATOM C    CC      0.34 !   OT2(-)
ATOM OT1  OC     -0.67 !  /
ATOM OT2  OC     -0.67 ! -C
DELETE ATOM O          !  \\
BOND C OT2             !   OT1
DOUBLE  C OT1
IMPR C CA OT2 OT1
ACCEPTOR OT1 C   
ACCEPTOR OT2 C   
IC N    CA   C    OT2   0.0000  0.0000  180.0000  0.0000  0.0000
IC OT2  CA   *C   OT1   0.0000  0.0000  180.0000  0.0000  0.0000
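
A minimal sketch of the proposed charge sanity check, assuming a simplified parser that only looks at the PRES header, ATOM cards, and DELETE ATOM cards. The function names are hypothetical, the PRES text is an abridged copy of the CTER record above, and the residue charges for `C` and `O` are illustrative values chosen for the example.

```python
CTER = """\
PRES CTER        -1.00 ! standard C-terminus
GROUP
ATOM C    CC      0.34
ATOM OT1  OC     -0.67
ATOM OT2  OC     -0.67
DELETE ATOM O
"""


def parse_pres(text):
    """Return (declared net charge, {atom: charge}, [deleted atoms])."""
    declared, atoms, deleted = None, {}, []
    for raw in text.splitlines():
        fields = raw.split("!")[0].split()  # drop trailing comments
        if not fields:
            continue
        key = fields[0].upper()
        if key == "PRES":
            declared = float(fields[2])
        elif key == "ATOM":
            atoms[fields[1]] = float(fields[3])
        elif key.startswith("DELE") and fields[1].upper() == "ATOM":
            deleted.append(fields[2])
    return declared, atoms, deleted


def charge_change(residue_charges, atoms, deleted):
    """Net charge change when the patch is applied to one residue."""
    delta = sum(q - residue_charges.get(name, 0.0) for name, q in atoms.items())
    delta -= sum(residue_charges[name] for name in deleted)
    return delta


declared, atoms, deleted = parse_pres(CTER)
residue = {"C": 0.51, "O": -0.51}  # illustrative backbone charges (assumed)
delta = charge_change(residue, atoms, deleted)
print(declared, round(delta, 6), abs(delta - declared) < 1e-6)
```

If the computed change differs from the declared value, the (patch, residue) pair would be flagged rather than silently accepted.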

peastman commented 7 years ago

Yep, that's exactly what I'm proposing.

There's missing information in the CHARMM file. We need to fill that in. Do we do that at force field conversion time, or at system construction time?

In the first case, it happens in code that's run exactly once, when the force field is converted. We can check over the results carefully, validate them thoroughly, make sure everything is correct.

In the second case, it's code that runs every single time a user calls createSystem(). If there's a bug in that code, it might only show up in special circumstances. It will affect not just this force field we're creating now, but every force field with patches that will ever be created in the future by any user. It removes safety checks that would otherwise have caught mistakes those users make.

It seems pretty obvious to me which one is the better approach.

If we don't do this, we will need to pre-enumerate all possible combinations

You have to do that regardless. The XML file explicitly enumerates all allowed combinations. That remains true either way.

jchodera commented 7 years ago

You have to do that regardless. The XML file explicitly enumerates all allowed combinations. That remains true either way.

You're asking us to explicitly enumerate not only all allowed combinations, but also break these up into classes that require different sets of atom additions and atom alterations. Right now, this can only be done by manually inspecting each (PRES, RESI) pair for something like 25 patches.

Let's rule out the manual approach. Even if it was something reasonable this time, it won't scale in the future.

Let me look into what ParmEd might do to facilitate the automated enumeration of combinations and their segregation into sub-patch types depending on which sets of atoms are added vs altered.

If you're concerned about safety checks, I presume we'll also implement the total charge change safety check into OpenMM as well?
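
One way the segregation into sub-patch types might look, as a sketch only: group the target residues by the (added, changed) signature a patch produces for each of them, so that every group can be written out as its own patch variant. The function name, the patch atom list, and the residue atom lists are hypothetical toy data, not taken from the CHARMM files.

```python
from collections import defaultdict


def split_patch_by_signature(patch_atoms, residue_templates):
    """Group residue names by which patch atoms are added vs. changed."""
    variants = defaultdict(list)
    for res_name, res_atoms in residue_templates.items():
        present = set(res_atoms)
        added = tuple(a for a in patch_atoms if a not in present)
        changed = tuple(a for a in patch_atoms if a in present)
        variants[(added, changed)].append(res_name)
    return dict(variants)


# Toy patch and templates: the second residue lacks an HN atom.
PATCH_ATOMS = ["N", "HN", "CA"]
TEMPLATES = {
    "ALA": ["N", "HN", "CA", "C", "O"],
    "PRO": ["N", "CA", "CD", "C", "O"],
}

for (added, changed), names in split_patch_by_signature(PATCH_ATOMS, TEMPLATES).items():
    print("added", added, "changed", changed, "->", names)
```

Each distinct signature would then need its own `<Patch>` record with explicit add and change entries.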

jchodera commented 7 years ago

@ChayaSt : Does ParmEd handle the storage and writing of patches yet, or do we need to add that as well? I tried checking the list params_omm.residues from the conversion script, but it looks like none of the residues loaded are patches.

swails commented 7 years ago

ParmEd does "handle" patches (see here), but currently it just keeps a list of atoms that need deleting in addition to the information stored in a residue template.

That was the extent of how I was using patches at the time, so if it needs to be extended (as I suspect it does), then it can be.

jchodera commented 7 years ago

Thanks, @swails!

Any idea where the .delete list is actually populated with atoms the patch specifies should be deleted? I don't see that handled anywhere in the CHARMM parameter file reader.

We're talking with @peastman today to figure out exactly what needs to be done with patch writing for OpenMM, then will open a PR with the proposed changes!

jchodera commented 7 years ago

@swails: One more question: In CHARMM, patches (PRES) contain total residue charge that turns out to be important as a sanity check. The RESI residues contain this as well, but the total charge should just be a sum of the partial atomic charges (unlike in PRES). If we add a total_charge attribute, would you prefer we add that to ResidueTemplate or PatchTemplate?

jchodera commented 7 years ago

@peastman : I think we can, within ParmEd, enumerate all "safe" combinations of patches using the total charge as a sanity check to exclude impermissible combinations. So listing residue compatibilities shouldn't be problematic.

I think we have to decide whether to add to ForceField an option to allow the CHARMM behavior of either adding or changing atoms to a residue when a patch is applied. If we don't, it significantly increases the complexity of what we have to do in ParmEd unless we actually create one new patch for every (residue,patch) combination, which is what we were hoping to avoid in the first place. We can discuss on today's call!
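
A rough sketch of the enumeration idea, under simplifying assumptions: residues and patches are plain dictionaries, a patch is rejected if an atom it deletes is missing, and the patched residue must end up with an integral total charge. All names and the residue charges are illustrative; only the CTER patch values come from the PRES record quoted earlier. This is not the ParmEd implementation.

```python
class IncompatiblePatch(Exception):
    """Raised when a patch cannot be applied to a residue."""


def apply_patch(residue, patch, tol=1e-6):
    """Return patched {atom: charge}, or raise IncompatiblePatch."""
    for name in patch["delete"]:
        if name not in residue:
            raise IncompatiblePatch(f"atom {name} to delete is missing")
    patched = {n: q for n, q in residue.items() if n not in patch["delete"]}
    patched.update(patch["atoms"])  # adds new atoms, overwrites existing ones
    total = sum(patched.values())
    if abs(total - round(total)) > tol:
        raise IncompatiblePatch(f"non-integral total charge {total:.3f}")
    return patched


def compatible_pairs(residues, patches):
    """Map each patch name to the residues it can be applied to."""
    result = {}
    for patch_name, patch in patches.items():
        result[patch_name] = []
        for res_name, residue in residues.items():
            try:
                apply_patch(residue, patch)
            except IncompatiblePatch:
                continue
            result[patch_name].append(res_name)
    return result


PATCHES = {"CTER": {"atoms": {"C": 0.34, "OT1": -0.67, "OT2": -0.67}, "delete": ["O"]}}
RESIDUES = {  # illustrative charges (assumed)
    "BB": {"N": -0.47, "HN": 0.31, "CA": 0.07, "HA": 0.09, "C": 0.51, "O": -0.51},
    "TIP3": {"OH2": -0.834, "H1": 0.417, "H2": 0.417},
}
print(compatible_pairs(RESIDUES, PATCHES))
```

As later comments in this thread show, checks of this kind alone admit false positives, so further filters are needed.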

ChayaSt commented 7 years ago

Any idea where the .delete list is actually populated with atoms the patch specifies should be deleted? I don't see that handled anywhere in the CHARMM parameter file reader.

I added that here

jchodera commented 7 years ago

When did that get merged into parmed?

ChayaSt commented 7 years ago

I didn't merge that yet because more changes are probably needed.

jchodera commented 7 years ago

Ah, OK. Can you at least open a WIP PR so we can see what changes need to be merged in?

swails commented 7 years ago

Any idea where the .delete list is actually populated with atoms the patch specifies should be deleted?

Seems that I hadn't added that yet... It would go in the PATCH processing section of the rtf parsing in the charmm/parameters.py file. IIRC, I added that class to help with the CHARMM conversions, but activity on that front died away for a while. It's been quite some time since I've worked on that and can't remember all of the details. It smells very much like a WIP, though.

If we add a total_charge attribute, would you prefer we add that to ResidueTemplate or PatchTemplate?

Add it to both, and make sure that in PatchTemplate, the behavior is overridden to reflect what should happen in patches.
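
A minimal sketch of that suggestion, using stand-in classes rather than ParmEd's actual `ResidueTemplate` and `PatchTemplate`: the base class reports the sum of partial charges, and the patch subclass overrides the property to report the net charge declared on the PRES line. The class names, constructor arguments, and residue charges are assumptions made for illustration.

```python
class ResidueTemplateSketch:
    """Stand-in for a residue template: total charge is the sum of atom charges."""

    def __init__(self, name, charges):
        self.name = name
        self.charges = dict(charges)

    @property
    def total_charge(self):
        return sum(self.charges.values())


class PatchTemplateSketch(ResidueTemplateSketch):
    """Stand-in for a patch template: total charge is the declared PRES value."""

    def __init__(self, name, charges, declared_charge):
        super().__init__(name, charges)
        self.declared_charge = declared_charge

    @property
    def total_charge(self):
        # For a patch, the header value is the expected net charge change,
        # which need not equal the sum over the atoms the patch lists.
        return self.declared_charge


residue = ResidueTemplateSketch("RES", {"C": 0.51, "O": -0.51})
patch = PatchTemplateSketch("CTER", {"C": 0.34, "OT1": -0.67, "OT2": -0.67}, -1.00)
print(residue.total_charge, patch.total_charge)
```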

jchodera commented 7 years ago

I've been working on the code to read, process, and write patches within ParmEd, and have a few observations so far:

Syntax (command level)

    PATCh <pres-name> segid1 resid1 [, segid2 resid2 [,...
                                     [, segid9 resid9]...]]
                                      [SORT]
                                       [SETUp]
                                        [WARN]

Syntax (corresponding patch residue in RTF)

    PRES <pres-name>

    [GROUp]
    [ATOM  <I><atomname>  <parameter type>   <charge> ]
    [DELEte ATOM <I><atomname>]

    [ [DELEte] BOND <I1> <I2> ]
    [ [DELEte] ANGLe <I1> <I2> <I3> ]
    [ [DELEte] DIHEdral <I1> <I2> <I3> <I4> ]
    [ [DELEte] IMPRoper <I1> <I2> <I3> <I4> ]
    [ [DELEte] DONOr  [<I1>] <I2> [[<I3> [<I4>]] ]
    [ [DELEte] ACCEptor  <I1> [ <I2> [ <I3> ]] ]

    [ IC  <I1> <I2> [*]<I3> <I4>   real real real real real ]
    [ DELEte IC <I1> <I2> [*]<I3> <I4> ]

 where I1, I2, I3, I4 refer to <I><atomname>.

Rules governing the patch procedure:

1) If an atom is being added via a PATCH at least one or more atoms already existing in the residue to which the patch is being added must be included in the PRES with an ATOM statement. Unless this(these) atoms are deleted using the DELEte ATOM command internal terms associated with this atom which are already present in the residue should NOT be included in the PRES.

2) if no <I> is specified before <atomname> the patch procedure assumes that the atom should be in residue (segid1 resid1).

3) a '-', '+', '#' as a first letter in <atomname> tries to locate or add the atom in the previous, next, next of the next, residue of residue (segid resid), respectively.

4) GROUP brackets in a patch residue have highest priority.

5) If no GROUP is specified, the group numbers of referenced, already existing atoms remain unchanged. Added atoms are placed in the last group of the referenced residue.

6) A GROUP statement in a patch residue CAN enclose atoms in different referenced residues. However, if there is a conflict between sequential residue AND group boundaries new residues MIGHT be created with resid's and segid's referring to the referenced residues. These cases are indicated by a message from MAPIC that a negative number of residues were created. The user has to check the PSF explicitly to decide whether the modifications done by PATCH are appropriate.

7) Along with the PSF the coordinates, comparison coordinates, harmonic constraints, fixed atom list, internal coordinates (IC) are mapped correctly.

8) THERE IS NO MAP OF NBONDS, HBONDS, SHAKE, DYNAMICS ETC. THE ATOMNUMBERS ARE CHANGED.

9) Any bond, angle, etc referring to deleted atoms is itself deleted. The bond, angle, etc lists are compressed.

10) Even if the AUTOgenerate ANGLe and/or DIHEdral option has been invoked new angles and/or dihedrals have to be included in the PRES when that particular patch is being called after the GENErate statement. The angles and/or dihedrals will be generated automatically for any patch which is called in the GENErate statement following the FIRSt or LAST statements. NOTE: If angles and dihedrals are present in a PRES which is called in a GENErate statement in which AUTOgenerate ANGLes and/or DIHEdrals is being used those angles and/or dihedrals will be invoked twice in the PSF and, thus, be included twice when the energy is calculated.

The AUTOgenerate command (next) can be used to circumvent the above problems, and removes the need for specifying angles and dihedrals as part of a PRES definition.

* While the `DELETE` keyword can in principle apply to each of `ATOM, BOND, ANGLe, DIHEdral, IMPRoper, DONOr, ACCEptor`, the CHARMM36 parameter set only seems to make use of `ATOM` (235 instances), `IMPR` (36 instances), `ANGL` (2 instances), `DIHE` (2 instances), and `ACCE` (2 instances). Surprisingly, `BOND` is not used at all, instead relying on the rule above:

9) Any bond, angle, etc referring to deleted atoms is itself deleted.

* There are a surprising number of `DELETE IMPR` cards that delete impropers (all in [`top_all36_cgenff.rtf`](https://github.com/choderalab/openmm-forcefields/blob/master/charmm/toppar/top_all36_cgenff.rtf)), which throws yet another wrench in our plans to handle impropers (#33). Implications of this are currently unknown, but here is an example of the cards:

PRES AMGA          0.00 ! C3H4O2 cacha
                        ! patch combination:
                        ! core residue Glutamic Acid CDCA Amide (GA) >> Alpha-Methyl Glu Acid CDCA Amide
                    !                                             OA1
                    !                                             ||
                    !                                       O24   CA--OA2--CH3(M)
                    !                                       ||    |
                    !                   OH     Me21   C22   C24   CC1   CC3   OG1(-0.5)
ATOM CC1  CG311   0.17 ! | \ / \ / \ / \ / \ /
ATOM HC1  HGA1    0.09 ! C12 Me18 C20 C23 NH CC2 CG
ATOM CA   CG2O2   0.90 ! / \ | / \
ATOM OA1  OG2D1  -0.63 ! C11 C13---C17 OG2(-0.5)
ATOM OA2  OG302  -0.49 ! Me19 | | |
ATOM CM   CG331  -0.31 ! C1 | C9 C14 C16
ATOM HM1  HGA3    0.09 ! / |/ \ / \ /
ATOM HM2  HGA3    0.09 ! C2 C10 C8 C15
ATOM HM3  HGA3    0.09 ! | | |
                       ! C3 C5 C7
                       ! / \ / \ / \
                       ! HO C4 C6 OH
                    !              Alpha-Methyl Glutamic Acid CDCA Amide
BOND OA2 CM
BOND CM HM1 CM HM2 CM HM3
DELETE IMPR CA OA2 OA1 CC1
IMPR CA CC1 OA1 OA2
IC CC1 CA OA2 CM 1.5285 111.09 -178.91 115.08 1.4371
IC HM1 CM OA2 CA 1.1113 109.28 179.56 115.08 1.3429
IC HM2 CM OA2 CA 1.1136 110.99 60.48 115.08 1.3429
IC HM3 CM OA2 CA 1.1135 110.99 -61.17 115.08 1.3429
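
Counts like the DELETE statistics above could be gathered with a short script along these lines. This is a sketch that assumes DELETE cards start a line, that keywords may be abbreviated to their first four letters, and that comment lines begin with `!`; the function name and the sample text are illustrative, and it is not the script that produced the numbers quoted above.

```python
import re
from collections import Counter

# A DELETE card: optional indentation, DELE plus optional extra letters, then the kind.
DELETE_RE = re.compile(r"^[ \t]*DELE\w*[ \t]+(\w+)", re.IGNORECASE | re.MULTILINE)


def count_delete_cards(rtf_text):
    """Count DELETE cards by the first four letters of the kind keyword."""
    return Counter(m.group(1).upper()[:4] for m in DELETE_RE.finditer(rtf_text))


SAMPLE = """\
DELETE ATOM O
BOND C OT2
DELETE IMPR CA OA2 OA1 CC1
IMPR CA CC1 OA1 OA2
dele improper CA CC1 OA1 OA2
! DELETE BOND cards inside comments are ignored
"""

print(count_delete_cards(SAMPLE))
```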

jchodera commented 6 years ago

@peastman : I've finally nearly finished all the functionality needed in ParmEd with this PR: https://github.com/jchodera/ParmEd/pull/1

The automatic scheme for determining which single-residue patches are compatible with which residues does seem to be working in the simple tests. For example for the CHARMM36 protein residue template file:

openmm_params has 23 residues and 23 patches
Determining valid patch combinations...
    GLUP : ['ALA', 'GLY', 'VAL', 'ALAD']
     ACP : ['GLU', 'ALA', 'GLN', 'HSE', 'CYS', 'PRO', 'MET', 'LEU', 'GLY', 'VAL', 'ILE', 'ASN', 'TYR', 'LYS', 'ASP', 'SER', 'THR', 'HSD', 'TRP', 'HSP', 'ALAD', 'PHE', 'ARG']
    CTER : []
     CT2 : ['GLU', 'ALA', 'GLN', 'HSE', 'CYS', 'PRO', 'MET', 'LEU', 'GLY', 'VAL', 'ILE', 'ASN', 'TYR', 'LYS', 'ASP', 'SER', 'THR', 'HSD', 'TRP', 'HSP', 'ALAD', 'PHE', 'ARG']
    GLYP : []
    NNEU : []
    LINK : ['GLU', 'ALA', 'GLN', 'HSE', 'CYS', 'PRO', 'MET', 'LEU', 'GLY', 'VAL', 'ILE', 'ASN', 'TYR', 'LYS', 'ASP', 'SER', 'THR', 'HSD', 'TRP', 'HSP', 'ALAD', 'PHE', 'ARG']
    LIG3 : ['GLU', 'ALA', 'GLN', 'HSE', 'CYS', 'PRO', 'MET', 'LEU', 'GLY', 'VAL', 'ILE', 'ASN', 'TYR', 'LYS', 'ASP', 'SER', 'THR', 'HSD', 'TRP', 'HSP', 'ALAD', 'PHE', 'ARG']
    CNEU : []
    LIG1 : ['GLU', 'ALA', 'GLN', 'HSE', 'CYS', 'PRO', 'MET', 'LEU', 'GLY', 'VAL', 'ILE', 'ASN', 'TYR', 'LYS', 'ASP', 'SER', 'THR', 'HSD', 'TRP', 'HSP', 'ALAD', 'PHE', 'ARG']
    ASPP : ['GLU', 'GLY']
     HS2 : ['HSP']
     CTP : []
    LIG2 : ['GLU', 'ALA', 'GLN', 'HSE', 'CYS', 'PRO', 'MET', 'LEU', 'GLY', 'VAL', 'ILE', 'ASN', 'TYR', 'LYS', 'ASP', 'SER', 'THR', 'HSD', 'TRP', 'HSP', 'ALAD', 'PHE', 'ARG']
    ACED : ['GLU', 'ALA', 'GLN', 'HSE', 'CYS', 'PRO', 'MET', 'LEU', 'GLY', 'VAL', 'ILE', 'ASN', 'TYR', 'LYS', 'ASP', 'SER', 'THR', 'HSD', 'TRP', 'HSP', 'ALAD', 'PHE', 'ARG']
    ACPD : ['GLU', 'ALA', 'GLN', 'HSE', 'CYS', 'PRO', 'MET', 'LEU', 'GLY', 'VAL', 'ILE', 'ASN', 'TYR', 'LYS', 'ASP', 'SER', 'THR', 'HSD', 'TRP', 'HSP', 'ALAD', 'PHE', 'ARG']
     ACE : ['GLU', 'ALA', 'GLN', 'HSE', 'CYS', 'PRO', 'MET', 'LEU', 'GLY', 'VAL', 'ILE', 'ASN', 'TYR', 'LYS', 'ASP', 'SER', 'THR', 'HSD', 'TRP', 'HSP', 'ALAD', 'PHE', 'ARG']
     CT3 : ['GLU', 'ALA', 'GLN', 'HSE', 'CYS', 'PRO', 'MET', 'LEU', 'GLY', 'VAL', 'ILE', 'ASN', 'TYR', 'LYS', 'ASP', 'SER', 'THR', 'HSD', 'TRP', 'HSP', 'ALAD', 'PHE', 'ARG']
     LSN : ['LYS']
    DISU : []
     CT1 : []
    NTER : []
    PROP : []

The problem I'm running into now is the case I was worried about on our last Skype call discussing the patch issue specifically: OpenMM requires I discriminate between AddAtom and ChangeAtom, while the CHARMM patches do not provide this information. Instead, I need some way to determine this automatically.

I have implemented a ResidueTemplate.apply_patch() method that is used to help test if patches are compatible. I could test every valid patch combination listed above to discriminate which atoms are added vs which atoms are modified for all possible combinations, but what should I do if the patch atom is added for some residues and modified for others? Or is that impossible, given that I'm also checking for the patched residue to also have integral charge?

jchodera commented 6 years ago

It looks like some patch compatibility is also erroneously being picked up by this scheme, such as the GLUP patch (protonated glutamic acid) perceived as being compatible with ALA, GLY, and VAL (but surprisingly not GLU):

    GLUP : ['ALA', 'GLY', 'VAL', 'ALAD']

I suppose I should also be checking that there is a complete spanning tree of bonds so that the residue cannot be two disjoint sets of atoms.
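
The connectivity check mentioned here could be sketched as a breadth-first search over the bond graph: the patched residue is accepted only if every atom is reachable from the first one. The function name and the toy atom and bond lists are assumptions for illustration; bonds that mention atoms outside the residue are ignored in this sketch.

```python
from collections import deque


def is_connected(atoms, bonds):
    """Return True if the atoms form a single connected bond graph."""
    atoms = list(atoms)
    if not atoms:
        return True
    neighbors = {name: set() for name in atoms}
    for a, b in bonds:
        if a in neighbors and b in neighbors:  # skip bonds to unknown atoms
            neighbors[a].add(b)
            neighbors[b].add(a)
    seen = {atoms[0]}
    queue = deque([atoms[0]])
    while queue:
        current = queue.popleft()
        for other in neighbors[current] - seen:
            seen.add(other)
            queue.append(other)
    return len(seen) == len(neighbors)


ATOMS = ["N", "CA", "C", "OT1", "OT2"]
GOOD_BONDS = [("N", "CA"), ("CA", "C"), ("C", "OT1"), ("C", "OT2")]
BAD_BONDS = [("N", "CA"), ("C", "OT1"), ("C", "OT2")]  # CA-C bond missing

print(is_connected(ATOMS, GOOD_BONDS), is_connected(ATOMS, BAD_BONDS))
```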

jchodera commented 6 years ago

Correction: After also making sure the added bonds refer to atoms that are present, I now have this list of compatible patches:

    ACED : ['ALAD', 'SER', 'ARG', 'GLN', 'TYR', 'LEU', 'GLY', 'TRP', 'VAL', 'ASP', 'PRO', 'ILE', 'ASN', 'LYS', 'CYS', 'GLU', 'HSD', 'MET', 'THR', 'ALA', 'PHE', 'HSE', 'HSP']
    LINK : ['ALAD', 'SER', 'ARG', 'GLN', 'TYR', 'LEU', 'GLY', 'TRP', 'VAL', 'ASP', 'PRO', 'ILE', 'ASN', 'LYS', 'CYS', 'GLU', 'HSD', 'MET', 'THR', 'ALA', 'PHE', 'HSE', 'HSP']
    ASPP : ['GLY', 'ASP', 'ILE', 'ASN']
     CT1 : ['SER', 'ARG', 'GLN', 'TYR', 'LEU', 'TRP', 'VAL', 'ASP', 'ILE', 'ASN', 'LYS', 'CYS', 'GLU', 'HSD', 'MET', 'THR', 'ALA', 'PHE', 'HSE', 'HSP']
     HS2 : ['ARG', 'PRO', 'HSD', 'HSP']
     CT2 : ['ALAD', 'SER', 'ARG', 'GLN', 'TYR', 'LEU', 'GLY', 'TRP', 'VAL', 'ASP', 'PRO', 'ILE', 'ASN', 'LYS', 'CYS', 'GLU', 'HSD', 'MET', 'THR', 'ALA', 'PHE', 'HSE', 'HSP']
    CNEU : ['SER', 'ARG', 'GLN', 'TYR', 'LEU', 'GLY', 'TRP', 'VAL', 'ASP', 'PRO', 'ILE', 'ASN', 'LYS', 'CYS', 'GLU', 'HSD', 'MET', 'THR', 'ALA', 'PHE', 'HSE', 'HSP']
    GLYP : ['GLY']
     ACP : ['ALAD', 'SER', 'ARG', 'GLN', 'TYR', 'LEU', 'GLY', 'TRP', 'VAL', 'ASP', 'PRO', 'ILE', 'ASN', 'LYS', 'CYS', 'GLU', 'HSD', 'MET', 'THR', 'ALA', 'PHE', 'HSE', 'HSP']
    LIG3 : ['ALAD', 'SER', 'ARG', 'GLN', 'TYR', 'LEU', 'GLY', 'TRP', 'VAL', 'ASP', 'PRO', 'ILE', 'ASN', 'LYS', 'CYS', 'GLU', 'HSD', 'MET', 'THR', 'ALA', 'PHE', 'HSE', 'HSP']
    GLUP : ['ALAD', 'GLN', 'GLY', 'VAL', 'PRO', 'GLU', 'ALA']
    PROP : ['PRO']
    DISU : []
     ACE : ['ALAD', 'SER', 'ARG', 'GLN', 'TYR', 'LEU', 'GLY', 'TRP', 'VAL', 'ASP', 'PRO', 'ILE', 'ASN', 'LYS', 'CYS', 'GLU', 'HSD', 'MET', 'THR', 'ALA', 'PHE', 'HSE', 'HSP']
     CT3 : ['ALAD', 'SER', 'ARG', 'GLN', 'TYR', 'LEU', 'GLY', 'TRP', 'VAL', 'ASP', 'PRO', 'ILE', 'ASN', 'LYS', 'CYS', 'GLU', 'HSD', 'MET', 'THR', 'ALA', 'PHE', 'HSE', 'HSP']
    LIG2 : ['ALAD', 'SER', 'ARG', 'GLN', 'TYR', 'LEU', 'GLY', 'TRP', 'VAL', 'ASP', 'PRO', 'ILE', 'ASN', 'LYS', 'CYS', 'GLU', 'HSD', 'MET', 'THR', 'ALA', 'PHE', 'HSE', 'HSP']
     CTP : ['SER', 'ARG', 'GLN', 'TYR', 'LEU', 'GLY', 'TRP', 'VAL', 'ASP', 'PRO', 'ILE', 'ASN', 'LYS', 'CYS', 'GLU', 'HSD', 'MET', 'THR', 'ALA', 'PHE', 'HSE', 'HSP']
     LSN : ['LYS']
    ACPD : ['ALAD', 'SER', 'ARG', 'GLN', 'TYR', 'LEU', 'GLY', 'TRP', 'VAL', 'ASP', 'PRO', 'ILE', 'ASN', 'LYS', 'CYS', 'GLU', 'HSD', 'MET', 'THR', 'ALA', 'PHE', 'HSE', 'HSP']
    NTER : ['SER', 'ARG', 'GLN', 'TYR', 'LEU', 'TRP', 'VAL', 'ASP', 'ILE', 'ASN', 'LYS', 'CYS', 'GLU', 'HSD', 'MET', 'THR', 'ALA', 'PHE', 'HSE', 'HSP']
    LIG1 : ['ALAD', 'SER', 'ARG', 'GLN', 'TYR', 'LEU', 'GLY', 'TRP', 'VAL', 'ASP', 'PRO', 'ILE', 'ASN', 'LYS', 'CYS', 'GLU', 'HSD', 'MET', 'THR', 'ALA', 'PHE', 'HSE', 'HSP']
    CTER : ['SER', 'ARG', 'GLN', 'TYR', 'LEU', 'GLY', 'TRP', 'VAL', 'ASP', 'PRO', 'ILE', 'ASN', 'LYS', 'CYS', 'GLU', 'HSD', 'MET', 'THR', 'ALA', 'PHE', 'HSE', 'HSP']
    NNEU : ['SER', 'ARG', 'GLN', 'TYR', 'LEU', 'TRP', 'VAL', 'ASP', 'ILE', 'ASN', 'LYS', 'CYS', 'GLU', 'HSD', 'MET', 'THR', 'ALA', 'PHE', 'HSE', 'HSP']

jchodera commented 6 years ago

@peastman: Using just the first residue from the "compatible residues" list to figure out which atoms are added/modified and which bonds are deleted, I get the following complete charmm.xml file after a 20-minute runtime, weighing in around 10MB: charmm36.xml.zip. This should include the lipids as well. Could you give this a try and see if it works for you?

Also, any feedback on how to further prune the compatible residue lists is appreciated!

I still have to check that the bond graph has no disconnected components after applying the patch. This may prune things further.

jchodera commented 6 years ago

For the simple case of just processing top_all36_prot.rtf above, adding a bond graph connectivity check seems to cull way more patches than I would expect:

    LIG2 : ['ILE', 'HSP', 'ALAD', 'LYS', 'MET', 'LEU', 'PRO', 'PHE', 'VAL', 'GLN', 'ALA', 'HSE', 'GLY', 'TRP', 'GLU', 'ARG', 'THR', 'ASP', 'TYR', 'CYS', 'SER', 'HSD', 'ASN']
    ASPP : ['ASP']
    PROP : ['PRO']
     ACE : []
     CT2 : ['ILE', 'HSP', 'LYS', 'MET', 'LEU', 'PRO', 'PHE', 'VAL', 'GLN', 'ALA', 'HSE', 'GLY', 'TRP', 'GLU', 'ARG', 'THR', 'ASP', 'TYR', 'CYS', 'SER', 'HSD', 'ASN']
    GLYP : ['GLY']
    LIG3 : ['ILE', 'HSP', 'ALAD', 'LYS', 'MET', 'LEU', 'PRO', 'PHE', 'VAL', 'GLN', 'ALA', 'HSE', 'GLY', 'TRP', 'GLU', 'ARG', 'THR', 'ASP', 'TYR', 'CYS', 'SER', 'HSD', 'ASN']
     HS2 : ['HSP', 'HSD']
    DISU : []
    NTER : ['ILE', 'HSP', 'LYS', 'MET', 'LEU', 'PHE', 'VAL', 'GLN', 'ALA', 'HSE', 'TRP', 'GLU', 'ARG', 'THR', 'ASP', 'TYR', 'CYS', 'SER', 'HSD', 'ASN']
    LIG1 : ['ILE', 'HSP', 'ALAD', 'LYS', 'MET', 'LEU', 'PRO', 'PHE', 'VAL', 'GLN', 'ALA', 'HSE', 'GLY', 'TRP', 'GLU', 'ARG', 'THR', 'ASP', 'TYR', 'CYS', 'SER', 'HSD', 'ASN']
     CT1 : ['ILE', 'HSP', 'LYS', 'MET', 'LEU', 'PHE', 'VAL', 'GLN', 'ALA', 'HSE', 'TRP', 'GLU', 'ARG', 'THR', 'ASP', 'TYR', 'CYS', 'SER', 'HSD', 'ASN']
     LSN : ['LYS']
     CTP : ['ILE', 'HSP', 'LYS', 'MET', 'LEU', 'PRO', 'PHE', 'VAL', 'GLN', 'ALA', 'HSE', 'GLY', 'TRP', 'GLU', 'ARG', 'THR', 'ASP', 'TYR', 'CYS', 'SER', 'HSD', 'ASN']
    NNEU : ['ILE', 'HSP', 'LYS', 'MET', 'LEU', 'PHE', 'VAL', 'GLN', 'ALA', 'HSE', 'TRP', 'GLU', 'ARG', 'THR', 'ASP', 'TYR', 'CYS', 'SER', 'HSD', 'ASN']
     ACP : []
     CT3 : ['ILE', 'HSP', 'LYS', 'MET', 'LEU', 'PRO', 'PHE', 'VAL', 'GLN', 'ALA', 'HSE', 'GLY', 'TRP', 'GLU', 'ARG', 'THR', 'ASP', 'TYR', 'CYS', 'SER', 'HSD', 'ASN']
    CTER : ['ILE', 'HSP', 'LYS', 'MET', 'LEU', 'PRO', 'PHE', 'VAL', 'GLN', 'ALA', 'HSE', 'GLY', 'TRP', 'GLU', 'ARG', 'THR', 'ASP', 'TYR', 'CYS', 'SER', 'HSD', 'ASN']
    GLUP : ['GLU']
    LINK : ['ILE', 'HSP', 'ALAD', 'LYS', 'MET', 'LEU', 'PRO', 'PHE', 'VAL', 'GLN', 'ALA', 'HSE', 'GLY', 'TRP', 'GLU', 'ARG', 'THR', 'ASP', 'TYR', 'CYS', 'SER', 'HSD', 'ASN']
    ACPD : []
    ACED : []
    CNEU : ['ILE', 'HSP', 'LYS', 'MET', 'LEU', 'PRO', 'PHE', 'VAL', 'GLN', 'ALA', 'HSE', 'GLY', 'TRP', 'GLU', 'ARG', 'THR', 'ASP', 'TYR', 'CYS', 'SER', 'HSD', 'ASN']

For example, I'd expect the ACE patch to be compatible with most amino acids, but it is rejected from all because it does not result in a connected bond graph.

jchodera commented 6 years ago

It looks like what is happening is that the patch bonds the CY atom to the N of the residue being patched (which is labeled as the head by parmed):

PRES ACE          0.00 ! acetylated N-terminus
                       ! do NOT use to create dipeptides, see ACED
GROUP                  ! use in generate statement
ATOM CAY  CT3    -0.27 !
ATOM HY1  HA3     0.09 ! HY1 HY2 HY3
ATOM HY2  HA3     0.09 !    \ | /
ATOM HY3  HA3     0.09 !     CAY
GROUP                  !      |
ATOM CY   C       0.51 !      CY=OY
ATOM OY   O      -0.51 !      |
                       !
BOND CY CAY CY N CAY HY1 CAY HY2 CAY HY3
DOUBLE OY CY  

Because parmed can only track bonds between atoms, treating these patches may require a major reworking of how parmed handles bonds, which could be highly problematic. I'm trying to think of a simpler way to handle these cases for patches.
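One way to see which atoms a patch expects the target residue to supply is to compare the atoms it declares with the atoms named in its bond records. The sketch below does this for the ACE entry quoted above (comment text trimmed). The helper `external_bond_atoms` is hypothetical and only handles the `ATOM`, `BOND`, and `DOUBLE` records; it is not how ParmEd parses topology files.

```python
ACE_PATCH = """\
PRES ACE          0.00 ! acetylated N-terminus
GROUP                  ! use in generate statement
ATOM CAY  CT3    -0.27 !
ATOM HY1  HA3     0.09 !
ATOM HY2  HA3     0.09 !
ATOM HY3  HA3     0.09 !
GROUP                  !
ATOM CY   C       0.51 !
ATOM OY   O      -0.51 !
BOND CY CAY CY N CAY HY1 CAY HY2 CAY HY3
DOUBLE OY CY
"""


def external_bond_atoms(patch_text):
    """Return atoms that appear in bond records but are not declared by the patch."""
    declared = set()
    bonded = set()
    for line in patch_text.splitlines():
        # Drop the trailing '!' comment, then split into whitespace-separated fields.
        fields = line.split("!", 1)[0].split()
        if not fields:
            continue
        keyword = fields[0].upper()
        if keyword == "ATOM":
            declared.add(fields[1])
        elif keyword in ("BOND", "DOUBLE"):
            bonded.update(fields[1:])
    return sorted(bonded - declared)


external = external_bond_atoms(ACE_PATCH)
```

For ACE, the only such atom is `N`, so the patch's own atoms are only joined to a residue through the `CY N` bond.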

jchodera commented 6 years ago

I think I've found a work-around that only impacts PatchTemplates, and doesn't touch the regular way parmed handles residues. The resulting list of compatible patches now looks plausible:

    LINK : []
     HS2 : ['HSD', 'HSP']
     ACE : ['LEU', 'HSE', 'PRO', 'TYR', 'HSD', 'HSP', 'THR', 'GLN', 'ASN', 'TRP', 'ILE', 'GLY', 'ASP', 'CYS', 'VAL', 'MET', 'ARG', 'ALA', 'SER', 'PHE', 'LYS', 'GLU']
    PROP : ['PRO']
    DISU : []
    LIG1 : []
    GLYP : ['GLY']
    LIG2 : []
    ACED : ['LEU', 'HSE', 'PRO', 'TYR', 'HSD', 'HSP', 'THR', 'GLN', 'ASN', 'TRP', 'ILE', 'GLY', 'ASP', 'CYS', 'VAL', 'MET', 'ARG', 'ALA', 'SER', 'PHE', 'LYS', 'GLU']
     CTP : ['LEU', 'HSE', 'PRO', 'TYR', 'HSD', 'HSP', 'THR', 'GLN', 'ASN', 'TRP', 'ILE', 'GLY', 'ASP', 'CYS', 'VAL', 'MET', 'ARG', 'ALA', 'SER', 'PHE', 'LYS', 'GLU']
    NNEU : ['LEU', 'HSE', 'TYR', 'HSD', 'HSP', 'THR', 'GLN', 'ASN', 'TRP', 'ILE', 'ASP', 'CYS', 'VAL', 'MET', 'ARG', 'ALA', 'SER', 'PHE', 'LYS', 'GLU']
    CTER : ['LEU', 'HSE', 'PRO', 'TYR', 'HSD', 'HSP', 'THR', 'GLN', 'ASN', 'TRP', 'ILE', 'GLY', 'ASP', 'CYS', 'VAL', 'MET', 'ARG', 'ALA', 'SER', 'PHE', 'LYS', 'GLU']
     LSN : ['LYS']
     CT3 : ['LEU', 'HSE', 'PRO', 'TYR', 'HSD', 'HSP', 'THR', 'GLN', 'ASN', 'TRP', 'ILE', 'GLY', 'ASP', 'CYS', 'VAL', 'MET', 'ARG', 'ALA', 'SER', 'PHE', 'LYS', 'GLU']
     CT2 : ['LEU', 'HSE', 'PRO', 'TYR', 'HSD', 'HSP', 'THR', 'GLN', 'ASN', 'TRP', 'ILE', 'GLY', 'ASP', 'CYS', 'VAL', 'MET', 'ARG', 'ALA', 'SER', 'PHE', 'LYS', 'GLU']
     ACP : ['LEU', 'HSE', 'PRO', 'TYR', 'HSD', 'HSP', 'THR', 'GLN', 'ASN', 'TRP', 'ILE', 'GLY', 'ASP', 'CYS', 'VAL', 'MET', 'ARG', 'ALA', 'SER', 'PHE', 'LYS', 'GLU']
    LIG3 : []
    ASPP : ['ASP']
    ACPD : ['LEU', 'HSE', 'PRO', 'TYR', 'HSD', 'HSP', 'THR', 'GLN', 'ASN', 'TRP', 'ILE', 'GLY', 'ASP', 'CYS', 'VAL', 'MET', 'ARG', 'ALA', 'SER', 'PHE', 'LYS', 'GLU']
    CNEU : ['LEU', 'HSE', 'PRO', 'TYR', 'HSD', 'HSP', 'THR', 'GLN', 'ASN', 'TRP', 'ILE', 'GLY', 'ASP', 'CYS', 'VAL', 'MET', 'ARG', 'ALA', 'SER', 'PHE', 'LYS', 'GLU']
    GLUP : ['GLU']
    NTER : ['LEU', 'HSE', 'TYR', 'HSD', 'HSP', 'THR', 'GLN', 'ASN', 'TRP', 'ILE', 'ASP', 'CYS', 'VAL', 'MET', 'ARG', 'ALA', 'SER', 'PHE', 'LYS', 'GLU']
     CT1 : ['LEU', 'HSE', 'TYR', 'HSD', 'HSP', 'THR', 'GLN', 'ASN', 'TRP', 'ILE', 'ASP', 'CYS', 'VAL', 'MET', 'ARG', 'ALA', 'SER', 'PHE', 'LYS', 'GLU']

jchodera commented 6 years ago

@peastman : Here's the updated CHARMM36 parameter file using the new patch compatibility logic (only 4.4MB now): charmm36.xml.zip

jchodera commented 6 years ago

Here's the current patch compatibility list. Currently, many patches have no compatible residues. It's likely still useful, but maybe @ChayaSt can help me figure out whether some of these patches should work and why they might be failing: patch_compatibility.txt

The best thing to do at this point is probably to try to use this. We'll also need to validate the energies are correct between OpenMM and CHARMM.

peastman commented 6 years ago

You were busy this weekend! I'll give the latest file a try and see how it works.

jchodera commented 6 years ago

Thanks! Let me know how it goes.

I still have to add more tests to parmed to make sure the energies match. The testing framework isn't as well-developed as I had hoped, so this may take a few more days before we're confident the conversion is 100% complete and correct.

We at least test to make sure the parameter files are read by simtk.openmm.app.ForceField without throwing exceptions, but the next step is to make sure ForceField can parameterize a variety of systems this way without throwing exceptions. If there are any systems I can steal from the OpenMM test suite for this purpose, please let me know!

peastman commented 6 years ago

Loading your XML file, it fails with the error

Exception: mismatched tag: line 105096, column 2

That's the very last line of the file. Here are the last two lines:

  <Patch name="DMPR">
</ForceField>

That <Patch> tag isn't terminated.
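An unterminated tag like this can be caught before the file ever reaches ForceField with a plain XML well-formedness check. This is a sketch using only the Python standard library. The helper name `xml_error` is hypothetical, and the "fixed" string is merely well-formed XML, not a meaningful patch definition.

```python
import xml.etree.ElementTree as ET


def xml_error(text):
    """Return None if text parses as XML, otherwise the parser's error message."""
    try:
        ET.fromstring(text)
    except ET.ParseError as exc:
        return str(exc)
    return None


# Same shape as the tail of the failing file: <Patch> is opened but never closed.
broken = '<ForceField>\n  <Patch name="DMPR">\n</ForceField>'
# Well-formed variant (self-closing tag), used only to show the check passing.
fixed = '<ForceField>\n  <Patch name="DMPR"/>\n</ForceField>'

broken_message = xml_error(broken)
fixed_message = xml_error(fixed)
```

This only verifies that the XML is well formed. Whether ForceField accepts the contents still needs a separate test.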

jchodera commented 6 years ago

Weird! Let me add some tests to catch this behavior, then check what is going on.

jchodera commented 6 years ago

I'm honestly still not sure what happened in this particular case, but I've now added a test to at least make sure ForceField can read the file without issues.

The problem I'm running into now is that the set of CHARMM parameters that @ChayaSt selected, surprisingly, seems to include a patch with the same name as something defined earlier as a residue. Since I believe the CHARMM program only allows patches to be applied by name, I think this means we either need to (1) eliminate one of the files with the conflict, or (2) have later definitions override earlier ones even if one was a residue and the other a patch.

peastman commented 6 years ago

Is that an error in the input files? Or is it just that residues and patches have different namespaces? Is there anything wrong with a residue and a patch having the same name?

jchodera commented 6 years ago

> Is that an error in the input files? Or is it just that residues and patches have different namespaces? Is there anything wrong with a residue and a patch having the same name?

We're not sure yet, but we're seeking clarification.

peastman commented 6 years ago

Either way, I don't think OpenMM will care. It has different namespaces for residues and patches.
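To illustrate the idea of separate namespaces, here is a small sketch that keys definitions by (kind, name), so a residue and a patch can share a name without colliding, and that lists the shared names for review. The registry, the function names, and the name `XYZ` are all hypothetical; this is not OpenMM's internal data structure.

```python
def register(registry, kind, name, definition):
    """Store a definition under (kind, name); reject duplicates within one kind."""
    key = (kind, name)
    if key in registry:
        raise ValueError(f"duplicate {kind} definition: {name}")
    registry[key] = definition


def shared_names(registry):
    """Return names that are defined both as a residue and as a patch."""
    residues = {name for kind, name in registry if kind == "residue"}
    patches = {name for kind, name in registry if kind == "patch"}
    return sorted(residues & patches)


registry = {}
register(registry, "residue", "ALA", "residue template")
register(registry, "patch", "NTER", "patch template")
# Hypothetical name used as both a residue and a patch: no conflict here.
register(registry, "residue", "XYZ", "residue template")
register(registry, "patch", "XYZ", "patch template")

collisions = shared_names(registry)
```

Listing the shared names keeps them visible for follow-up without forcing one definition to override the other.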

jchodera commented 6 years ago

OK, I've taken this approach for now while we seek clarification from Alex MacKerell.

Here's the updated forcefield generated using the latest code. charmm36.xml.zip

ChayaSt commented 6 years ago

According to Alex MacKerell:

Residues and patches with the same name are typically just an oversight and not related to each other. And there shouldn't be many. One area of overlap will be small molecules that are in both the CHARMM36 files and in CGenFF.

In addition, while looking through the forcefield I found some duplicate patches with different names. Here's an example.