Option to allow partial residues

tristanic commented 5 years ago

When building models into electron density maps it is common practice to leave out atoms for which there is no visible density. Truncated amino acid sidechains are the most common example (arguments for and against truncating sidechains have raged for decades - personally I'm for keeping them in, since they're often invaluable for guiding what is and isn't possible). Those are pretty trivial in any case, and can be easily catered to with a small handful of artificial residue definitions.

What's much harder to deal with is the case of partially-modelled lipids. Here the argument for partial modelling is much stronger: it's very rare to see a lipid completely resolved in a density map - typically you'll see just the head group and 5-10 carbons from each tail, with the remainder flopping freely who-knows-where. It would be effectively impossible for me to convince people to model these in, and it would probably be counter-productive anyway. So it would be really nice to have the ability to allow truncated residues under certain, highly-defined conditions. The conditions I'd propose are:

The name of the residue template is provided
The modelled portion of the residue must be a single continuous fragment
Missing atoms must not be part of a ring
There must be an exact correspondence between atom names in modelled residue and template (it would be practically impossible to resolve ambiguities otherwise)
Allowing a partial residue has been specified for that specific residue - no global "allow partial residues" switch

Is this something that might see general use, or is it too niche? If the latter, I should be able to find a way to do it without touching OpenMM code (i.e. by creating truncated templates on-the-fly as necessary), but if the former I guess it would make sense to implement in ForceField.createSystem()?

jchodera commented 5 years ago

This sounds pretty niche, since we usually want the entire residue to be specified---even if there is no density there---since we know the rest of the atoms must be there in the physical system, and our simulation is meant to model the physical system. I've never heard of anyone simulating fragments of chemical system that were resolved by the X-ray density. (For example, you wouldn't simulate just the oxygens of crystal waters even though that's all you generally see!)

tristanic commented 5 years ago

Yep, I completely agree that it’s very niche - I certainly can’t think of an application outside of building models into density for deposition on the PDB. Using partial residues in any production simulation sounds like a fast track to madness. Yet for this specific application it’s more-or-less a necessity. I used to think that it made sense that the person actually building the model in the first place must be in the best position to sensibly build in unresolved atoms, but seeing the hash some people manage to make of things even in good density has firmly disabused me of that notion. At least atoms being left out is a clear flag to the end-user - if they’re built in, many people will just run with them without ever knowing which parts of the starting coordinates are real and which are fantasy.

Anyway, sounds like this is something I should implement externally. After some thinking about it I think I can see a fairly straightforward approach.

On 21 Feb 2019, at 21:08, John Chodera notifications@github.com wrote:

This sounds pretty niche, since we usually want the entire residue to be specified---even if there is no density there---since we know the rest of the atoms must be there in the physical system, and our simulation is meant to model the physical system. I've never heard of anyone simulating fragments of chemical system that were resolved by the X-ray density. (For example, you wouldn't simulate just the oxygens of crystal waters even though that's all you generally see!)

— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub, or mute the thread.

peastman commented 5 years ago

Would it make more sense to include the atoms during modeling, then strip them out before depositing the final structure? It seems like that would combine the advantages of both approaches. You wouldn't mislead people by claiming to know atom positions you really don't know. But you'd also have confidence that your model was at least consistent with some arrangement of those atoms. It would keep you from putting atoms in places where they couldn't possibly go, because the missing sidechains would be in the way.

tristanic commented 5 years ago

In some cases, yes - in fact, I always strongly advise people to handle amino acid residues that way, particularly at low resolution. But that becomes increasingly cumbersome and impractical for really big, flexible ligands - lipids as I mentioned, NADH, PEG fragments etc. where a substantial portion is unresolved, and particularly when other parts of the model are not yet built. The floppy parts will continually get in the way, falling into parts of the map that don’t belong to them and generally making a nuisance of themselves.

Ultimately, though, I don’t really have a say in this - there’s 4+ decades of inertia saying that this is the way models are built, so I need to cater to it.

On 21 Feb 2019, at 22:28, peastman notifications@github.com wrote:

Would it make more sense to include the atoms during modeling, then strip them out before depositing the final structure? It seems like that would combine the advantages of both approaches. You wouldn't mislead people by claiming to know atom positions you really don't know. But you'd also have confidence that your model was at least consistent with some arrangement of those atoms. It would keep you from putting atoms in places where they couldn't possibly go, because the missing sidechains would be in the way.

— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub, or mute the thread.

tristanic commented 5 years ago

Just to provide a small concrete example: the attached is a picture of a cardiolipin modelled into a recent EM map of mitochondrial complex I (6g2j), at the fringe of the transmembrane domain. As published its fit to the map isn't great (just one of hundreds of sites in this model where an MD forcefield can help), but it's good enough to say it's correctly identified. The map contour is dialled down to the absolute minimum possible before noise begins to dominate, but this is really all that's visible of the tails (which in reality are each 19 carbons long). In this detergent-solubilized complex any model of the remaining carbons is going to be quite meaningless and distracting, and most likely wouldn't even be helpful in setting up an equilibrium simulation - you'd need to rebuild the tails anyway to pack neatly with the surrounding membrane lipids. So, much as it pains me to admit it the most sensible solution is simply to find a neat way to allow simulation of fragments like this.

cdl

jchodera commented 5 years ago

I still really like @peastman's suggestion of simulating complete molecules and only fitting the resolved parts to the density, then truncating them at the deposition stage. The rest of the molecules MUST be there physically, and their presence should serve to enhance the accuracy of the resolved portions of the molecules. In fact, the structural heterogeneity of the unresolved portions should inform the lack of density in those regions, so it provides even more information about the fit.

If simulating unphysical residue fragments because they fit the density well is a real use case, why even use a physical force field at all? You could utilize something much faster that strips out the physics (such as electrostatics) in favor of fitting the density.

tristanic commented 5 years ago

why even use a physical force field at all? You could utilize something much faster that strips out the physics (such as electrostatics) in favor of fitting the density.

That's essentially exactly how existing model-building/refinement restraint libraries (Engh and Huber restraints) work - target values and standard deviations for bonds, angles and (some) torsions and a penalty term for clashes, but no electrostatics or proper van der Waals treatment. They work just fine where the resolution is good enough to clearly define atom positions, but at ~3A and below the results are typically not good (and have mostly only gotten better over time due to increased use of restraints based on reference to high-resolution homologues). That's where I've found that full classical MD forcefields really start to come into their own. If you have the time to watch it, the worked example at https://isolde.cimr.cam.ac.uk/case-studies/case-study-3io0/ might help illustrate the issue.

I still really like @peastman [1]'s suggestion of simulating complete molecules and only fitting the resolved parts to the density,

There are various issues that make the whole problem not that simple.

The model (particularly if it's a structure for which no prior homologue has been solved) is typically not built all at once. In the worst-case scenario in crystallography (data worse than 3A resolution, no starting model at all, just some initial map from phases based on anomalous signal), then your starting point is a model automatically built by one of various chain-tracing algorithms that may be only 50% complete and will have some sequence assigned based on the sidechain density, but will have various fragments of poly-Ala etc. From there the challenge is to iteratively extend and improve the model to get better maps, allowing further building, etc. Every bit that you manage to add correctly improves the map, but every bit in the wrong place either degrades it or simply gets in the way of building in the right thing.

Now, the example I posted is from a cryo-EM map, but imagine if this were in a crystallographic dataset where not all of the surrounding protein has been fully built. The density for the headgroup is clear and identifiable as cardiolipin and you want to build it in (since as per the above, this will help improve the maps), but those floppy tails hang perilously close to some density that is not yet filled, but clearly doesn't belong to them. At that point you need to make some decision about what to do about that: restrain them to some arbitrary position and/or free them from feeling the MDFF forces at all (which effectively makes them just excess computational baggage), or simply leave them out at least until the model is sufficiently complete that they can be safely reinstated. Current approaches in the field lean strongly towards the latter approach, and from various discussions I'm afraid if I don't cater to that I will lose potential users.

All that being said, I do know exactly where you're coming from. I actually wish there were some formal mechanism allowing for model builders to submit two models to the PDB for a given dataset: a "traditional" one showing only what's visible in the density, and their best stab at what's actually there. For the first structure I rebuilt using MDFF-based methods (https://www.sciencedirect.com/science/article/pii/S0969212616000071) I chose to submit the "complete" model (including the most likely N-linked glycans based on mass spec and various other unresolved-but-important features) as part of the supplementary info for the paper. A reasonable compromise, I suppose - but I have to wonder how many people have actually ever noticed it's there.

On 2019-02-22 14:50, John Chodera wrote:

I still really like @peastman [1]'s suggestion of simulating complete molecules and only fitting the resolved parts to the density, then truncating them at the deposition stage. The rest of the molecules MUST be there physically, and their presence should serve to enhance the accuracy of the resolved portions of the molecules. In fact, the structural heterogeneity of the unresolved portions should inform the lack of density in those regions, so it provides even more information about the fit.

If simulating unphysical residue fragments because they fit the density well is a real use case, why even use a physical force field at all? You could utilize something much faster that strips out the physics (such as electrostatics) in favor of fitting the density.

-- You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub [2], or mute the thread [3].

Links:

[1] https://github.com/peastman [2] https://github.com/pandegroup/openmm/issues/2265#issuecomment-466422204 [3] https://github.com/notifications/unsubscribe-auth/AVneYOSLgkE4b_CLZCJlrFole0t6iXmFks5vQAOsgaJpZM4bH4yB

tristanic commented 5 years ago

Anyway, seems pretty clear that this is something that should be implemented in external code. Closing.

jchodera commented 5 years ago

I'd just be careful about electrostatics when you use partial residues! You don't want to create net charges or split dipoles. With CHARMM force fields, this is in principle easier since they have a concept of local charge groups of bonded atoms, but AMBER does not have that concept encoded in the force field. It will be quite difficult to modulate the charges in such as way as to retain hydrogen bonds and salt bridges for portions of residues that are modeled while not totally disrupting the electrostatics of the whole system.

tristanic commented 5 years ago

Yes, that’s certainly something that will need to be handled carefully (although it’s not quite as serious a concern in ISOLDE as it would be elsewhere - no periodic boundary conditions, so any effects of poorly balanced charges should be quite local). I’ve hand-made overall-neutral templates for the most common amino acid truncations, but yes - I’ll have to think about what to do if someone decides they really want to cut at a strongly-charged atom.

On the plus side, the sites where truncation makes sense are almost by definition not in contact with all that many surrounding atoms.

On 26 Feb 2019, at 18:39, John Chodera notifications@github.com wrote:

I'd just be careful about electrostatics when you use partial residues! You don't want to create net charges or split dipoles. With CHARMM force fields, this is in principle easier since they have a concept of local charge groups of bonded atoms, but AMBER does not have that concept encoded in the force field. It will be quite difficult to modulate the charges in such as way as to retain hydrogen bonds and salt bridges for portions of residues that are modeled while not totally disrupting the electrostatics of the whole system.

— You are receiving this because you modified the open/close state. Reply to this email directly, view it on GitHub, or mute the thread.

openmm / openmm

Option to allow partial residues #2265

Links: