seekrcentral / seekr2

Simulation-Enabled Estimation of Kinetic Rates - Version 2
MIT License
27 stars 6 forks source link

assumptions related to milestoning size/shape interactions in MD region #19

Closed vaibhavadixit closed 2 years ago

vaibhavadixit commented 2 years ago

Dear seekr2 team, I'm reading the assumptions of milestoning theory here. This paper, as expected, is a bit math/stat oriented, thus I might have misunderstood something and thus I'm requesting you to comment.

On page 4 in "A. Isocommittor surfaces as optimal milestones" section, it is mentioned that one of the main assumptions is the following. sets of optimal milestones satisfying property (P1') and, hence, (P1), exist. Where (P1') the probability to reach first Sj from Si is the same from any starting point on Si.

My question is related to the size, the shape of the active site and associated different interactions that a ligand may encounter during its transit in/out of the binding site.

Now, considering the different size/shape of the active sites of different protein how is it possible that the above statement remains correct? I would expect that (except for small ligands like O2, NO etc) many points starting on Si would have different types of interactions and different Es between ligand and protein and thus (P1') would be different for these points. This would invalidate the assumption above.

If these arguments are correct then what care one must take while setting up the optimum milestones during seekr2 calculations for proteins with different sizes, shapes and interactions across the active site region (the MD region)? For e.g. some proteins have shallow active sites, while others have mostly buried active sites with all kinds of shapes (V, L, Y etc) and others in between these two extermes.

Looking forward to your useful comments and suggestions. Thank you and best regards. Vaibhav

lvotapka commented 2 years ago

I will be able to comment on this next week.

lvotapka commented 2 years ago

An optimal choice of milestones, which should give the most accurate kinetics and thermodynamics quantities from a milestoning model, indeed requires that P1' is the same from any starting point on Si. Therefore, a model that hopes to optimize the milestoning model would have to define its milestones very carefully.

However, one must carefully define the state (milestone) that is defined as the "bound" state. There is not, necessarily, an obvious answer to this question. Thankfully, the kinetics/thermodynamics are often relatively insensitive to this definition, so we have some discretion when choosing it. Typically, in SEEKR2, we define the bound state as a minimum center-of-mass (COM) distance between the ligand molecule and the COM of a number of binding site residues. However one defines the "bound" state, it must be defined first before optimal milestones may be defined.

Once the bound state is defined, then, optimally, the adjacent milestone(s) would be defined such that it has an equal probability of reaching the bound state no matter where the system starts on its surface. In practice, however, we have usually chosen to approximate the surface with a slightly larger COM-COM distance than the bound state, continuing outward with increasing radii. This forms a series of concentric spheres centered on the binding site, between which the COM of the ligand can traverse. For many systems (host-guest, trypsin-benzamidine, etc.), this definition of milestones has worked well enough.

However, this does not always work, and we are actively researching and implementing alternative milestone definitions to the concentric spheres. Hopefully, we will be able to find better milestone definitions, either for systems in general, or for specific binding site/ligand pairs.

vaibhavadixit commented 2 years ago

Hi Lane, I'm still reading that paper, thus not sure if a general solution would be possible with clever math/stat tricks and assumptions. Nonetheless, I think a binding site-specific definition that takes into account the shape and electrostatics might be useful. Also, I'm thinking about the following when active sites are not shallow but deeply buried with L, Y, V shapes, are the concentric spheres really required? Will concentric arcs (surface slices) only, for the solvent-accessible regions be sufficient? This is assuming that there are no large/major structural changes and active site shape/volume remains constant within a predefined limit. Would this assumption be correct and thus lower the MD sampling requirement? I hope these questions/suggestions are meaningful to you and help progress the discussion forward. thank you and your explanation and useful comments. best regards

vaibhavadixit commented 2 years ago

this means that there can be a b-surface very close to the MD boundary. In these regions, the ligand is taken away from the active site and due to shape and interaction constraints has a low probability of coming back into the binding site despite a comparatively smaller physical distance. I can draw this if my description is not clear enough.

lvotapka commented 2 years ago

At this time, I'm convinced that large structural changes, of either the ligand or the binding site, including ligand reorientation, are responsible for most of the inaccuracy we see with the simple concentric spherical approach. It's true that electrostatics and binding site shape will affect the isocommittor surfaces (optimal milestones) along the approach to or from the bound state, but I believe that relatively little advantage can be gained by focusing on these. Therefore, we are mainly focusing on ways to account for large-scale motions in the milestoning model to save on MD sampling.

While the sampling solvent-accessible regions would be important for obtaining the k-on, for quantities such as the k-off, the bound state, where the ligand is far from the solvent, is likely to be the most important to sample adequately. However, we are still trying to determine if the entry of one or more water molecules causing the partial solvation of the ligand within the bound state is an important precursor to unbinding.

Indeed, L, Y, or V shaped sites are a challenge for the simple concentric spherical approach. We are starting to explore other types of milestones, including "planar" milestones which are very similar to what was used by Bucci et. al. 2016 (https://pubs.acs.org/doi/full/10.1021/acs.jctc.6b00071; especially see figure 5).

I'm afraid that I don't understand your statement about the b-surface. BD simulations require the b-surface to be far enough away that the forces between the two molecules are "centrosymmetric" or relatively constant regardless of molecular rotation. Therefore, for any normal protein receptor, the b-surface will need to remain large, and far from the binding site. Feel free to send me a picture to clarify.

vaibhavadixit commented 2 years ago

Hi, I did find some time to draw the figure (attached). In short, I'm proposing a new-b_surface for proteins with active-site entry points mostly located on one side of the surface (one or more entry channels but mostly located on one side). (unlike the di-copper protein discussed in your PLOS Comp Biol paper here) As seen in the figure, this new-b_surface would be smaller and thus require lower sampling for the BD region. The exact curvature of this new-b_surface (near the protein surface) would depend on the interactions between the ligand and protein_surface (small drawing near the paragraph in the figure).

The situation may be different for proteins which have complementary charges/interactions throughout their surface to drag the ligand into the active site from anywhere within X angstrom radius. (Not sure how many such proteins exist).

Do let me know what you think about this idea. I hope you find this suggestion useful and worth considering for single or one-sided entry point proteins. Thank you and best regards. Vaibhav VAD-idea-new-b-surface

lvotapka commented 2 years ago

Unfortunately, one requirement for the b-surface in BD simulations is that forces are essentially constant all over its surface - since the Smoluchowski equation must be solved to find the flux to the surface, and is conveniently solved for systems with radial symmetry. This is describe in the BD literature as "centrosymmetry" since the receptor molecule is located at the center of the spherical b-surface and the force on the ligand (located at the surface) would be independent of receptor or ligand rotations.

I'm believe you are proposing this idea to save time on BD. Is that correct? Typically the BD is relatively cheap compared to the MD in SEEKR, so most gains in performance and accuracy can be obtained by speeding the MD portion of the calculations.

lvotapka commented 2 years ago

Closing this issue for now...