jeherr opened this issue 3 years ago
Which NNP architecture are you interested in?
The ML support in OpenMM is in progress (https://github.com/orgs/openmm/projects/1), but it is still not easy to use. If you are willing to become an early adopter and start trying things, it will be beneficial for OpenMM.
Well, I believe the Chodera group used ANI-2x in the ML/MM paper, which only included intramolecular interactions, but this shouldn't be specific to any particular NNP architecture. So long as it predicts energies as a sum of contributions from atoms, any NNP model should be swappable here.
@raimis: @jeherr is a new postdoc in my group working on ML/MM free energy calculations. The openmm-ml package aims to make it simple to convert an MM system to ML/MM, where only the intramolecular interactions for the ML region are handled by ML.
We would like to explore the obvious, simple space of ways we can easily extend this to also include ML for the interactions between ML and MM regions.
@peastman : Is it OK to discuss this here, or should we move this discussion to https://github.com/openmm/openmm-ml?
We shouldn't do anything that's specific to ANI. That's what was used in the one paper, but it's an outdated architecture. Other types of models are far more accurate. In the longer term, those are what we should plan to focus on.
In modern architectures, there's no such thing as an AEV. They're convolutional. You start by producing an embedding vector for each atom. A series of layers exchanges information between nearby atoms. At the end, you produce an energy for each atom.
The closest equivalent to what you described would be to only produce energies for a limited number of atoms, and therefore only do the earlier calculations needed to produce them. But that still ends up being a lot of work. Suppose you have five layers each with a cutoff of 10 A. For the very last layer, you only need to produce output for your small number of atoms. But for the next to last layer, you need to produce output for every atom within 10 A of any of them. For the third to last layer, any atom within 20 A of any of them. And so on.
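For concreteness, here is a minimal PyTorch sketch of the kind of convolutional / message-passing model described above. It is not TorchMD-Net or any other published architecture, just an illustration of per-atom embeddings, layered neighbor exchange within a cutoff, and per-atom energies; the comment in the loop marks where the receptive field grows by one cutoff per layer (five layers with a 10 A cutoff reach out to 50 A).

```python
import torch

class TinyMessagePassingNNP(torch.nn.Module):
    """Illustrative sketch only, not TorchMD-Net or any published model."""

    def __init__(self, n_elements=10, width=64, n_layers=5, cutoff=10.0):
        super().__init__()
        self.cutoff = cutoff
        self.embed = torch.nn.Embedding(n_elements, width)          # per-atom embedding
        self.layers = torch.nn.ModuleList(
            [torch.nn.Linear(2 * width + 1, width) for _ in range(n_layers)]
        )
        self.readout = torch.nn.Linear(width, 1)                    # per-atom energy

    def forward(self, species, positions):
        # species: (n_atoms,) integer element indices; positions: (n_atoms, 3)
        x = self.embed(species)
        dist = torch.cdist(positions, positions)
        mask = (dist < self.cutoff) & ~torch.eye(len(species), dtype=torch.bool)

        # Each pass mixes information from neighbors within the cutoff, so the
        # receptive field grows by one cutoff per layer (5 layers x 10 A = 50 A).
        for layer in self.layers:
            i, j = mask.nonzero(as_tuple=True)
            msg = torch.cat([x[i], x[j], dist[i, j].unsqueeze(-1)], dim=-1)
            x = x + torch.zeros_like(x).index_add_(0, i, torch.tanh(layer(msg)))

        return self.readout(x).squeeze(-1).sum()   # sum of per-atom energies

# Toy usage: four atoms with random positions
model = TinyMessagePassingNNP()
energy = model(torch.tensor([0, 1, 1, 2]), torch.rand(4, 3))
```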
@peastman : Is it OK to discuss this here, or should we move this discussion to https://github.com/openmm/openmm-ml?
Where do you suppose "here" is? :)
We shouldn't do anything that's specific to ANI. That's what was used in the one paper, but it's an outdated architecture. Other types of models are far more accurate. In the longer term, those are what we should plan to focus on.
Do you have a good comparison of the speed of ANI vs the higher accuracy models? Something where we're really saturating the GPU to compare. I suspect ANI is more than a factor of 2 faster, but less than a factor of 10. There's a lot of computational efficiency in there that we might need which those other models cannot provide. We already lose about a factor of 5 or 10 when using ML just for the ligand compared with pure MM.
The closest equivalent to what you described would be to only produce energies for a limited number of atoms, and therefore only do the earlier calculations needed to produce them. But that still ends up being a lot of work. Suppose you have five layers each with a cutoff of 10 A. For the very last layer, you only need to produce output for your small number of atoms. But for the next to last layer, you need to produce output for every atom within 10 A of any of them. For the third to last layer, any atom within 20 A of any of them. And so on.
Good lord why would we ever need a sensory range of 50 A?
Do you have a good comparison of the speed of ANI vs the higher accuracy models?
That's a hard comparison to do, for a couple of reasons. First, because any model can be made faster or slower by changing its size, so how do you compare two very different models? Do you compare the speed of two models with equal numbers of parameters? Or that produce similar accuracy? Do they need to cover the same range of chemical space? The cost of ANI scales as O(n^2) in the number of atom types, while convolutional models are usually independent of the number of types. That's why ANI is limited to a very small number of elements, and will probably never grow to cover many more.
Second, we've put a lot of work into optimizing ANI, but not yet into any other models. You can't really compare the speed of models until both of them are well optimized. I've just started profiling the equivariant transformer model. The GPU is only busy for a tiny fraction of the time, so there should be a lot of room for speedups.
Good lord why would we ever need a sensory range of 50 A?
Those numbers were just for illustration. As for what range is actually needed, that will depend on the details of the model. For a model that has explicit Coulomb and dispersion terms, it will be shorter than for one that needs to learn those interactions from data. Generally speaking, the required range is roughly the distance over which electrons can rearrange within the molecule, plus the distance over which interactions between those electrons are significant.
This is all of interest from the perspective of trying to build and train NNP models, and while I would be happy to discuss it another time, from the perspective of developing the ML/MM methodology we're not interested in which model would be the fastest in theory; we're interested in which model is the fastest we can use right now.
With that said
The closest equivalent to what you described would be to only produce energies for a limited number of atoms, and therefore only do the earlier calculations needed to produce them. But that still ends up being a lot of work. Suppose you have five layers each with a cutoff of 10 A. For the very last layer, you only need to produce output for your small number of atoms. But for the next to last layer, you need to produce output for every atom within 10 A of any of them. For the third to last layer, any atom within 20 A of any of them. And so on.
we've put a lot of work into optimizing ANI, but not yet into any other models.
That's what was used in the one paper
These are all excellent reasons why we should stick with ANI for now.
My point is that we shouldn't build a lot of infrastructure that's specific to ANI, because it will all be obsolete very soon. Anything we do should be generic so it will work with many models.
I wouldn't be so sure that ANI will be obsolete soon. You're going to have a hard time getting those other models to beat it in terms of speed I think. There are going to be cases where users will want to sacrifice some accuracy for speed because they want a longer simulation or a larger system. I think it will stick around longer than you believe.
My point is that we shouldn't build a lot of infrastructure that's specific to ANI, because it will all be obsolete very soon. Anything we do should be generic so it will work with many models.
It's perfectly sensible to consider efficient schemes that are specific to particular classes of architectures for machine learning potentials. The ANI generation of models is not going away anytime soon.
Yes, there will be room for exploration of how we can enable other architectures to efficiently include ML-quality interactions between ML and MM regions, but the idea that the entire class of models that use AEVs is completely outdated at this point is not only patently incorrect, but it is not even self-consistent with how we have directed our effort.
the idea that the entire class of models that use AEVs is completely outdated at this point is not only patently incorrect, but it is not even self-consistent with how we have directed our effort.
I consider it my duty to make it completely outdated as quickly as I possibly can. I'm hard at work on it! :)
More seriously, the only thing keeping that architecture alive is the lack of pretrained models for better architectures (which again I'm working to address). If you look at the architectures published in the last several years, hardly any of them use AEVs.
You're going to have a hard time getting those other models to beat it in terms of speed I think.
As soon as you try to handle more than seven elements, charged groups, polar molecules, interactions over longer distances than about 5 A, etc. you're going to have a hard time getting ANI to match the speed of those other models.
I consider it my duty to make it completely outdated as quickly as I possibly can. I'm hard at work on it! :)
The longer-term effort to develop a better standard deployable ML potential should not and need not come at the cost of preventing our users from fully exploiting the capabilities we have spent months implementing to support well-tested and widely used ML potentials.
More seriously, the only thing keeping that architecture alive is the lack of pretrained models for better architectures (which again I'm working to address). If you look at the architectures published in the last several years, hardly any of them use AEVs.
While that may be true, (1) AEV-based models are in widespread use now (and we have taken steps to increase this widespread use via OpenMM), (2) they permit simple, computationally efficient routes to inclusion of ML interactions between the ML and environment regions, and (3) as you have clearly stated, current message-passing architectures (and continuous spatial convolutions) are deficient because they do not support any kind of efficient route to inclusion of ML interactions between the ML and environment regions.
What we're looking for is an intermediate step to bridge us between the current "ML can only be used within a small region" and the eventual "ML will be sufficiently performant for the entire system". These modifications offer a simple way of extending what we have done into this intermediate regime.
We are not wedded to these particular proposals, but it is important we think about what may be possible incrementally to bridge these two extremes since it looks like making 100K-atom ML simulations performant will be years away.
As soon as you try to handle more than seven elements, charged groups, polar molecules, interactions over longer distances than about 5 A, etc. you're going to have a hard time getting ANI to match the speed of those other models.
To be perfectly clear, we are only suggesting doing this with existing ANI architectures (like ANI2x) to enable users to use and experiment with variants on those model parameters. We are not proposing this as a long-term path to anything but a bridge between now (ML for the tiny region only) and years from now (full ML for everything at the same speed as MM).
If this is still unclear, I'm happy to explain in more detail on the next ML call.
I think we all agree that we will support both AEV and graph methods, even if just for comparison. Neither of the two is particularly useful at the moment due to the currently available models and their limitations (e.g., no charged molecules).
Let me put it this way. My educated belief is that ANI has very little future. Within a few years, it will be considered obsolete. My advice to you (which of course you are completely free to ignore!) is that if you want to have a major impact on the field, don't waste your time developing methods that are specific to ANI and can't be adapted to the newer architectures the field is moving to. If you do, your methods will quickly become obsolete.
My educated belief is that ANI has very little future. Within a few years, it will be considered obsolete.
@peastman: I am in agreement with both of these points.
My advice to you (which of course you are completely free to ignore!) is that if you want to have a major impact on the field, don't waste your time developing methods that are specific to ANI and can't be adapted to the newer architectures the field is moving to. If you do, your methods will quickly become obsolete.
Our difference in viewpoints arises from the fact that we're thinking differently about what the key methodological advance is. The key advance we are aiming for is not extending AEV methods to enable ML/MM interactions to be treated at the ML level. It's demonstrating what you can do with a model that allows you to (1) treat ML/MM interactions at the ML level, and (2) retrain them to improve performance on tasks. The model itself is not important, provided (1) there is a standard model to start with, and (2) it is fast (not much slower than MM).
What we need here is just an ML potential function U(x_ML, x_MM; \theta) which treats ML/MM interactions at the ML level with tunable parameters \theta, and is not much slower than our current ML/MM scheme (where only ML intramolecular interactions are treated with ML). With this, we can show (1) that we can improve accuracy (e.g. in alchemical free energy calculations) by treating interactions between regions at the ML level, and (2) that we can retrain the model to improve this on related molecules. Establishing this is the important impact, not the details of how we introduce the ML interactions between the MM and ML regions; those will be mostly irrelevant, model-dependent technical details.
The ANI suggestions in this thread are all things that (1) can use a current standardized model like ANI2x, (2) take advantage of the acceleration work you've already done, (3) would degrade performance by at most 2x, and (4) would be easy to implement given what we have. I don't mind if we dump them as soon as you have a better architecture and standardized ML model (honest!). It's the higher-level approach that's more important to address, since we have the opportunity to do so now rather than waiting for a potentially general way to handle ML/MM interactions at the ML level (which I'm not convinced will ever be architecture-independent).
In modern architectures, there's no such thing as an AEV. They're convolutional. You start by producing an embedding vector for each atom. A series of layers exchanges information between nearby atoms. At the end, you produce an energy for each atom.
The graph networks don't feed the atomic positions directly into NNs. TorchMD-Net still uses RBFs to get fixed-length vectors (a.k.a. AEVs) from distances, while ANI uses the Behler-Parrinello functions to do the same. So at a high level there is not much difference; all of them do the same thing:

- Compute interatomic distances and convert them into fixed-length vectors
- Feed through NNs (do it several times if necessary)
- Sum the atomic contributions into a total energy

For practical reasons, @jeherr should start with ANI-2x to prove his ideas, and start generalising when TorchMD-Net is optimised and pre-trained models are available.
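A toy illustration of that shared first step, turning each distance into a fixed-length feature vector (the actual functional forms in TorchMD-Net and ANI differ from this; the function name here is made up):

```python
import torch

def rbf_expand(distances, n_rbf=16, cutoff=5.0):
    """Toy radial-basis expansion: one fixed-length vector per distance.

    The same generic idea underlies TorchMD-Net's RBFs and ANI's
    Behler-Parrinello symmetry functions, but their exact forms differ.
    """
    centers = torch.linspace(0.0, cutoff, n_rbf)            # (n_rbf,)
    width = (cutoff / n_rbf) ** -2
    return torch.exp(-width * (distances[..., None] - centers) ** 2)

# Example: two distances -> two 16-dimensional feature vectors
features = rbf_expand(torch.tensor([1.2, 3.4]))
print(features.shape)  # torch.Size([2, 16])
```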
See my comments above: this is a critical difference. A method like ANI only gathers information from neighbors once, while a convolutional model does it repeatedly, with each layer greatly extending the number of atoms that can influence each atom.
Our difference in viewpoints arises from the fact that we're thinking differently about what the key methodological advance is. The key advance we are aiming for is not extending AEV methods to enable ML/MM interactions to be treated at the ML level. It's demonstrating what you can do with a model that allows you to (1) treat ML/MM interactions at the ML level, and (2) retrain them to improve performance on tasks.
That doesn't sound to me like much of a methodological advance. To be useful, it needs to be a method that other people can actually use and that can be applied to the problems they actually want to use it for (e.g. charged molecules). Otherwise it isn't an advance; it's just a detour that doesn't move the field forward.
Let's consider alternate approaches. We want to develop a method with the following properties.
One approach could be to extend the model to distinguish between ML and MM atoms. At each layer, each atom would gather information from all its neighbors just as it does currently. But it would have two different neighbor lists, one for ML atoms and one for MM atoms. The former would be represented by a feature vector that's updated at every layer. The latter would be represented by an unchanging embedding vector. At the end, each ML atom would produce an energy, and it would be trained to make those energies add up to the correct internal plus interaction energy.
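A rough sketch of what that could look like (purely illustrative, with hypothetical class and parameter names, not an existing model): ML atoms carry feature vectors that are updated at every layer, MM atoms contribute fixed embeddings through a second neighbor list, and only the ML atoms produce energies at the end.

```python
import torch

class MLRegionWithMMEnvironment(torch.nn.Module):
    """Illustrative sketch of the dual-neighbor-list idea described above."""

    def __init__(self, n_elements=10, width=64, n_layers=3, cutoff=5.0):
        super().__init__()
        self.cutoff = cutoff
        self.embed_ml = torch.nn.Embedding(n_elements, width)   # updated features
        self.embed_mm = torch.nn.Embedding(n_elements, width)   # fixed embeddings
        self.ml_layers = torch.nn.ModuleList(
            [torch.nn.Linear(2 * width + 1, width) for _ in range(n_layers)]
        )
        self.mm_layers = torch.nn.ModuleList(
            [torch.nn.Linear(2 * width + 1, width) for _ in range(n_layers)]
        )
        self.readout = torch.nn.Linear(width, 1)

    def _gather(self, layer, x_dst, feat_src, d, mask):
        # Sum messages from neighbors flagged in `mask` into the destination atoms.
        i, j = mask.nonzero(as_tuple=True)
        msg = torch.cat([x_dst[i], feat_src[j], d[i, j].unsqueeze(-1)], dim=-1)
        return torch.zeros_like(x_dst).index_add_(0, i, torch.tanh(layer(msg)))

    def forward(self, ml_species, ml_pos, mm_species, mm_pos):
        x = self.embed_ml(ml_species)               # ML features, updated each layer
        env = self.embed_mm(mm_species)             # MM embeddings, never updated
        d_ml = torch.cdist(ml_pos, ml_pos)
        d_mm = torch.cdist(ml_pos, mm_pos)
        ml_mask = (d_ml < self.cutoff) & ~torch.eye(len(ml_species), dtype=torch.bool)
        mm_mask = d_mm < self.cutoff
        for ml_layer, mm_layer in zip(self.ml_layers, self.mm_layers):
            x = x + self._gather(ml_layer, x, x, d_ml, ml_mask) \
                  + self._gather(mm_layer, x, env, d_mm, mm_mask)
        # Only ML atoms contribute energies; training would target internal + interaction energy.
        return self.readout(x).squeeze(-1).sum()

# Toy usage: two ML atoms embedded in a three-atom MM environment
model = MLRegionWithMMEnvironment()
e = model(torch.tensor([0, 1]), torch.rand(2, 3),
          torch.tensor([2, 2, 3]), torch.rand(3, 3))
```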
A different approach is to recognize that a neural network isn't necessarily the best way of computing interaction energies. When you already know the correct physical form for the interaction, why try to make a neural network learn it? Nonbonded interactions are well described by Coulomb + dispersion + exclusion, and we have good, fast approximations for all of them. The only thing missing in the conventional force field is polarization. If we want to improve the accuracy, we need a better description of where the charge is. So we could use a neural network purely for predicting that, then use classical expressions for the interaction between protein and ligand.
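As an illustration of that second approach (made-up data and names, not anything implemented in openmm-ml): a neural network would predict the conformation-dependent partial charges, and the cross-region interaction would then be evaluated with an ordinary Coulomb sum rather than learned.

```python
import torch

def coulomb_interaction_energy(q_ligand, x_ligand, q_protein, x_protein):
    """Classical Coulomb interaction between two regions, given per-atom charges.

    Sketch only: a neural network would supply the (conformation-dependent)
    charges q_ligand; the interaction itself uses a fixed physical expression.
    """
    k_e = 138.935  # approx. Coulomb constant in kJ/mol * nm / e^2
    d = torch.cdist(x_ligand, x_protein)                         # (n_lig, n_prot), nm
    return (k_e * q_ligand[:, None] * q_protein[None, :] / d).sum()

# Toy usage with made-up charges and positions (a real model would predict q_ligand)
x_lig = torch.rand(5, 3)
x_prot = torch.rand(20, 3) + 2.0   # shifted so the regions don't overlap
e = coulomb_interaction_energy(torch.randn(5) * 0.1, x_lig,
                               torch.randn(20) * 0.1, x_prot)
print(float(e))
```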
We're interested in enhancing the ML/MM method by including intermolecular interactions between the ML and MM regions of the system. There is some discussion on the Slack channel for this about the pitfalls we might run into by complicating the method beyond intramolecular interactions, so we should build a few different methods to find the most accurate and cost-effective way we can do this without running the whole system in ML.
The easiest and cheapest method is to just include the intermolecular interactions of ML atoms with nearby MM atoms, but not the reverse for MM atoms with nearby ML atoms. The energy of the system is given by
U_{ML/MM}(x_ML, x_MM) = U_{MM}(x_MM) + U_{ML}(x_ML; x_MM) + 0.5 U_{MM}(x_ML; x_MM)
where U_{MM}(x_ML; x_MM) is divided in half to account for the ML interactions that have been included. This should be the cheapest way to include any ML intermolecular interactions, because the only differences are including MM atoms in the AEVs of the ML atoms and dividing the MM interactions in half. The downside is that we are limiting ourselves to half of the accuracy boost that ML could provide.

For the second method, if we assume the ML model learns to split the energy of pairwise atomic interactions evenly between the pair, then we can get the total ML intermolecular interaction energy by building AEVs for the ML region both with and without the MM subregion included in the local environments of each atom. Doubling the difference in energy between those two gives the full interaction energy between the ML and MM regions, and we can ignore the corresponding MM interactions. John wrote this in a flexible form where we can mix the ML and MM intermolecular interactions by interpolating between the two with a parameter s
U_{ML/MM}(x_ML, x_MM; s) = U_{MM}(x_MM) + U_{ML}(x_ML) + (1-s) U_{MM}(x_ML; x_MM) + 2*s [ U_{ML}(x_ML; x_MM) - U_{ML}(x_ML) ]
where s=0 is the full MM interaction case and s=1 is the full ML interaction case. This will probably be the best method, in my opinion. If need be, we can regularize the ML model by explicitly training these interactions to be split evenly between the pair.

The last method is to build AEVs for the minimal number of atoms in the MM subregion that we need in order to get their ML interaction energies with the ML region. This will be the most costly, because we have to build AEVs for the ML atoms and for a subset of the MM atoms. The bonus is that we don't need to isolate the interaction energy for the ML region this time; we can just compute it once with the full local environment. For this method we have to collect the set of atoms in the MM subregion that fall into the neighbor list of at least one ML atom, and then build AEVs for those MM atoms with and without the ML atoms in their local environments. The energy of the system can be given by
U_{ML/MM}(x_ML, x_MM) = U_{MM}(x_MM) + U_{ML}(x_ML; x_MM) + [ U_{ML}(x_MM; x_ML) - U_{ML}(x_MM) ]
This last method will be the most accurate, but may not be much better than the second method while costing significantly more.
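To make the bookkeeping of the three combination rules concrete, here is a small Python sketch. The callables u_mm and u_ml and their env keyword are hypothetical stand-ins for the U_{MM}(·; ·) and U_{ML}(·; ·) terms above (env supplies the atoms that are only included in the local environments); nothing here corresponds to an existing openmm-ml API.

```python
# Hypothetical interface: u_mm(x, env=None) and u_ml(x, env=None) stand in for
# the U_MM(. ; .) and U_ML(. ; .) terms above; `env` holds the atoms included
# only in the local environments. This is not an existing openmm-ml API.

def method1(u_mm, u_ml, x_ml, x_mm):
    # ML atoms see MM neighbors in their AEVs; the MM cross term is halved.
    return u_mm(x_mm) + u_ml(x_ml, env=x_mm) + 0.5 * u_mm(x_ml, env=x_mm)

def method2(u_mm, u_ml, x_ml, x_mm, s):
    # Interpolate between full-MM (s=0) and full-ML (s=1) cross interactions.
    return (u_mm(x_mm) + u_ml(x_ml)
            + (1.0 - s) * u_mm(x_ml, env=x_mm)
            + 2.0 * s * (u_ml(x_ml, env=x_mm) - u_ml(x_ml)))

def method3(u_mm, u_ml, x_ml, x_mm):
    # Also evaluate the nearby MM atoms with ML, with and without ML neighbors.
    return (u_mm(x_mm) + u_ml(x_ml, env=x_mm)
            + u_ml(x_mm, env=x_ml) - u_ml(x_mm))

# Toy check of the s=0 and s=1 limits of method 2 with stand-in numbers:
u_mm = lambda x, env=None: -1.0 if env is None else -0.3
u_ml = lambda x, env=None: -1.2 if env is None else -1.5
print(method2(u_mm, u_ml, None, None, s=0.0))  # MM-level cross interactions
print(method2(u_mm, u_ml, None, None, s=1.0))  # ML-level cross interactions
```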