openmm / NNPOps

High-performance operations for neural network potentials

Other models to support? #20

Open peastman opened 4 years ago

peastman commented 4 years ago

What other models do we want to support? ANI is done and SchNet is nearing completion. What are the next top priorities?

raimis commented 4 years ago

@peastman (cc: @giadefa)

For us, a priority would be accelerated training of SchNet, not only its inference. In the case of ANI, the features can be precomputed, but that is not the case for SchNet. And, at the moment, we are still spending more time training SchNet models than running them.

Based on https://github.com/peastman/NNPOps/pull/18, the missing features are:

peastman commented 4 years ago

I'll think about it and see if I can come up with something, but on first glance I don't see a lot of scope for speeding that up compared to what SchNetPack does. Consider the QM9 model I've been benchmarking. It uses 50 basis functions and a layer width of 128. So to backpropagate the gradients, I need to track derivatives with respect to 50+128 = 178 values. That's small enough that I can keep everything in shared memory, which is really fast. But for gradients with respect to parameters, we would have to track 128*50 + 128 + 128*128 + 128 = 23,040 derivatives. That's way too large for shared memory, so it has to be done in global memory. That's much slower, and PyTorch is already really well optimized for that case.
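The size argument above can be checked with a few lines of arithmetic. This is only a rough sketch: the 4-byte float size and the 48 KiB shared-memory budget per thread block are assumptions (a common CUDA default, not a figure from this thread), and reading the parameter count as two dense layers with biases is an interpretation of the expression in the comment.

```python
# Rough size check for the QM9 SchNet example discussed above.
n_basis = 50   # number of basis functions (from the comment)
width = 128    # layer width (from the comment)

BYTES_PER_FLOAT32 = 4
# Assumption: 48 KiB of shared memory per thread block, a common CUDA default.
SHARED_MEM_BYTES = 48 * 1024

# Derivatives tracked when backpropagating to the inputs only.
n_input_derivs = n_basis + width

# Derivatives with respect to parameters, read here as two dense layers
# with biases (50 -> 128 and 128 -> 128); that reading is an assumption.
n_param_derivs = width * n_basis + width + width * width + width

input_bytes = n_input_derivs * BYTES_PER_FLOAT32
param_bytes = n_param_derivs * BYTES_PER_FLOAT32

print(n_input_derivs, input_bytes, input_bytes <= SHARED_MEM_BYTES)
print(n_param_derivs, param_bytes, param_bytes <= SHARED_MEM_BYTES)
```

Under these assumptions the input-gradient working set is well under 1 KiB, while the parameter-gradient working set is about 90 KiB, which would not fit in a 48 KiB block.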

jchodera commented 3 years ago

Other exciting (but expensive) models are:

jchodera commented 3 years ago

But for gradients with respect to parameters, we would have to track 128*50 + 128 + 128*128 + 128 = 23,040 derivatives. That's way too large for shared memory, so it has to be done in global memory. That's much slower, and PyTorch is already really well optimized for that case.

It seems like a discussion with @proteneer---in terms of whether you want/need JVPs or VJPs for parameter gradients---would be valuable here. Often, it's better to recompute on the fly and implicitly form Jacobian-vector (JVP) or vector-Jacobian (VJP) products---most ML frameworks seem to support this.
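To make the VJP idea concrete, here is a toy sketch in plain Python for a single bias-free dense layer y = W x. It is only an illustration of forming the product implicitly; the function names are hypothetical and this is not how NNPOps, SchNetPack, or any particular framework implements it.

```python
def matvec(W, x):
    """Return y = W x for a dense layer without bias."""
    return [sum(w_ij * x_j for w_ij, x_j in zip(row, x)) for row in W]


def weight_vjp(v, x):
    """Vector-Jacobian product v^T (dy/dW) for y = W x.

    The explicit Jacobian dy/dW has m * (m * n) entries, but the product
    collapses to the outer product of v and x, which has only m * n.
    """
    return [[v_i * x_j for x_j in x] for v_i in v]


def input_vjp(v, W):
    """Vector-Jacobian product v^T (dy/dx) = W^T v for y = W x."""
    n_inputs = len(W[0])
    return [sum(W[i][j] * v[i] for i in range(len(W))) for j in range(n_inputs)]


W = [[1.0, 2.0, 3.0],
     [4.0, 5.0, 6.0]]   # m = 2 outputs, n = 3 inputs
x = [1.0, 0.5, -1.0]
v = [2.0, -1.0]         # upstream gradient dL/dy

y = matvec(W, x)
grad_W = weight_vjp(v, x)   # dL/dW, same shape as W
grad_x = input_vjp(v, W)    # dL/dx, same shape as x
```

The point of the sketch is that neither function ever materialises a Jacobian: the parameter gradient is built directly from the upstream gradient and the layer input, which can be recomputed on the fly rather than stored.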

peastman commented 3 years ago

The code from the Tensor Field Networks paper is at https://github.com/tensorfieldnetworks/tensorfieldnetworks. That repository points to https://github.com/e3nn/e3nn as an actively maintained implementation. It's mostly written with PyTorch, but with some CUDA code to speed up the spherical harmonics.

There's an implementation of Clebsch-Gordan Nets at https://github.com/zlin7/CGNet. It also uses CUDA wrapped with PyTorch.

Have you tried these implementations, or any others? How well do they work?

giadefa commented 3 years ago

I have used e3nn. This is the optimized one I had mentioned.

peastman commented 3 years ago

Does it seem reasonably well optimized? If there's already a good implementation, we don't need to write another one.

risi-kondor commented 2 years ago

Our new library for SO(3) equivariant neural nets, https://github.com/risi-kondor/GElib , has much more general CUDA kernels for CG-products than the above. The installation method has just been changed to pip install, which I think is not reflected in the docs yet. The C++ documentation is also slightly out of date. Any feedback would be very welcome.

raimis commented 2 years ago

@risi-kondor thanks for bringing GElib to our attention. Do you have a development roadmap for the library? Would you be interested in a collaboration with NNPOps developers?

davkovacs commented 2 years ago

We are currently working on interfacing our new MACE model with OpenMM. This is an equivariant neural network, but ca. 10 times faster than the previous ones (even in its PyTorch version), so we hope it can be useful for molecular simulations.

https://arxiv.org/abs/2206.07697

The core operations of our model can be compiled using TorchScript. The last major challenge remaining is the generation of the neighbour list to form the graph. We currently do this using the Python package ASE, which cannot be compiled. I see that in the SchNet folder there is an optimised neighbour list calculator, and I would like to use that. Is there an example of integrating it with PyTorch SchNet code that I can look at for advice?

peastman commented 2 years ago

A neighbor list kernel was just merged a few days ago: https://github.com/openmm/NNPOps/pull/58. Will it work for your needs?

Is the MACE code available yet? I'm really looking forward to trying it out.

davkovacs commented 2 years ago

Thank you @peastman, this looks like almost what we need. https://github.com/openmm/NNPOps/blob/master/src/pytorch/neighbors/getNeighborPairs.py

The only thing is that we need the pair indices and the interatomic vectors, rather than just the distances. Would it be possible to create a version of that function which instead of the distances returns the distance vectors?

The MACE code will be released in a day or two. I will message you when it is made public.

peastman commented 2 years ago

It looks like that should be an easy change. forward() already records the deltas in a tensor. It just doesn't return it. backward() would need slightly more changes so it could accumulate gradients, but that should also be easy.
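For reference, the requested output (pair indices plus displacement vectors, alongside distances) can be sketched as a brute-force pure-Python function. This is only an illustration of the data being asked for: the function name, the `i > j` pair ordering, and the sign convention `positions[i] - positions[j]` are assumptions, it ignores periodic boundary conditions, and it is not the NNPOps kernel or its API.

```python
import math


def neighbor_pairs_with_deltas(positions, cutoff):
    """Brute-force O(N^2) reference returning (pairs, deltas, distances).

    Hypothetical helper for illustration only. Each pair (i, j) has i > j,
    and its delta is positions[i] - positions[j] (an assumed convention).
    """
    pairs, deltas, distances = [], [], []
    for i in range(len(positions)):
        for j in range(i):
            delta = [a - b for a, b in zip(positions[i], positions[j])]
            dist = math.sqrt(sum(d * d for d in delta))
            if dist <= cutoff:
                pairs.append((i, j))
                deltas.append(delta)
                distances.append(dist)
    return pairs, deltas, distances


positions = [[0.0, 0.0, 0.0], [3.0, 4.0, 0.0], [10.0, 0.0, 0.0]]
pairs, deltas, distances = neighbor_pairs_with_deltas(positions, cutoff=6.0)
print(pairs, deltas, distances)
```

Returning the deltas alongside the distances is what lets an equivariant model build direction-dependent features; the distances alone discard that information.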

davkovacs commented 2 years ago

Is this something you are planning to implement soon? It would be great for us, but also for all equivariant GNNs. Also tagging @raimis.

peastman commented 2 years ago

Adding it seems like a good plan to me. Do you agree @raimis?

raimis commented 2 years ago

Yes, this should be easy to implement.

davkovacs commented 2 years ago

We have released the MACE code: https://github.com/ACEsuit/mace

If we could get the neighbour list kernel to return the displacement vectors, I would create the torch force object so that we can try it in OpenMM!

raimis commented 2 years ago

@davkovacs I'm working on this (https://github.com/openmm/NNPOps/pull/61).

davkovacs commented 2 years ago

@davkovacs I'm working on this (#61).

Thank you, I will follow closely, and am looking forward to trying it / testing it.

peastman commented 2 years ago

We have released the MACE code:

Thanks! Unfortunately, I don't think we'll be able to use it for anything given the license you chose. Having a license that is both non-open source and viral makes it incompatible with most open source projects.

jchodera commented 2 years ago

@davkovacs: Is there any chance you would consider distributing MACE under the OSI approved MIT License? We've tried to closely follow the Reproducible Research Standard, which aims to use licenses that explicitly make it possible for others to use, modify, build on, and redistribute our work so as to maximize its impact in the biomolecular modeling community. As @peastman points out, non-permissive licenses are difficult for us to interface with and will inherently limit the utility and impact of codes that adopt them.

giadefa commented 2 years ago

That's nice. How much faster than TorchMD-NET ET?

davkovacs commented 2 years ago

@jchodera I don't think the license we chose is limiting in any way; it is completely free and open for any academic use. But we might change it if many people think it is a problem.

@giadefa It is hard to give exact numbers. I know from experience that it is much faster to train than some other models like BOTNet or NequIP. We should look at timings when we run the model from OpenMM for MD. We also have a new JAX implementation (still experimental, not yet public) which is currently another 4 times faster, though we will be able to port some of its optimisations to the PyTorch code too.

jchodera commented 2 years ago

@jchodera I don't think the license we chose is limiting in any way; it is completely free and open for any academic use. But we might change it if many people think it is a problem.

OpenMM is distributed under the OSI approved MIT License, a license that fulfills the Reproducible Research Standard, meaning it can be used by anyone, not just academics. If the MACE license doesn't permit a large swath of current OpenMM users to actually use it, it's certainly significantly limiting. The difficulty in executing software licenses with industry---which often requires industry to expend more dollars in executive time and legal counsel than the license revenue generates---often creates so much friction that it's generally much easier to build consortia of industry that simply want to fund fully open source software, such as the Open Force Field Consortium and Open Free Energy Consortium, under the Open Molecular Software Foundation, which recommends the use of permissive licenses for our field (rather than restrictive licenses).

peastman commented 2 years ago

Your license is both closed (it doesn't meet the Open Source definition, and therefore is incompatible with most open source licenses) and viral (it requires the same license to be applied to any code it is combined with). That means it cannot legally be combined with many open source codes. For example, OpenMM includes code that is distributed under the LGPL license. If you use your model inside OpenMM, then your license requires all of OpenMM, including the LGPL parts, to be placed under the same license. But LGPL explicitly forbids you from placing extra restrictions such as "no commercial use" on the code. So by doing that, you are violating the license.

davkovacs commented 2 years ago

@peastman @jchodera @giadefa I am happy to tell you that we have changed the license of the MACE repo to the MIT license. The primary reason for that was to facilitate working together on integrating the model into the OpenMM ecosystem.

I hope we can work together in the future both in the integration and also in performance optimisation of MACE. https://github.com/ACEsuit/mace

giadefa commented 2 years ago

Great news. This will facilitate the uptake of the model.

peastman commented 2 years ago

That's fantastic news!

@giadefa what do you think about supporting MACE as an option in TorchMD-Net? It would be really interesting to try combining it with some of the physics based terms we've been adding.

giadefa commented 2 years ago

It would be great, even just for comparisons.

davkovacs commented 2 years ago

Let me know if you need any help or if you have any questions. Also, when using MACE, I highly recommend looking in detail at how we train it in the repo. There are quite a few small things in the optimisation which make a big difference.