sokrypton / ColabFold

Making Protein folding accessible to all!
MIT License

Exporting to ONNX format? #512

Open elephantpanda opened 1 year ago

elephantpanda commented 1 year ago

Hi, I'm interested in exporting some of these models to ONNX format.

OmegaFold in particular, but getting any of them working would be good.

If anyone has tips, or has managed to export any of these to ONNX format, please let me know. Thanks.

milot-mirdita commented 1 year ago

I'd also be very happy if we could run the various *Fold models with ONNX. Please let us know if you decide to invest time in this.

elephantpanda commented 1 year ago

Hi @milot-mirdita. What I'm going to do first is get AlphaFold v1 running in ONNX and C#. I'm going to be reworking this implementation: https://github.com/Urinx/alphafold_pytorch (This one is not ideal, as it involves using a lookup database.)

Next I would like to try to get OmegaFold working. I can get it to run in Python offline, but I'm having some trouble exporting the ONNX files, because it uses custom "frames" in torch and because I believe there is some recursion which ONNX can't handle. So it will take a bit of work to export the model, even to PyTorch checkpoint(s).
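One common way around the recursion problem, sketched here in plain Python rather than torch: exporters such as `torch.onnx.export` that work by tracing record the operations actually executed, so a loop with a fixed iteration count is flattened into a static graph, while recursion whose depth is decided at run time is not. The sketch assumes the recursion depth is known ahead of time, which may or may not hold for OmegaFold. `refine`, `run_recursive` and `run_unrolled` are hypothetical stand-ins, not functions from any of the models discussed here.

```python
from typing import List

Vector = List[float]


def refine(x: Vector) -> Vector:
    """Hypothetical stand-in for one pass through a model block."""
    return [0.5 * v + 1.0 for v in x]


def run_recursive(x: Vector, depth: int) -> Vector:
    """Recursive form: how deep the calls go is decided at run time."""
    if depth == 0:
        return x
    return run_recursive(refine(x), depth - 1)


def run_unrolled(x: Vector, num_cycles: int = 3) -> Vector:
    """Unrolled form: a fixed-length loop that a tracer can flatten."""
    for _ in range(num_cycles):
        x = refine(x)
    return x


# Both forms compute the same result when the depth is fixed in advance.
print(run_unrolled([0.0, 2.0]) == run_recursive([0.0, 2.0], 3))
```

The trade-off is that the number of cycles is baked into the exported graph, so changing it means exporting again.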

One reason I want to do this is to run it in the Unity game engine, which has the advantage of built-in 3D graphics capabilities. Do you have any thoughts on whether this sounds like a good or bad idea?

(disclaimer: I am contracting at Unity at the moment)

milot-mirdita commented 1 year ago

AlphaFold v1 will not be useful, as it doesn't share much with AlphaFold2 and doesn't perform well compared to the state of the art. You should go directly for AlphaFold2.

The current leading PyTorch implementation of AF2 is OpenFold (https://github.com/aqlaboratory/openfold).

There are still a lot of neural network layers that the AlphaFold2 team has invented and implemented that all need to be (re)implemented. I think this would be a huge undertaking.

A completely different approach that we would favor and be more interested in would be to use GGML (of llama.cpp and whisper.cpp fame) to reimplement AF2, so that it can be run without relying on Python. With this approach, it would still be easy to compile the GGML-based AF2 into a single binary/library/dll/so that can be used anywhere (e.g. Unity).

elephantpanda commented 1 year ago

Hi, you're right, AlphaFold v1 will not be very accurate. At the moment we are just trying to implement a "proof of concept". The machine learning system in Unity currently supports only the ONNX format. Once we have a proof of concept working, I think it will be easier for people to implement the other, more accurate models.

The idea is that if it was working in Unity it would work on essentially any device: PC, Mac, mobile, console etc.

P.S. There is an open beta if anyone wants to find out more about the inference engine in Unity and see whether it may be useful for the ColabFold project.

milot-mirdita commented 1 year ago

It's not about accuracy. AlphaFold1 is a completely different architecture that is also not an end-to-end deep learning model and requires many more moving parts. AF2 works reasonably well if you give it only a single sequence as input (without even an MSA).

I don't think doing anything with AFv1 makes sense.
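To make "single sequence input" concrete, here is a minimal sketch that treats the query as an alignment of depth one. It is an illustration only, not the AF2 or OpenFold feature pipeline: the `RESIDUES` ordering, the `UNKNOWN` index, and the `encode` and `single_sequence_msa` helpers are all assumptions made for this example.

```python
from typing import List

# Assumption: a 20-letter amino acid alphabet; real pipelines define
# their own ordering and extra tokens (gap, mask, unknown).
RESIDUES = "ARNDCQEGHILKMFPSTWYV"
UNKNOWN = len(RESIDUES)  # index used for any letter outside the alphabet


def encode(sequence: str) -> List[int]:
    """Map each residue letter to an integer index."""
    return [
        RESIDUES.index(aa) if aa in RESIDUES else UNKNOWN
        for aa in sequence.upper()
    ]


def single_sequence_msa(sequence: str) -> List[List[int]]:
    """Build a depth-1 'alignment' that contains only the query itself."""
    return [encode(sequence)]


msa = single_sequence_msa("MKV")
print(len(msa), len(msa[0]))  # alignment depth, sequence length
```

In this form there is no database search at all: the only row in the alignment is the query sequence.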

elephantpanda commented 1 year ago

I see. I'll definitely take another look at AF2. It's a real plus if it doesn't need an MSA, and it would certainly be more valuable to the community. I believe the version I was looking at was this torch implementation: https://github.com/lucidrains/alphafold2 (I also looked at the OpenFold version.)

The question comes down to whether we've got enough time.

What I will say is that if anyone wants to have a go at running any of the models in Unity, the discussion section is very good and they will give you a lot of help with the inference API etc.

milot-mirdita commented 1 year ago

As I said, if you want a feature-complete implementation for torch, you should look at OpenFold.