nrbennet / dl_binder_design

MIT License

Is "AF2 initial guess" supposed to have access to the standard AF2 databases? #57

Closed: rszabla closed this issue 9 months ago

rszabla commented 9 months ago

Hello,

My installation seems to be running as expected, and it produced some beautifully folded peptide binders with low pae_interaction scores for my project. My question is: how is the "AF2 initial guess" pipeline able to generate a predicted structure without access to the databases that are normally required to run AF2? Namely:

    bfd/                                   # ~ 1.8 TB
    mgnify/                                # ~ 64 GB
    params/                                # ~ 3.5 GB
    pdb70/                                 # ~ 56 GB
    pdb_mmcif/                             # ~ 206 GB
    uniclust30/                            # ~ 87 GB
    uniref90/                              # ~ 59 GB

Did I miss an important part of the installation? Is the predict.py script somehow running the full AF2 installation on my machine? Or am I just ignorant of how your implementation of AF2 works?
I just want to make sure that my dl_binder_design installation is configured properly before I put too much trust in the pae_interaction scores and spend money on peptide orders.

Thank you,

Robert Szabla

johnny-rodriguez commented 9 months ago

I have the same question.

nrbennet commented 9 months ago

The AF2 initial guess protocol only requires the AF2 weights, which you have already downloaded as part of the installation described in the repo. The other databases included with AF2 are used for MSA and template generation, both of which are critical for predicting native proteins. Since we are predicting idealized de novo proteins, we don't use MSAs or templates.
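To make "no MSAs or templates" concrete, here is a minimal, hypothetical sketch of what the sequence-side inputs look like when no database search is run. It is not the AF2 or dl_binder_design API: the function name `single_sequence_inputs` and the dictionary keys are illustrative assumptions. The idea it shows is that the "MSA" collapses to a single row (the query itself), the deletion matrix is all zeros, and the template list is empty, so only the model weights need to be on disk.

```python
from typing import Dict, List


def single_sequence_inputs(binder_seq: str, target_seq: str) -> Dict[str, object]:
    """Hypothetical illustration of AF2-style inputs with no MSA and no templates.

    The key names are illustrative only; they are not AF2's real feature names.
    """
    # Binder and target are simply concatenated here for illustration; a real
    # multi-chain run also needs chain-break handling, which is omitted.
    query = binder_seq + target_seq
    # Without a database search (bfd, mgnify, uniref90, ...), the "MSA"
    # is just the query sequence itself: an alignment of depth 1.
    msa: List[str] = [query]
    # Nothing was aligned, so there are no deletions to record.
    deletion_matrix: List[List[int]] = [[0] * len(query)]
    return {
        "query": query,
        "binder_length": len(binder_seq),
        "msa": msa,
        "deletion_matrix": deletion_matrix,
        "templates": [],  # no template search against pdb70 / pdb_mmcif
    }


inputs = single_sequence_inputs("ACDEFG", "KLMNPQRST")
print(len(inputs["msa"]), inputs["binder_length"], len(inputs["templates"]))
# prints: 1 6 0
```

Contrast this with a standard AF2 run, where the MSA can hold thousands of rows gathered from the large databases listed in the question.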

So to answer your question: if you have designs with low pae_interaction then they are predicted to work by AF2 initial guess!
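For readers wondering what pae_interaction measures, a common formulation (assumed here, not confirmed in this thread) is the mean predicted aligned error (PAE) over the inter-chain blocks of the PAE matrix, i.e. binder residues scored relative to the target and vice versa. The sketch below uses plain Python lists; the function name `pae_interaction` and the binder-first residue ordering are assumptions for illustration. Lower values mean AF2 is more confident about the relative placement of binder and target.

```python
from typing import List


def pae_interaction(pae: List[List[float]], binder_len: int) -> float:
    """Mean inter-chain PAE, assuming the binder occupies the first binder_len rows/cols."""
    n = len(pae)
    if not 0 < binder_len < n:
        raise ValueError("binder_len must split the matrix into two non-empty chains")

    def block_mean(rows: range, cols: range) -> float:
        values = [pae[i][j] for i in rows for j in cols]
        return sum(values) / len(values)

    binder, target = range(binder_len), range(binder_len, n)
    # PAE is not symmetric, so both off-diagonal blocks are averaged.
    return (block_mean(binder, target) + block_mean(target, binder)) / 2


# Toy 3x3 PAE matrix (in angstroms): residue 0 is the binder, 1-2 the target.
pae = [[0.0, 2.0, 4.0],
       [6.0, 0.0, 1.0],
       [8.0, 3.0, 0.0]]
print(pae_interaction(pae, 1))
# prints: 5.0  (binder->target mean 3.0, target->binder mean 7.0)
```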

rszabla commented 9 months ago

Thank you, that is very helpful!