martinpacesa / BindCraft

User friendly and accurate binder design pipeline
MIT License

How long does this usually take on an A100? #25

Closed utimcraig closed 2 weeks ago

utimcraig commented 2 weeks ago

How long should I expect this to run when looking for binders to a well-folded, 25 kDa soluble domain enzyme? Is there some point where it stops automatically?

martinpacesa commented 2 weeks ago

It stops when either:

  1. the specified number of final designs has passed the filters, or
  2. too many designs are being rejected, i.e. the acceptance rate falls below the threshold specified in the settings (5%).
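
As a rough illustration of the two stopping conditions above, here is a minimal sketch. It is not BindCraft's actual code: the function name, its arguments, and the `start_monitoring` warm-up are hypothetical and chosen only for illustration. Only the 5% threshold comes from this thread.

```python
def should_stop(n_accepted, n_trajectories, target_designs,
                min_acceptance_rate=0.05, start_monitoring=100):
    """Return a reason string if the run should stop, otherwise None.

    All names are hypothetical. `start_monitoring` is an assumed warm-up
    so that the rate check does not fire on the first few trajectories.
    """
    # Condition 1: enough designs have passed the filters.
    if n_accepted >= target_designs:
        return "target reached"

    # Condition 2: acceptance rate is below the configured threshold.
    if n_trajectories >= start_monitoring and n_trajectories > 0:
        if n_accepted / n_trajectories < min_acceptance_rate:
            return "acceptance rate too low"

    return None
```

For example, `should_stop(2, 200, 10)` reports a low acceptance rate (2/200 = 1%), while `should_stop(20, 200, 100)` lets the run continue (10%).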

The run time depends on your target size: the bigger the target, the longer it will run, so it is difficult to give a number. However, you can monitor the passing designs in final_designs_stats.csv, and the passing designs themselves are located in the Accepted folder.
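
If you want to check progress from a script, something like the following could work. It is only a sketch and assumes that final_designs_stats.csv has a single header row followed by one row per passing design. That layout is an assumption, not something confirmed in this thread, and the function name is made up.

```python
import csv
from pathlib import Path


def count_passing_designs(csv_path):
    """Count data rows in the stats CSV (hypothetical helper).

    Assumes one header row and one row per passing design.
    Returns 0 if the file does not exist yet.
    """
    path = Path(csv_path)
    if not path.exists():
        return 0
    with path.open(newline="") as handle:
        # Skip blank lines, which csv.reader yields as empty lists.
        rows = [row for row in csv.reader(handle) if row]
    # Subtract the header row; never report a negative count.
    return max(len(rows) - 1, 0)
```

Calling `count_passing_designs("final_designs_stats.csv")` from the output directory every few minutes gives a simple progress readout. Adjust the path to wherever your run writes its results.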

LennartNickel commented 2 weeks ago

One quick note on that: every target and every cluster/workstation is different, which is why it's hard to predict how long it will take to get enough passing designs. For some targets you will get a good number of passing designs within a few hours (for us, e.g. PDL1 and EGFR), but this is rare. Usually, sampling overnight on a few GPUs (on our systems) is sufficient to get enough designs to make a selection for experimental validation. However, there are cases where you have to sample a lot more to get even a few designs passing the filters, and it's not guaranteed that you will find hits for every protein at all!

nfloquet commented 2 weeks ago

On local machines (local GPUs), does it make sense to limit the target structure to the binder's expected pocket in order to avoid memory problems?

LennartNickel commented 2 weeks ago

Generally yes, but be careful not to open the hydrophobic core too much as the binders will then be preferentially placed there. Unfortunately, for some proteins (especially some transporters, ion channels, GPCRs, etc.) that do not have a well-defined/separated extracellular domain or where the binding site is well integrated into the protein, it can be difficult to find a good starting point. In this case, it is worth doing some shorter test runs with different settings (especially different cropping, different hotspots) to find a working setup. It is not necessary to wait for accepted designs during these test runs. Simply look at the sampled trajectories and check whether the desired binding mode is achieved.
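
For quick cropping experiments, a simple distance-based crop around the hotspot residues can be sketched as below. This is not part of BindCraft. The function names and the default radius are hypothetical, only standard fixed-column PDB `ATOM` records are handled, and insertion codes are ignored.

```python
import math


def parse_atoms(pdb_text):
    """Parse ATOM records into (chain, residue number, xyz, raw line)."""
    atoms = []
    for line in pdb_text.splitlines():
        if line.startswith("ATOM"):
            chain = line[21]
            resnum = int(line[22:26])
            xyz = (float(line[30:38]), float(line[38:46]), float(line[46:54]))
            atoms.append((chain, resnum, xyz, line))
    return atoms


def crop_around_hotspots(pdb_text, chain, hotspots, radius=15.0):
    """Keep whole residues with any atom within `radius` (in angstroms)
    of any atom of the hotspot residues on `chain`.

    The default radius is an arbitrary illustrative value.
    """
    atoms = parse_atoms(pdb_text)
    hot_coords = [xyz for c, r, xyz, _ in atoms if c == chain and r in hotspots]
    keep = set()
    for c, r, xyz, _ in atoms:
        if any(math.dist(xyz, h) <= radius for h in hot_coords):
            keep.add((c, r))
    # Emit every atom line belonging to a kept residue, in original order.
    return "\n".join(line for c, r, _, line in atoms if (c, r) in keep)
```

A purely geometric crop like this can leave chain breaks and expose parts of the hydrophobic core, which is exactly the risk mentioned above. Inspect the cropped structure and compare a few radii in short test runs before committing to one.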

utimcraig commented 2 weeks ago

As a data point: I ran a test in the A100 Colab environment with a small soluble protein (25 kDa) and found 2 "successful" binders after about 8 hours.