sokrypton / ColabDesign

Making Protein Design accessible to all via Google Colab!

pLDDT increases and then decreases during the soft iterations #85

Open victorconan opened 1 year ago

victorconan commented 1 year ago

Hi, thanks for open sourcing this work! I am exploring binder design (using peptide_binder_design.ipynb) against PDB ID 7BW1 with binder lengths between 6 and 8. What I have observed is that when I set the number of soft iterations to 300, the pLDDT increases to above 0.8 after roughly 100 iterations and then decreases to below 0.4 by the end of the iterations. With such a low pLDDT, the subsequent hard iterations also end with a pLDDT around 0.4. I have tried to understand why this happens and how to improve it, including different numbers of soft and hard iterations and different numbers of tries, but nothing seems to help. Do you have any suggestions? Thanks!
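For context, here is a minimal sketch of the kind of setup described above, written against the ColabDesign Python API rather than the notebook. The PDB filename, target chain, and iteration counts are illustrative assumptions, not the notebook's exact settings.

from colabdesign import mk_afdesign_model, clear_mem

clear_mem()
# "binder" protocol: hallucinate a short peptide binder against a fixed target
af_model = mk_afdesign_model(protocol="binder")
# target chain "A" and binder_len=8 are assumptions for illustration
af_model.prep_inputs(pdb_filename="7bw1.pdb", chain="A", binder_len=8)
# three-stage design: soft (logits), temperature annealing, then hard (one-hot)
af_model.design_3stage(soft_iters=300, temp_iters=100, hard_iters=10)
print(af_model.get_seqs())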

sokrypton commented 1 year ago

The most important metric is i_ptm (as this quantifies the confidence of the interface).

As for pLDDT, I wonder if it is low because the final peptide is not forming a full helix, but something that only resembles a helix. Which may still be a valid solution!


I'm getting a pLDDT of 0.6 if I increase the number of recycles to 1 (maybe we need to go higher!)
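In case it is useful to others, a minimal sketch of how the number of recycles might be raised; the num_recycles and recycle_mode keywords, and the set_opt call, are my reading of the ColabDesign API and should be treated as assumptions.

from colabdesign import mk_afdesign_model, clear_mem

clear_mem()
# more recycles at design time; recycle_mode="sample" is assumed to allow
# changing the number of recycles later via set_opt
af_model = mk_afdesign_model(protocol="binder", num_recycles=1, recycle_mode="sample")
af_model.prep_inputs(pdb_filename="7bw1.pdb", chain="A", binder_len=8)
# try an even higher value for the final evaluation
af_model.set_opt(num_recycles=3)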

victorconan commented 1 year ago

Got it! Yes, I am getting peptides that resemble a helix. Let me try a higher number of recycles! Thanks!

victorconan commented 1 year ago

I'm following up on my trials. I noticed that for some target proteins, the i_ptm increases and then decreases during the soft iterations. For example, 5HHX_A with a binder length of 8. I tried increasing the number of recycles to 9, but the problem still exists. It also seems that pLDDT and i_ptm follow the same pattern: they increase first and then decrease, ending up at low values. I wonder if there are any other parameters to tune for this kind of problem? Thanks!!!

amin-sagar commented 1 year ago

@victorconan @sokrypton Have you found a solution to this? I am also experiencing the same issue. Is there supposed to be a maximum number of soft iterations? My i_ptm, pLDDT, and i_con progress like this during the soft iterations.

[plots: i_ptm, pLDDT, and i_con vs. soft iterations]

i_ptm and pLDDT reach quite a high value and then they both drop while i_con has the opposite trend.

I would be really grateful for any suggestions. Best, Amin.

sokrypton commented 1 year ago

Once the design protocol transitions into "hard", things sometimes become unstable, either because there does not exist a single-sequence solution... or because the optimization is too sensitive (multiple mutations and everything falls apart).

Can you try the design_pssm_semigreedy() protocol? It essentially does the first step of the 3-stage protocol, but then switches to sampling mutations (starting from the soft sequence of the previous step) and accepting those that improve the loss in a "semi-greedy" way. Semi-greedy because x random mutations are tried and the one that decreases the loss the most is accepted.
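For anyone reading along, a minimal sketch of what calling that protocol might look like; the keyword names soft_iters/hard_iters and the aux["log"] keys are assumptions based on my reading of the API.

from colabdesign import mk_afdesign_model, clear_mem

clear_mem()
af_model = mk_afdesign_model(protocol="binder")
af_model.prep_inputs(pdb_filename="7bw1.pdb", chain="A", binder_len=8)
# soft (PSSM) optimization followed by semi-greedy mutation sampling
af_model.design_pssm_semigreedy(soft_iters=300, hard_iters=32)
# i_ptm is the key interface-confidence metric to track
print(af_model.get_seqs())
print(af_model.aux["log"].get("i_ptm"), af_model.aux["log"].get("plddt"))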

amin-sagar commented 1 year ago

@sokrypton Sorry, I should have mentioned: I am already using the design_pssm_semigreedy() protocol. The script is from #107:

import numpy as np
from colabdesign.af.alphafold.common import residue_constants

# bias over the 20 standard amino acids for each binder position
bias = np.zeros((af_model._binder_len, 20))
# force positions 1, 8, and 15 to be proline
fixpos = [1, 8, 15]
bias[fixpos, residue_constants.restype_order["P"]] = 1e8

af_model.restart()
af_model.set_seq(bias=bias)
af_model.design_pssm_semigreedy()

I have only plotted the values I get in stage 1, i.e. the soft iterations. So the drop doesn't happen while transitioning to the "hard" phase but during the soft stage itself.

Amin.

victorconan commented 1 year ago

Is there any way we could tell whether a single-sequence solution exists? For some targets (for instance, 7BW1), we know that peptides of certain lengths won't fit into the pocket, but that doesn't mean that for a given length a peptide binder exists. By tweaking the number of recycles, I could get some binders with pLDDT > 0.8 and i_ptm > 0.9. If, for a given length and after trying different numbers of recycles, we couldn't get satisfactory binders, can we say there probably does not exist a binder of that length?

amin-sagar commented 1 year ago

@sokrypton @victorconan This seems like a nice explanation in some cases. In the particular case that I am working on, there are multiple peptides of the same length known to bind the target. I was actually trying to see if I can arrive at similar sequences using this method. So, I know multiple solutions exist but are probably too hard to find.