sokrypton / ColabDesign

Making Protein Design accessible to all via Google Colab!

Fix sequence at certain positions in Binder protocol #107

Open · aminsagar opened 1 year ago

aminsagar commented 1 year ago

Hello. Thanks for this amazing work. I am trying to redesign a peptide binder while keeping the sequence fixed at some positions. For example, I would like to keep the prolines at positions 2, 9, and 16, as present in the input peptide. I tried the following.

pep_model = mk_afdesign_model(protocol="binder")
pep_model.prep_inputs(pdb_filename="./data/Complex.pdb", chain="A", binder_chain="B",
                      hotspot="37,38,67,68,69", fix_pos="2,9,16", fix_seq=True)

However, the designed peptides don't retain prolines at these positions. Am I doing something wrong here? I would be really grateful for any suggestions. Thanks. Amin.

shanilpanara commented 1 year ago

I believe fix_pos is currently only supported in the "fixbb" and "partial" protocols (as per the README.md).
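For reference, a minimal sketch of how fix_pos is used in one of the supported protocols (the PDB path here is a placeholder):

from colabdesign import mk_afdesign_model

# "fixbb" accepts fix_pos directly; positions are given as a comma-separated string
model = mk_afdesign_model(protocol="fixbb")
model.prep_inputs(pdb_filename="design.pdb",  # placeholder path
                  chain="A", fix_pos="2,9,16")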

amin-sagar commented 1 year ago

I see. I can try to implement it if @sokrypton or the other developers can give me some pointers. Maybe I need to change something here: https://github.com/sokrypton/ColabDesign/blob/13f3e72a4a25c76942a6ff6526cb7ee0b1cd702c/colabdesign/af/design.py#L373

to disallow mutations at some positions. I would be really grateful for any suggestions.

sokrypton commented 1 year ago

The easiest way would be to modify the input bias.

af_model._inputs["bias"] is a (length, 20) matrix; by default it is all zeros. If you set large positive values at some positions and amino acids, those positions will be fixed to those amino acids.

Tell me if it doesn't work. I can take a closer look.
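(For orientation: the 20 columns follow AlphaFold's residue ordering, so the column index for a given amino acid can be looked up via residue_constants, e.g.:)

from colabdesign.af.alphafold.common import residue_constants

# AlphaFold's residue order is ARNDCQEGHILKMFPSTWYV,
# so proline maps to column 14 of the (length, 20) bias matrix
print(residue_constants.restype_order["P"])  # -> 14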


amin-sagar commented 1 year ago

Thanks @sokrypton. I tried the following script, but the generated peptides don't have prolines at the specified positions.

import numpy as np
from tqdm import tqdm

from colabdesign import mk_afdesign_model, clear_mem
from colabdesign.af.alphafold.common import residue_constants

af_model = mk_afdesign_model(protocol="binder", data_dir="/home/amin/softwares/Protein-Design")
af_model.prep_inputs(pdb_filename="../data/protein-pep.pdb", chain="A", binder_chain="B")
fixpos = [1, 8, 15]  # 0-indexed rows for positions 2, 9, 16

print(af_model._inputs["bias"])

for i in tqdm(range(5)):
    print(i)
    af_model.restart()
    # bias the fixed positions strongly toward proline
    af_model._inputs["bias"][fixpos, residue_constants.restype_order["P"]] = 10000000
    print(af_model._inputs["bias"])
    af_model.design_pssm_semigreedy(120, 32)
    af_model.save_pdb("Design_bind17_fix_pos_seq2_sm_" + str(i) + ".pdb")

The bias matrix looks like this (zero rows elided), which seems to be correct: rows 1, 8, and 15 (positions 2, 9, and 16) carry the large value in column 14, the proline column.

[[       0.        0.        0.  ...        0.        0.        0.]
 [       0.  ...        0. 10000000.        0.  ...        0.]   <- row 1 (position 2)
 ...
 [       0.  ...        0. 10000000.        0.  ...        0.]   <- row 8 (position 9)
 ...
 [       0.  ...        0. 10000000.        0.  ...        0.]   <- row 15 (position 16)
 [       0.        0.        0.  ...        0.        0.        0.]]

Could you please see what I am doing wrong? Thanks, Amin.

sokrypton commented 1 year ago

Thanks for the report; the issue has been fixed in v1.1.1. If you want to patch your existing copy, see: https://github.com/sokrypton/ColabDesign/commit/aa3ced63542d61b1b5ef2b58371b90e22f11cdde

amin-sagar commented 1 year ago

Thanks @sokrypton. I updated to v1.1.1, but this doesn't seem to completely solve the issue: the generated peptides still don't retain the amino acids at the biased positions. I printed out mut_seq from design_semigreedy and I see that each mutation cycle changes those amino acids, so maybe the mutate function is not considering the bias. As a test, if I mutate the fixed residues back after passing them through the mutate function, it works:

for t in range(int(num_tries)):
    mut_seq = self._mutate(seq, plddt, logits=(seq_logits + self._inputs["bias"]))
    # test: overwrite the fixed positions again after each mutation step
    for fixaa in [1, 8, 15]:
        mut_seq[0, fixaa] = 4
    print(mut_seq)

I am trying to figure out what's happening but maybe it's instantly clear to you. Thanks again.

sokrypton commented 1 year ago

Should be fixed now! I tracked the bug down to the predict() function; it turns out that when I was making a copy of the inputs dictionary before/after prediction, the copy wasn't actually being made.
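(A generic illustration of this bug class, not the actual ColabDesign code: a plain dict(...) copy still shares the underlying numpy arrays, so in-place edits leak through.)

import numpy as np

inputs = {"bias": np.zeros((17, 20))}
backup = dict(inputs)          # shallow copy: the array object is shared
inputs["bias"][1, 14] = 1e7    # in-place edit...
print(backup["bias"][1, 14])   # ...also shows up in the "copy": 10000000.0

# copying the array values avoids the aliasing
backup = {k: np.copy(v) for k, v in inputs.items()}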

Please try again!

Suggested pipeline:

import numpy as np
from colabdesign.af.alphafold.common import residue_constants

bias = np.zeros((af_model._binder_len, 20))
# example: force first position to be proline
bias[0, residue_constants.restype_order["P"]] = 1e8

af_model.restart()
af_model.set_seq(bias=bias)
af_model.design_pssm_semigreedy()
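To confirm the bias took effect, the designed sequence can be checked afterwards (assuming get_seqs() as used in the ColabDesign notebooks):

# check that the biased position came out as proline
seq = af_model.get_seqs()[0]
print(seq[0])  # expected: "P"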
amin-sagar commented 1 year ago

Thanks @sokrypton. It works perfectly now; the residues are retained at the defined positions. I think I am experiencing the issue described in #85, so I will post the results on that issue. Thanks again. Amin.