sokrypton / ColabDesign

Making Protein Design accessible to all via Google Colab!

how to interpret results of 'use_templates=True' versus 'False' in partial_hallucination_rewire? #119

Open genya opened 1 year ago

genya commented 1 year ago

I'm trying to use the partial_hallucination_rewire notebook to replace part of a protein with a linker-like sequence that otherwise preserves the overall predicted structure (code below). To preserve the exact sequence outside of the hallucination region, I think I need fix_seq=True in model.prep_inputs(), but I'm unsure whether use_templates in mk_afdesign_model() should be True or False.

The goal with these partly hallucinated proteins is to disrupt a function specific to the hallucinated region while preserving functions localized to other parts, as well as overall stability/solubility. To assess which partly hallucinated variants are likely to have these properties, I'm looking at how well the structure outside the hallucinated region is preserved and at the pLDDT score both outside and within the hallucinated region.
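As a rough sketch of the per-region pLDDT comparison described above, the following averages a per-residue value inside and outside a residue range. It assumes that the B-factor column of the saved PDB holds pLDDT (common for AlphaFold-style output, but worth checking for your files) and that the range is given in the numbering of the output PDB. The names `ca_bfactors`, `mean_plddt_by_region` and `toy_atom_line` are hypothetical, and the toy records stand in for a real file.

```python
def ca_bfactors(pdb_text):
    """Return {residue_number: B-factor} for CA atoms, read from fixed PDB columns."""
    values = {}
    for line in pdb_text.splitlines():
        if line.startswith("ATOM") and line[12:16].strip() == "CA":
            values[int(line[22:26])] = float(line[60:66])
    return values


def _mean(xs):
    return sum(xs) / len(xs) if xs else float("nan")


def mean_plddt_by_region(pdb_text, region_start, region_end):
    """Mean B-factor (assumed to be pLDDT) inside and outside an inclusive residue range."""
    per_res = ca_bfactors(pdb_text)
    inside = [v for r, v in per_res.items() if region_start <= r <= region_end]
    outside = [v for r, v in per_res.items() if not region_start <= r <= region_end]
    return _mean(inside), _mean(outside)


def toy_atom_line(serial, resnum, bfactor):
    """Build a column-correct CA ATOM record for the toy example below."""
    return "ATOM  {:5d}  CA  ALA A{:4d}    {:8.3f}{:8.3f}{:8.3f}{:6.2f}{:6.2f}".format(
        serial, resnum, 0.0, 0.0, 0.0, 1.0, bfactor
    )


# Toy structure: residues 1-6, with residues 3-4 playing the hallucinated loop.
toy_pdb = "\n".join(
    toy_atom_line(i, i, b) for i, b in enumerate([90, 80, 50, 40, 70, 60], start=1)
)
inside_mean, outside_mean = mean_plddt_by_region(toy_pdb, 3, 4)
print(inside_mean, outside_mean)
```

For a real design, `pdb_text` would be the contents of one of the saved output files, and the range would be the loop's position in the new numbering rather than the input numbering.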

These metrics look better when use_templates=True, but does this actually mean that the predicted structure is more reliable, or is use_templates=True forcing the output to conform to the input by construction?

Possibly relevant: the protein is so large (~1000 aa) that my input PDB is only part of a larger AlphaFold model, containing just the relevant domain (otherwise I get memory errors). I'm unsure whether AlphaFold would predict this domain to fold on its own as it does in the larger structure (would it help to model this?).
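One way to check this, sketched here under assumptions, would be to predict the domain on its own and compare it with the same residues excised from the full-length model. The function below computes a distance-matrix RMSD (dRMSD) between two equal-length lists of CA coordinates, which needs no superposition step. The name `drmsd` and the toy coordinates are hypothetical; extracting matched CA coordinates from the two PDB files is left out.

```python
import math
from itertools import combinations


def drmsd(coords_a, coords_b):
    """Root-mean-square difference between all pairwise distances of two point sets."""
    if len(coords_a) != len(coords_b):
        raise ValueError("coordinate lists must have the same length")
    if len(coords_a) < 2:
        raise ValueError("need at least two points")
    squared_sum = 0.0
    n_pairs = 0
    for i, j in combinations(range(len(coords_a)), 2):
        dist_a = math.dist(coords_a[i], coords_a[j])
        dist_b = math.dist(coords_b[i], coords_b[j])
        squared_sum += (dist_a - dist_b) ** 2
        n_pairs += 1
    return math.sqrt(squared_sum / n_pairs)


# Toy example: three collinear points, with the last one displaced by 1 unit.
reference = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
displaced = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (3.0, 0.0, 0.0)]
print(drmsd(reference, displaced))
```

dRMSD is unchanged by rotation and translation, but also by mirror images, so it is a coarse check rather than a replacement for a superposed RMSD.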

Here's the code:


from colabdesign import mk_afdesign_model

use_templates_setting = False
fix_seq_setting = True

model = mk_afdesign_model(protocol="partial",
                          use_templates=use_templates_setting)  # set True to constrain positions using template input

# define positions we want to constrain (input PDB numbering)
input_len = 142            # residues 771-912 of the input PDB
swap_region = [842, 875]   # region of the structure to be replaced by a loop

for loop_len in [4, 5, 6, 7, 8, 9, 10]:  # not sure how long the inserted loop should be

  new_len = input_len - (swap_region[1]-swap_region[0]+1) + loop_len
  old_pos = "771-" + str(swap_region[0]-1) + "," + str(swap_region[1]+1) + "-912"

  outputfile = '_'.join(["loop" + str(loop_len),
                        'template' + str(int(use_templates_setting)),
                        'seq' + str(int(fix_seq_setting))]) + '.pdb'

  print(new_len, old_pos)
  print(outputfile)

  model.prep_inputs("myprotein.pdb", chain="A",
                    pos=old_pos,              # define positions to constrain
                    length=new_len,           # define if the desired length is different from input PDB
                    fix_seq=fix_seq_setting)  # set True to constrain the sequence

  model.rewire(loops=[loop_len])

  print(model.opt["pos"])

  model.restart()

  #balance weights [dgram_cce = restraint weight], [con = hallucination weight]
  model.set_weights(dgram_cce=1, con=0) #no idea what these numbers should be
  model.design_3stage(200,100,10) #no idea what these numbers should be

  model.save_pdb(outputfile)
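
The length and position bookkeeping in the loop above can be checked in isolation before running any design. The following is a minimal sketch that mirrors that arithmetic; `rewire_plan` is a hypothetical helper, not part of ColabDesign, and the residue numbers are the ones used in the snippet. It does not address how `length` and `rewire(loops=...)` interact inside the model.

```python
def rewire_plan(first_res, last_res, swap_start, swap_end, loop_len):
    """Return (new_len, pos) for replacing swap_start..swap_end with a loop.

    All residue numbers are inclusive and use the input PDB numbering.
    """
    if not first_res < swap_start <= swap_end < last_res:
        raise ValueError("swap region must lie strictly inside the input range")
    input_len = last_res - first_res + 1
    removed = swap_end - swap_start + 1
    new_len = input_len - removed + loop_len
    pos = f"{first_res}-{swap_start - 1},{swap_end + 1}-{last_res}"
    return new_len, pos


for loop_len in range(4, 11):
    new_len, pos = rewire_plan(771, 912, 842, 875, loop_len)
    print(loop_len, new_len, pos)
```

Deriving `input_len` from the first and last residue numbers, instead of hard-coding 142, keeps the length and the position string consistent if the domain boundaries change.
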

sokrypton commented 1 year ago

When only a single sequence is provided, AlphaFold is limited in what it can predict. If you have reason to believe that the region you are constraining will fold regardless of whether AlphaFold predicts that it will, you can constrain it by adding a template.

Examples: target proteins (when designing protein-protein interactions, PPI) or domains (in cases where there is an extension at the termini or in the loops of a large protein domain).