sokrypton / ColabFold

Making Protein folding accessible to all!
MIT License

Can ColabFold be used to predict loops while leaving the rest of the protein untouched? #595

Open hima111997 opened 3 months ago

hima111997 commented 3 months ago

I have a protein structure in which some loops are missing amino acids. Is there a way to use ColabFold to model these loops?

sokrypton commented 3 months ago

You can use our advanced notebook for this!

https://colab.research.google.com/github/sokrypton/ColabDesign/blob/gamma/af/examples/predict.ipynb

Set msa_method=single_sequence and template_mode=custom.

If the sequence of the template matches the query and no MSA is used, the output should copy the template in the regions that are defined in it.
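To see which regions are not defined in a template, one option is to look for gaps in its residue numbering. The following is a minimal sketch, not part of ColabFold or ColabDesign: `missing_ranges` is a hypothetical helper, and it assumes the file uses fixed-column PDB `ATOM` records whose numbering skips the unresolved residues.

```python
def missing_ranges(pdb_text: str, chain: str) -> list:
    """Return (first, last) residue-number gaps for one chain of a PDB file.

    Assumption: residue numbers are contiguous except where residues are
    missing, so a jump in numbering marks an unmodelled loop.
    """
    resnums = sorted({
        int(line[22:26])                     # columns 23-26: residue number
        for line in pdb_text.splitlines()
        if line.startswith("ATOM")
        and len(line) >= 26
        and line[21] == chain                # column 22: chain ID
    })
    # A gap exists wherever consecutive observed numbers differ by more than 1.
    return [(a + 1, b - 1) for a, b in zip(resnums, resnums[1:]) if b - a > 1]
```

Calling `missing_ranges(open("my_structure.pdb").read(), "A")` (the file name is a placeholder) would list the loops to be rebuilt in chain A. Missing residues at the termini are not detected by this approach.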

hima111997 commented 3 months ago

Thank you for your help. I checked the notebook. Just to be sure: my case involves a homotetramer with the same loop missing in each copy. Should I add the sequence four times, separated by :, in addition to the settings you sent?

sokrypton commented 3 months ago

In your case, just set copies=4 and propagate_to_copies=False, and list out all chains from the PDB. See the end of the notebook for instructions.
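The two ways of describing a homo-oligomer raised here (repeating the sequence with `:` separators, or giving one sequence plus a copy count) are meant to describe the same input. A minimal sketch of that relationship follows; `expand_copies` is a hypothetical helper written for illustration, not a ColabDesign function, and how the notebook handles the two forms internally is an assumption.

```python
def expand_copies(sequence: str, copies: int) -> str:
    """Repeat `sequence` `copies` times, joined by ':' chain separators."""
    if copies < 1:
        raise ValueError("copies must be >= 1")
    # Remove whitespace so sequences pasted with line breaks still work.
    seq = "".join(sequence.split()).upper()
    return ":".join([seq] * copies)
```

For a tetramer, `expand_copies(seq, 4)` yields four identical chains, which is what setting copies=4 on a single sequence is intended to express.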

hima111997 commented 3 months ago

I followed the instructions, but the notebook produced an error related to the PDB file.

---------------------------------------------------------------------------
FileNotFoundError                         Traceback (most recent call last)
<ipython-input-2-23535cc0c9af> in <cell line: 171>()
    174   for pdb,chain in zip(pdbs,chains):
    175     query_seq = "".join(u_sequences)
--> 176     batch = predict.get_template_feats(pdb, chain,
    177       query_seq=query_seq,
    178       query_a3m=template_msa,

2 frames
/content/colabdesign/af/contrib/predict.py in get_template_feats(pdbs, chains, query_seq, query_a3m, copies, propagate_to_copies, use_seq, use_dgram, get_pdb_fn, align_fn)
    113     if isinstance(chain,str): chain = chain.split(",")
    114     for c in chain:
--> 115       info = prep_pdb(pdb_filename, c, ignore_missing=True)
    116       N.append(n)
    117       X.append(info)

/content/colabdesign/af/prep.py in prep_pdb(pdb_filename, chain, offsets, lengths, ignore_missing, offset_index, auth_chains)
    429   # go through each defined chain
    430   for n,chain in enumerate(chains):
--> 431     pdb_str = pdb_to_string(pdb_filename, chains=chain, models=[1], auth_chains=auth_chains)
    432     protein_obj = protein.from_pdb_string(pdb_str) #, chain_id=chain)
    433     batch = {'aatype': protein_obj.aatype,

/content/colabdesign/shared/protein.py in pdb_to_string(pdb_file, chains, models, auth_chains)
    181       old_lines = pdb_file.split("\n")
    182     else:
--> 183       with open(pdb_file,"rb") as f:
    184         old_lines = [line.decode("utf-8","ignore").rstrip() for line in f]
    185     for line in old_lines:

FileNotFoundError: [Errno 2] No such file or directory: 'tmp/AF-tetramer-F1-model_v4.pdb'

I entered the name of the PDB file of the homotetramer (tetramer) in the pdb option and entered the four chains as A,B,C,D.

These are the options I used:

msa_method: single_sequence
pair_mode: unpaired_paired
filtering options (left the same as default)
template_mode: custom
pdb: tetramer
chain: A,B,C,D

jkosinski commented 3 months ago

I think I have seen a similar FileNotFoundError when the template file name contained uppercase letters. Can you change it to lowercase? You may also need to rename it to four characters, e.g. 1xxx.pdb, but I am not sure about that.

hima111997 commented 3 months ago

The error was solved by doing this:

!cp tetramer.pdb tmp/AF-tetramer-F1-model_v4.pdb

I copied my PDB file into the tmp folder under the file name the notebook was looking for.
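The same copy can be done from a Python cell with a check that the source file actually exists, which gives a clearer error message when it does not. This is a minimal sketch using only the standard library; `stage_template` is a hypothetical helper, not part of the notebook.

```python
import shutil
from pathlib import Path


def stage_template(src: str, dst: str) -> Path:
    """Copy a template PDB to the path another tool expects to find it at."""
    src_path, dst_path = Path(src), Path(dst)
    if not src_path.is_file():
        raise FileNotFoundError(f"template not found: {src_path}")
    # Create the destination folder (e.g. tmp/) if it does not exist yet.
    dst_path.parent.mkdir(parents=True, exist_ok=True)
    shutil.copyfile(src_path, dst_path)
    return dst_path
```

Here the call would be `stage_template("tetramer.pdb", "tmp/AF-tetramer-F1-model_v4.pdb")`, using the two paths from the command above.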

However, running the cell with copies=4 then crashed because it ran out of memory. I reduced copies to 2 and it ran without error, but the prediction changed the whole protein.

image

The green and cyan cartoons are the original protein, while the magenta one is the AlphaFold prediction. As can be seen, the position of the second monomer relative to the first differs from the original structure.

sokrypton commented 3 months ago

Can you share a screenshot of the template features (they should appear after the prep_inputs cells)?

1) If you enter "tetramer", the notebook will try to download a protein named "tetramer" from AlphaFold DB. Leave the pdb field blank (or provide the actual path to the PDB of interest). If it is blank, you will get a prompt to upload a file.
2) Make sure you set propagate_to_copies=False. Otherwise it will take the first chain and provide it as an independent template for all other copies, ignoring any interchain info.

hima111997 commented 3 months ago

These are the template features. The propagate_to_copies option was not selected: image

sokrypton commented 3 months ago

Looks like only features for chain A were loaded. Did you set: chain="A,B"?
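Before re-running, it can help to confirm which chain IDs the uploaded file actually contains, since a chain list such as "A,B" can only load chains that exist under those IDs. A minimal sketch, assuming fixed-column PDB `ATOM` records; `chains_in_pdb` is a hypothetical helper, not part of ColabDesign.

```python
def chains_in_pdb(pdb_text: str) -> dict:
    """Map each chain ID to its number of residues that have a CA atom."""
    counts = {}
    seen = set()
    for line in pdb_text.splitlines():
        if line.startswith("ENDMDL"):
            break                            # only inspect the first model
        if not line.startswith("ATOM") or len(line) < 27:
            continue
        if line[12:16].strip() != "CA":      # columns 13-16: atom name
            continue
        chain = line[21]                     # column 22: chain ID
        key = (chain, line[22:27])           # residue number + insertion code
        if key not in seen:
            seen.add(key)
            counts[chain] = counts.get(chain, 0) + 1
    return counts
```

If the returned dictionary has a single key, or keys other than the ones entered in the chain field, that would explain why features were loaded for only one chain.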

hima111997 commented 3 months ago

Yes. I ran the cells again, but they gave me the same plot:

image