westlake-repl / SaProt

Saprot: Protein Language Model with Structural Alphabet (AA+3Di)
MIT License

Evaluate multiple mutation effect #46

Closed: dc2211 closed this issue 4 months ago

dc2211 commented 4 months ago

Hello,

I'm trying to see whether it is possible to get the effect of multiple mutations on a single protein structure. I'm defining seq = "MdEvVpQpLrVyQdYaKv" and mut_info = "E1R:Q4D", but I'm getting the following error:

Traceback (most recent call last):
  File "/home/SaProt/model/saprot/saprot_foldseek_mutation_model.py", line 292, in predict_mut
    tokens[pos - 1] = "#" + tokens[pos - 1][-1]
IndexError: list index out of range

Any support is greatly appreciated. Thanks.
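
The failing line in the traceback masks the mutated residue by indexing a token list with a 1-based position. The following is a minimal sketch of that step in plain Python, not SaProt's actual code: `split_sa_tokens` and `mask_position` are hypothetical helpers, and the sketch assumes a structure-aware sequence is split into two-character tokens (amino acid letter plus 3Di letter). It shows that the same `IndexError` appears whenever the position exceeds the number of tokens.

```python
def split_sa_tokens(sa_seq: str) -> list[str]:
    """Split a structure-aware sequence into 2-character tokens (assumed layout)."""
    if len(sa_seq) % 2 != 0:
        raise ValueError("expected an even-length AA+3Di sequence")
    return [sa_seq[i:i + 2] for i in range(0, len(sa_seq), 2)]


def mask_position(tokens: list[str], pos: int) -> list[str]:
    """Mask the amino acid at 1-based position `pos`, keeping its 3Di letter."""
    masked = list(tokens)
    # Same expression as the line shown in the traceback.
    masked[pos - 1] = "#" + masked[pos - 1][-1]
    return masked


tokens = split_sa_tokens("MdEvVpQpLrVyQdYaKv")  # 9 residues
print(mask_position(tokens, 2))

try:
    mask_position(tokens, 10)  # position beyond the 9 tokens
except IndexError as err:
    print("IndexError:", err)
```

If the token list is shorter than expected, for example because the input string was not tokenized into AA+3Di pairs, even a small position can fall out of range in the same way.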

LTEnjoy commented 4 months ago

Hello,

Could you provide more information about your script, such as how you loaded SaProt? A complete script would let us reproduce the error.

dc2211 commented 4 months ago

Sure. This is the one I am running.

from utils.esm_loader import load_esm_saprot
from utils.foldseek_util import get_struc_seq
from model.saprot.saprot_foldseek_mutation_model import SaprotFoldseekMutationModel

pdb_path = "example/test.pdb"

parsed_seqs = get_struc_seq("bin/foldseek", pdb_path, ["A"], plddt_mask=False)["A"]
seq, foldseek_seq, combined_seq = parsed_seqs

config = {
    "foldseek_path": None,
    "config_path": 'westlake-repl/SaProt_650M_AF2',
    "load_pretrained": True,
}

model = SaprotFoldseekMutationModel(**config)
tokenizer = model.tokenizer
device = "cuda"
model.eval()
model.to(device)

# Predict the effect of multiple mutations
mut_info = "E2P:V3D"
mut_value = model.predict_mut(seq, mut_info)
print(mut_value)

dc2211 commented 4 months ago

Let's instead define seq = "MdEvVpQpLrVyQdYaKv".

dc2211 commented 4 months ago

OK, I'm not sure what I did wrong, but it is working now. Sorry for any inconvenience!
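
Since the cause was not identified in this thread, a small pre-check can help narrow down similar errors before calling `predict_mut`. The sketch below is plain Python and independent of SaProt: `check_mut_info` is a hypothetical helper, and it assumes that the sequence alternates an amino acid letter with a lowercase 3Di letter (or `#`), that mutations are colon-separated, and that positions are 1-based, as the `pos - 1` in the traceback suggests. One candidate cause it would catch, not confirmed here, is passing an amino-acid-only sequence instead of a structure-aware one.

```python
import re

# <wild-type letter><1-based position><mutant letter>, e.g. "E2P"
MUT_PATTERN = re.compile(r"^([A-Z])(\d+)([A-Z])$")


def check_mut_info(sa_seq: str, mut_info: str) -> list[str]:
    """Return a list of problems; an empty list means the inputs look consistent."""
    # Assumed layout: odd string positions hold 3Di letters (lowercase) or '#'.
    structure_letters = sa_seq[1::2]
    if len(sa_seq) % 2 != 0 or not all(
        c.islower() or c == "#" for c in structure_letters
    ):
        return ["sequence does not look like an AA+3Di structure-aware sequence"]

    residues = sa_seq[0::2]
    problems = []
    for mut in mut_info.split(":"):
        match = MUT_PATTERN.match(mut)
        if match is None:
            problems.append(f"{mut}: not in the form <wild type><position><mutant>")
            continue
        wild, pos = match.group(1), int(match.group(2))
        if not 1 <= pos <= len(residues):
            problems.append(f"{mut}: position {pos} is outside 1..{len(residues)}")
        elif residues[pos - 1] != wild:
            problems.append(f"{mut}: position {pos} is {residues[pos - 1]}, not {wild}")
    return problems


print(check_mut_info("MdEvVpQpLrVyQdYaKv", "E2P:V3D"))
print(check_mut_info("MdEvVpQpLrVyQdYaKv", "E1R:Q4D"))
```

With the values from this thread, `E2P:V3D` passes, while `E1R:Q4D` is flagged because position 1 of that sequence is `M`, not `E`. Whether SaProt itself enforces the wild-type letter is not stated in the thread, so treat that check as an optional sanity test.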

LTEnjoy commented 4 months ago

No problem. Feel free to reach out if you have any questions!

BTW, we recommend using our SaprotHub for zero-shot mutational effect prediction. It doesn't require a deep understanding of SaProt's implementation, and you can make predictions simply by clicking the run button. See https://colab.research.google.com/github/westlake-repl/SaprotHub/blob/main/colab/SaprotHub.ipynb.