wells-wood-research / timed-design

Protein Sequence Design with Deep Learning and Tooling like Monte Carlo Sampling and Analysis

Fix residues #82

Closed sunal1996 closed 4 months ago

sunal1996 commented 4 months ago

At the moment, TIMED predicts all the residues for a given protein. However, it appears that there is a demand for being able to fix some residues on a given protein, such as here:

https://github.com/wells-wood-research/timed-design/issues/78

This PR enables the user to:

a) Fix residues on specified chains, across multiple structures, to their WT amino acids. When this is done, everything except the specified residues on the specified chains will be PREDICTED BY TIMED.

b) Predict residues on specified chains. When this is used, everything except the specified residues on specified chains will be KEPT THE SAME AS WT.
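The two modes above can be illustrated with a minimal sketch. It is not the code from this PR: the function name `merge_sequences`, the mode strings `"fix"` and `"predict"`, and the use of plain strings for the WT and predicted sequences are all assumptions made for illustration.

```python
import typing as t


def merge_sequences(
    wt_sequence: str,
    predicted_sequence: str,
    residue_ids: t.Sequence[str],
    selected_ids: t.Collection[str],
    mode: str,
) -> str:
    """Combines WT and predicted residues for one chain (hypothetical helper).

    mode="fix":     selected residues keep the WT amino acid, the rest are predicted.
    mode="predict": selected residues are predicted, the rest keep the WT amino acid.
    """
    if mode not in ("fix", "predict"):
        raise ValueError(f"Unknown mode: {mode!r}")
    if not (len(wt_sequence) == len(predicted_sequence) == len(residue_ids)):
        raise ValueError("Sequences and residue IDs must have the same length")

    selected = set(selected_ids)
    merged = []
    for wt_aa, predicted_aa, residue_id in zip(
        wt_sequence, predicted_sequence, residue_ids
    ):
        is_selected = residue_id in selected
        # In "fix" mode the selection is kept as WT; in "predict" mode it is the opposite.
        keep_wt = is_selected if mode == "fix" else not is_selected
        merged.append(wt_aa if keep_wt else predicted_aa)
    return "".join(merged)


# Toy data: a four-residue chain where every prediction is tryptophan (W).
fixed = merge_sequences("ACDE", "WWWW", ["1", "2", "3", "4"], {"2", "3"}, "fix")
predicted = merge_sequences("ACDE", "WWWW", ["1", "2", "3", "4"], {"2", "3"}, "predict")
```

With this toy input, `fixed` keeps the WT residues at IDs 2 and 3, while `predicted` replaces only those two positions.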

There is still room for improvement, though. For example, we could customize the fixing of residues further by feeding in a CSV file, with the first column being the PDB name and the second column being the residue numbers. As it is, the feature assumes that we always want to fix the same residues in all proteins in a given folder, which is rarely the case. Still, the path to achieving this seems clear to me, and it can be implemented easily.
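As a rough sketch of the CSV idea, the snippet below reads such a file into a per-structure mapping. The exact file layout is not specified in this PR, so the format here is an assumption: no header row, and the residue numbers in the second column separated by spaces. The function name `read_residue_table` and the PDB names in the example are placeholders.

```python
import csv
import io
import typing as t


def read_residue_table(handle: t.TextIO) -> t.Dict[str, t.List[str]]:
    """Maps each PDB name to its residue IDs (hypothetical helper).

    Assumed layout: column 1 is the PDB name, column 2 holds
    space-separated residue numbers, and there is no header row.
    """
    table: t.Dict[str, t.List[str]] = {}
    for row in csv.reader(handle):
        if not row or not row[0].strip():
            continue  # Skip blank lines
        pdb_name = row[0].strip()
        residue_ids = row[1].split() if len(row) > 1 else []
        # Residue IDs are kept as strings so they can be compared with string IDs.
        table.setdefault(pdb_name, []).extend(residue_ids)
    return table


# In-memory stand-in for a real file; the PDB names are placeholders.
example_file = io.StringIO("1abc,10 11 12\n2xyz,5 42\n")
residues_by_structure = read_residue_table(example_file)
```

Each structure could then be processed with its own residue list instead of one list shared across the whole folder.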

universvm commented 4 months ago

As mentioned in private chat, we should use:

import typing as t

import ampal

def get_residue_ids(protein_structure: ampal.Polypeptide) -> t.List[str]:
    """Returns a list of residue IDs from an AMPAL structure."""
    return [residue.id for residue in protein_structure.get_monomers()]

and then used as:

residue_ids = get_residue_ids(selected_structure)
# Check where the start and end positions are in the residue ids
start_position = residue_ids.index(start_position)

NB: The function takes a single polypeptide, so it should be run separately on each chain.
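A minimal sketch of the per-chain lookup follows. To stay self-contained it uses plain lists of residue IDs in place of AMPAL objects, standing in for what `get_residue_ids` would return for each chain. The helper name `find_index_range`, the chain labels, and the selections are hypothetical.

```python
import typing as t


def find_index_range(
    residue_ids: t.Sequence[str], start_id: str, end_id: str
) -> t.Tuple[int, int]:
    """Returns the list indices of start_id and end_id within one chain."""
    ids = list(residue_ids)
    try:
        start_index = ids.index(start_id)
        end_index = ids.index(end_id)
    except ValueError as err:
        raise ValueError(f"Residue ID not found in chain: {err}") from err
    if start_index > end_index:
        raise ValueError("start_id must not come after end_id")
    return start_index, end_index


# Stand-in data: chain label -> residue IDs for that chain.
# Chain A starts at residue 5, so IDs and list indices differ.
chains = {"A": ["5", "6", "7", "8"], "B": ["1", "2", "3"]}
selections = {"A": ("6", "8"), "B": ("1", "2")}

# Run the lookup once per chain, since each chain has its own numbering.
ranges = {
    label: find_index_range(chains[label], start_id, end_id)
    for label, (start_id, end_id) in selections.items()
}
```

Looking up indices by ID rather than using the residue number directly matters whenever a chain does not start at residue 1 or has gaps in its numbering.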