How to predict protein structure using DiffDock-PP? #13

juanxin commented 5 days ago

The inference results of the DiffDock-PP model only give the positions of the C-alpha atoms in the protein backbone. Could you please explain in detail how to obtain the positions of the other atoms, as in the file 1AVX-ligand-full.pdb?

yaledeus commented 5 days ago

Thanks for your question! Since our task setting is rigid protein-protein docking, the transformation found for the C-alpha atoms can be applied directly to all other atoms. Specifically, denote the unbound positions of the ligand by $x_{0c}$ (C-alpha atoms) and $x_{0}$ (all atoms), and those of the receptor by $x_{1c}$ (C-alpha atoms) and $x_{1}$ (all atoms). If DiffDock-PP predicts the $\mathrm{SE}(3)$ transformation $(R, t)$ from $x_{0c}$ and $x_{1c}$, then the docked positions of all ligand atoms are $R x_{0} + t$.
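A minimal NumPy sketch of the rule $R x_{0} + t$, assuming coordinates are stored as $(N, 3)$ arrays with one atom per row. The helper `apply_rigid_transform` and all numbers below are hypothetical, chosen only for illustration. Because atoms are rows, the column-vector formula $R x_{0} + t$ becomes `x0 @ R.T + t`. The final check shows that transforming all atoms and then selecting the C-alpha rows gives the same result as transforming the C-alpha atoms alone, which is why a transform fitted on C-alpha atoms carries over to the full structure.

```python
import numpy as np


def apply_rigid_transform(x: np.ndarray, R: np.ndarray, t: np.ndarray) -> np.ndarray:
    """Apply x -> R x + t to every row of an (N, 3) coordinate array.

    Rows are points, so the column-vector form R x + t is written x @ R.T + t.
    """
    return x @ R.T + t


# Hypothetical example: a 90-degree rotation about z followed by a shift.
theta = np.pi / 2
R = np.array([[np.cos(theta), -np.sin(theta), 0.0],
              [np.sin(theta),  np.cos(theta), 0.0],
              [0.0,            0.0,           1.0]])
t = np.array([1.0, 0.0, -2.0])

# Hypothetical unbound ligand: three atoms, the middle one playing the C-alpha.
x0 = np.array([[0.0, 0.0, 0.0],   # e.g. N
               [1.0, 0.0, 0.0],   # e.g. CA
               [1.0, 1.0, 0.0]])  # e.g. C
ca_index = np.array([1])          # row of the C-alpha atom inside x0

x0_docked = apply_rigid_transform(x0, R, t)             # all atoms
x0c_docked = apply_rigid_transform(x0[ca_index], R, t)  # C-alpha atoms only

# Same result either way: the rigid transform commutes with selecting rows.
assert np.allclose(x0_docked[ca_index], x0c_docked)
```

Note that the script in the next reply stores its fitted rotation already transposed for row vectors, which is why it computes `unbound_pos @ R + t` without `.T`.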

juanxin commented 5 days ago

Thank you for your quick reply! Could you provide a Python script (or any other file) that generates the positions of all atoms from the given C-alpha positions?

yaledeus commented 5 days ago

You can try the following Python script. Its inputs are the unbound positions of all atoms with shape $(N, 3)$, the bound (docked) positions of the C-alpha atoms with shape $(N_c, 3)$, and the indices of the C-alpha atoms with shape $(N_c,)$.


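```python
import numpy as np


def kabsch_numpy(P: np.ndarray, Q: np.ndarray):
    """Find the rotation R and translation t that best superimpose P onto Q.

    P and Q are (M, 3) arrays of corresponding points stored as rows.
    Points are row vectors, so the fitted transform is P @ R + t
    (R is the transpose of the R in the column form R x + t).
    Returns (aligned P, R, t).
    """
    P = P.astype(np.float64)
    Q = Q.astype(np.float64)

    # Centroids of both point sets
    PC = np.mean(P, axis=0)
    QC = np.mean(Q, axis=0)

    # Center both point sets at the origin
    UP = P - PC
    UQ = Q - QC

    # 3x3 cross-covariance matrix and its SVD: C = V @ diag(S) @ W
    C = UP.T @ UQ
    V, S, W = np.linalg.svd(C)

    # If det(V) * det(W) < 0 the optimum is a reflection; flip the last
    # singular vector so that R is a proper rotation (det R = +1)
    d = (np.linalg.det(V) * np.linalg.det(W)) < 0.0
    if d:
        V[:, -1] = -V[:, -1]

    R: np.ndarray = V @ W

    # Translation that moves the rotated P centroid onto the Q centroid
    t = QC - PC @ R  # (3,)

    return (UP @ R + QC).astype(np.float32), R.astype(np.float32), t.astype(np.float32)


def ca_to_all_atom_transformation(unbound_pos, bound_pos_ca, ca_index):
    """
    :param unbound_pos: unbound positions of all atoms, (N, 3)
    :param bound_pos_ca: bound (docked) positions of CA atoms, (Nc, 3),
        in the same order as unbound_pos[ca_index]
    :param ca_index: indices of the CA atoms within unbound_pos, (Nc,)
    :return: bound positions of all atoms, (N, 3)
    """
    # Fit the rigid transform on the CA atoms only ...
    unbound_pos_ca = unbound_pos[ca_index]
    _, R, t = kabsch_numpy(unbound_pos_ca, bound_pos_ca)
    # ... then apply the same transform to every atom (rigid docking)
    bound_pos = unbound_pos @ R + t
    return bound_pos
```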
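To build the inputs of `ca_to_all_atom_transformation` from a PDB file, and to write the result back in the same layout as 1AVX-ligand-full.pdb, something like the following sketch could work. It assumes a standard fixed-column PDB file (atom name in columns 13-16, x/y/z in columns 31-54) without alternate locations, and that the C-alpha coordinates predicted by DiffDock-PP list residues in the same order as the `ATOM` records. The helper names `read_pdb_atoms` and `write_pdb_coords` are hypothetical and not part of DiffDock-PP.

```python
import numpy as np


def read_pdb_atoms(pdb_text: str):
    """Collect ATOM/HETATM lines, their (N, 3) coordinates and the CA indices (Nc,).

    Only ATOM records named CA count as C-alpha, so a calcium HETATM
    (also named CA) is transformed but not used for fitting.
    """
    atom_lines, coords, ca_index = [], [], []
    for line in pdb_text.splitlines():
        if not line.startswith(("ATOM", "HETATM")):
            continue
        if line.startswith("ATOM") and line[12:16].strip() == "CA":
            ca_index.append(len(atom_lines))
        atom_lines.append(line)
        # Fixed PDB columns 31-38, 39-46, 47-54 hold x, y, z
        coords.append([float(line[30:38]), float(line[38:46]), float(line[46:54])])
    return (atom_lines,
            np.asarray(coords, dtype=np.float64),
            np.asarray(ca_index, dtype=np.int64))


def write_pdb_coords(atom_lines, coords) -> str:
    """Return PDB text in which columns 31-54 of each line hold the new coordinates."""
    out = [f"{line[:30]}{x:8.3f}{y:8.3f}{z:8.3f}{line[54:]}"
           for line, (x, y, z) in zip(atom_lines, coords)]
    return "\n".join(out) + "\nEND\n"
```

For example, with `lines, xyz, ca = read_pdb_atoms(pdb_text)` on the unbound ligand, `write_pdb_coords(lines, ca_to_all_atom_transformation(xyz, predicted_ca, ca))` gives the docked all-atom ligand, where `predicted_ca` is a placeholder for the $(N_c, 3)$ C-alpha coordinates produced by DiffDock-PP. Keeping the original lines and rewriting only the coordinate columns preserves atom names, residue numbering and chain IDs.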

yaledeus / ElliDock
