phbradley / TCRdock

python tools for TCR:peptide-MHC modeling and analysis
MIT License
66 stars 9 forks source link

`.pdb` files in model output should have `pLDDT` scores stored in `b_factor` column #18

Open kamurani opened 5 months ago

kamurani commented 5 months ago

It would be convenient if the output structure files from a modelling run had the supplied pLDDT values (in .npy form) contained within the PDB files. This would be consistent with how AlphaFold PDB files are expected (e.g. when downloading from the AF database).

Happy to work on this and submit a PR.

andreas-wilm commented 3 days ago

Hi @kamuari,

I cobbled together the code below to add plddt to the bfactor column. This works for roughly 2/3 of my PDBs, but for the rest there is a mismatch between the number of residues and the plddt vector length, that I cannot explain. I've seen up to 10 residues difference and in all cases it's fewer residues in the pdb than the npy vector.

@phbradley would you have a hint as to how we can fix this? Thanks!

import os

from Bio.PDB import PDBParser, PDBIO
import numpy as np

def get_num_residues(structure, model_id=0):
    num_res = 0
    for model in structure:
        if model.id == model_id:
            for chain in model:
                for residue in chain:
                    num_res += 1
    return num_res

def plddt_to_pdb(pdb_in, plddt_npy, pdb_out):
    assert not os.path.exists(pdb_out), pdb_out

    parser = PDBParser()
    structure = parser.get_structure("this_id", pdb_in)
    num_residues = get_num_residues(structure)

    plddt = np.load(plddt_npy)
    assert num_residues == len(plddt), (
            f"PDB has {num_residues} residues, but plddt has {len(plddt)}")

    for model in structure:
        res_idx = 0
        for chain in model:
            for residue in chain:
                for atom in residue:
                    atom.set_bfactor(plddt[res_idx])
                res_idx += 1
        assert model.id == 0

    io = PDBIO()
    io.set_structure(structure)
    io.save(pdb_out)