Open kamurani opened 5 months ago
Hi @kamuari,
I cobbled together the code below to add plddt to the bfactor column. This works for roughly 2/3 of my PDBs, but for the rest there is a mismatch between the number of residues and the plddt vector length, that I cannot explain. I've seen up to 10 residues difference and in all cases it's fewer residues in the pdb than the npy vector.
@phbradley would you have a hint as to how we can fix this? Thanks!
import os
from Bio.PDB import PDBParser, PDBIO
import numpy as np
def get_num_residues(structure, model_id=0):
num_res = 0
for model in structure:
if model.id == model_id:
for chain in model:
for residue in chain:
num_res += 1
return num_res
def plddt_to_pdb(pdb_in, plddt_npy, pdb_out):
assert not os.path.exists(pdb_out), pdb_out
parser = PDBParser()
structure = parser.get_structure("this_id", pdb_in)
num_residues = get_num_residues(structure)
plddt = np.load(plddt_npy)
assert num_residues == len(plddt), (
f"PDB has {num_residues} residues, but plddt has {len(plddt)}")
for model in structure:
res_idx = 0
for chain in model:
for residue in chain:
for atom in residue:
atom.set_bfactor(plddt[res_idx])
res_idx += 1
assert model.id == 0
io = PDBIO()
io.set_structure(structure)
io.save(pdb_out)
It would be convenient if the output structure files from a modelling run had the supplied pLDDT values (in
.npy
form) contained within the PDB files. This would be consistent with how AlphaFold PDB files are expected (e.g. when downloading from the AF database).Happy to work on this and submit a PR.