openmm / pdbfixer

PDBFixer fixes problems in PDB files

PDBFixer adding repeated residues at the start. #291

Closed Ian-CP closed 5 months ago

Ian-CP commented 5 months ago

Hello,

I have a large PDB file with quite a few missing residues at both ends.

I have manually created a SEQRES record for the whole protein, which starts with:

    SEQRES 1 N 2839 MET ALA ALA HIS ARG PRO VAL GLU TRP VAL GLN ALA VAL
    SEQRES 2 N 2839 VAL SER ARG PHE ASP GLU GLN LEU PRO ILE LYS THR GLY

The PDB file itself, however, starts with the 4th residue, HIS.

The problem is, whenever I run PDBFixer, it strangely adds two sets of MET ALA ALA, making the final sequence MET ALA ALA MET ALA ALA HIS ARG PRO VAL...

The file with the SEQRES tag: NF1_HUMAN_with_tag.txt

The output file from PDBFixer: NF1_HUMAN_complete.txt

The code used to run PDBFixer:

    from openmm.app import PDBFile
    from pdbfixer import PDBFixer

    def fix_using_PDBFixer(pdb_file):
        fixer = PDBFixer(pdb_file)

        fixer.findMissingResidues()
        fixer.removeHeterogens(True)
        fixer.findNonstandardResidues()
        fixer.replaceNonstandardResidues()
        fixer.findMissingAtoms()
        fixer.addMissingAtoms()
        print("Adding missing atoms.")

        fixer.addMissingAtoms()
        PDBFile.writeFile(fixer.topology, fixer.positions, open("NF1_HUMAN_complete2.txt", 'w'))

    fix_using_PDBFixer("NF1_HUMAN_with_tag.txt")

Any suggestion on how to handle this issue would be helpful.

Thanks.

Ian-CP commented 5 months ago

I realised I am just stupid, and added fixer.addMissingAtoms() twice, resulting in duplicated entries... Oh well...
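
For anyone landing here with the same symptom, a minimal sketch of the script with the duplicate call removed might look like the following. It only reflects the fix described above (calling addMissingAtoms() once). The helper name fix_structure and the out_file parameter are made up for illustration and are not part of PDBFixer.

```python
def fix_structure(fixer):
    """Run each repair step exactly once, in order, on a PDBFixer-like object.

    fix_structure is a hypothetical helper name, not a PDBFixer API.
    """
    fixer.findMissingResidues()
    fixer.removeHeterogens(True)      # True keeps water molecules
    fixer.findNonstandardResidues()
    fixer.replaceNonstandardResidues()
    fixer.findMissingAtoms()
    # Called once only: per the follow-up comment, calling it a second time
    # is what produced the repeated MET ALA ALA at the start.
    fixer.addMissingAtoms()
    return fixer


def fix_using_PDBFixer(pdb_file, out_file):
    """Load pdb_file, repair it, and write the result to out_file.

    out_file is an illustrative parameter; the original script used a
    hard-coded output name.
    """
    # Imported here so fix_structure can be exercised without OpenMM installed.
    from openmm.app import PDBFile
    from pdbfixer import PDBFixer

    fixer = fix_structure(PDBFixer(filename=pdb_file))
    with open(out_file, "w") as handle:
        PDBFile.writeFile(fixer.topology, fixer.positions, handle)
```

It would be called as fix_using_PDBFixer("NF1_HUMAN_with_tag.txt", "NF1_HUMAN_complete.txt"). Keeping the step sequence in its own function makes it easier to spot a repeated call, and the with block closes the output file once writing is done.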