openmm / pdbfixer

PDBFixer fixes problems in PDB files

PDBFixer API cannot fix mmCIF file #275

Closed locitran closed 1 year ago

locitran commented 1 year ago

Hi all, I am trying to use PDBFixer to fix an mmCIF file, but it raises an error when reading the file.

from pdbfixer import PDBFixer
from openmm.app import PDBFile

def pdbfixer(in_path, out_path):
    with open(in_path) as in_f:
        fixer = PDBFixer(pdbfile=in_f)
        fixer.findMissingResidues()
        chains = list(fixer.topology.chains())
        keys = list(fixer.missingResidues.keys())  # copy the keys so entries can be deleted while looping
        for key in keys:
            chain = chains[key[0]]
            if key[1] == 0 or key[1] == len(list(chain.residues())):
                del fixer.missingResidues[key]
        fixer.findNonstandardResidues()
        fixer.replaceNonstandardResidues()
        fixer.removeHeterogens(keepWater=False)
        fixer.findMissingAtoms()
        fixer.addMissingAtoms()
        with open(out_path, 'w') as out_f:
            PDBFile.writeFile(fixer.topology, fixer.positions, out_f, keepIds=True)

in_file = './4p42-assembly1.cif'
out_file = 'fix4p42.pdb'
pdbfixer(in_file, out_file)

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Cell In[3], line 28
     26 in_file = './4p42-assembly1.cif'
     27 out_file = 'fix4p42.pdb'
---> 28 pdbfixer(in_file, out_file)

Cell In[3], line 9, in pdbfixer(in_path, out_path)
      7 def pdbfixer(in_path, out_path):
      8     with open(in_path) as in_f:
----> 9         fixer = PDBFixer(pdbfile=in_f)
     10         fixer.findMissingResidues()
     11         chains = list(fixer.topology.chains())

File /mnt/Tsunami_HHD/newloci/anaconda3/lib/python3.10/site-packages/pdbfixer/pdbfixer.py:251, in PDBFixer.__init__(self, filename, pdbfile, pdbxfile, url, pdbid)
    248     file.close()
    249 elif pdbfile:
    250     # A file-like object has been specified.
--> 251     self._initializeFromPDB(pdbfile)
    252 elif pdbxfile:
    253     # A file-like object has been specified.
    254     self._initializeFromPDBx(pdbxfile)

File /mnt/Tsunami_HHD/newloci/anaconda3/lib/python3.10/site-packages/pdbfixer/pdbfixer.py:284, in PDBFixer._initializeFromPDB(self, file)
    281 def _initializeFromPDB(self, file):
...
    743     self.residue_name_with_spaces += possible_fourth_character
    744 self.residue_name = self.residue_name_with_spaces.strip()

ValueError: Misaligned residue name: ATOM   1    N N   . ASP A 1 3   ? -52.691  -92.622  29.836  1.00 58.49  ? ?

peastman commented 1 year ago

fixer = PDBFixer(pdbfile=in_f)

That needs to be pdbxfile=in_f. You're telling it to parse the PDBx/mmCIF file as a PDB file.
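
For readers who handle both formats, here is a minimal sketch of one way to choose the constructor keyword from the file extension. The helper names `fixer_keyword` and `load_fixer` are hypothetical and not part of PDBFixer; the only PDBFixer API assumed is the `pdbfile` and `pdbxfile` keywords that appear in the traceback above.

```python
from pathlib import Path


def fixer_keyword(path):
    """Return the PDBFixer keyword that matches the file extension.

    Hypothetical helper: treats .cif and .mmcif as PDBx/mmCIF and
    everything else as legacy PDB.
    """
    suffix = Path(path).suffix.lower()
    return "pdbxfile" if suffix in {".cif", ".mmcif"} else "pdbfile"


def load_fixer(path):
    """Open the file and pass it to PDBFixer under the matching keyword."""
    # Imported here so fixer_keyword can be used without pdbfixer installed.
    from pdbfixer import PDBFixer

    with open(path) as handle:
        return PDBFixer(**{fixer_keyword(path): handle})
```

With this sketch, `load_fixer('./4p42-assembly1.cif')` would pass the handle as `pdbxfile`, which is the change suggested above.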

locitran commented 1 year ago

Thank you, Peter. It's working now :-)

May I ask about another problem, concerning how PDBFixer models the N- and C-termini? [image]

As you can see, there is a very long tail at the N/C-terminus. I see that your code runs a short energy minimization, which should be fine for residues that addMissingResidues inserts inside the structure. However, the result for terminal residues or long continuous stretches of missing residues may not be reasonable. What do you think?

Best regards,

peastman commented 1 year ago

It's common for proteins to have flexible tails. Because they don't have a fixed rigid conformation, they can't be resolved with crystallography and they're missing from crystal structures. PDBFixer is adding them stretched outward just because it's convenient, but don't take that literally. The whole point is that they're flexible and don't have a fixed conformation. As soon as you start simulating they'll begin moving around.

Sometimes people omit the tails from their simulations. You'll need to rely on your own biological knowledge to determine whether the tails are functionally important for your protein, or if they can be safely omitted.
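
If the tails are to be omitted, one option is to drop the terminal entries from `missingResidues` before adding residues, as the loop in the original post does. The sketch below restates that filter as a standalone function; `drop_terminal_gaps` is a hypothetical name, and it assumes the dictionary is keyed by `(chain index, residue index)` as in the code above.

```python
def drop_terminal_gaps(missing_residues, chain_lengths):
    """Return a copy of missing_residues without N- and C-terminal gaps.

    missing_residues: dict keyed by (chain_index, residue_index), as in
        the loop from the original post.
    chain_lengths: number of resolved residues in each chain.
    """
    kept = {}
    for (chain_index, residue_index), names in missing_residues.items():
        at_start = residue_index == 0
        at_end = residue_index == chain_lengths[chain_index]
        # Keep only gaps that lie inside the chain.
        if not (at_start or at_end):
            kept[(chain_index, residue_index)] = names
    return kept
```

After `fixer.findMissingResidues()`, the chain lengths could be computed as `[len(list(c.residues())) for c in fixer.topology.chains()]` and the result assigned back to `fixer.missingResidues`. Building a new dictionary avoids deleting entries while iterating.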

locitran commented 1 year ago

Thank you, Peter. I get your idea.

peastman commented 1 year ago

Ok, great. I'm closing this issue, since the question has been answered.