pdbfixer is mislabeling built residues with negative res IDs

openmm / pdbfixer

PDBFixer fixes problems in PDB files


pdbfixer is mislabeling built residues with negative res IDs #175

Open nitroamos opened 5 years ago

nitroamos commented 5 years ago

In this line

        newResidue = chain.topology.addResidue(residueName, chain, "%d" % ((firstIndex+i)%10000))

PDBFixer is wrapping negative residue numbers around 10000, meaning that a residue whose number is supposed to be -4 is ending up as 9996.
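To make the wrap-around concrete, here is a minimal sketch of the same arithmetic outside PDBFixer. The helper name `wrapped_id` is hypothetical and exists only for illustration; the expression inside it mirrors the line quoted above. In Python, `%` returns a result with the sign of the divisor, so a negative residue number comes out as a large positive one.

```python
def wrapped_id(first_index, i):
    """Format a residue ID the way the quoted line does (hypothetical helper)."""
    return "%d" % ((first_index + i) % 10000)

print(wrapped_id(-4, 0))    # 9996, not -4
print(wrapped_id(9999, 1))  # 0, the overflow case the modulo was meant to handle
```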

One fix would look like this:

        newResId = firstIndex+i
        if len(str(newResId)) >= 5:
            newResId = (firstIndex+i)%10000
        newResidue = chain.topology.addResidue(residueName, chain, "%d" % (newResId))

which is closer to what happens in OpenMM

Or, even simpler, PDBFixer could skip the modulo entirely, since OpenMM already does it.
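As a rough, standalone sketch of the first option, the check can be pulled into a small function and exercised without a topology. The name `format_residue_id` is an assumption made for this example and is not part of PDBFixer or OpenMM. The idea is to wrap only when the number, sign included, is five or more characters long.

```python
def format_residue_id(res_id):
    """Return the residue ID as a string, wrapping only when it is too long.

    Hypothetical helper: not part of PDBFixer or OpenMM.
    """
    if len(str(res_id)) >= 5:
        # Too wide once the sign is counted, so fall back to wrapping.
        res_id = res_id % 10000
    return "%d" % res_id

print(format_residue_id(-4))     # -4
print(format_residue_id(9999))   # 9999
print(format_residue_id(10003))  # 3
print(format_residue_id(-1000))  # 9000
```

Note that the length test counts the minus sign, so `-1000` still gets wrapped while `-999` is kept as-is.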

peastman commented 5 years ago

Where did you find a PDB file with negative residue numbers? Residue numbers are supposed to be the index within the SEQRES section, which by definition can never be negative.

nitroamos commented 5 years ago

In my test case, it's coming from a REMARK 465 section, which is integrated with PDBFixer as outlined here. For example, here's a random one that Google found for me (take a look here):

REMARK 465                                                                      
REMARK 465 MISSING RESIDUES                                                     
REMARK 465 THE FOLLOWING RESIDUES WERE NOT LOCATED IN THE                       
REMARK 465 EXPERIMENT. (RES=RESIDUE NAME; C=CHAIN IDENTIFIER;                   
REMARK 465 SSSEQ=SEQUENCE NUMBER; I=INSERTION CODE.)                            
REMARK 465     RES C SSSEQI                                                     
REMARK 465     MET A   -19                                                      
REMARK 465     GLY A   -18                                                      
REMARK 465     SER A   -17                                                      
REMARK 465     SER A   -16
...

I think the scientific origin of this is when people want to number their residues based on a pre-existing sequence alignment.
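For anyone who wants to check whether a file has this kind of numbering, here is a minimal sketch that pulls the residue entries out of REMARK 465 lines like the ones above. It is not how PDBFixer reads them. The function name `parse_remark_465` is made up for this example, and the whitespace split assumes the simple `RES C SSSEQ` layout shown here, with no model column and no insertion codes.

```python
def parse_remark_465(lines):
    """Collect (residue name, chain ID, sequence number) from REMARK 465 lines.

    Hypothetical helper; assumes whitespace-separated RES, C, SSSEQ fields.
    """
    missing = []
    for line in lines:
        fields = line.split()
        if len(fields) != 5 or fields[:2] != ["REMARK", "465"]:
            continue  # blank remark lines, prose lines, other records
        name, chain, seq = fields[2:]
        try:
            missing.append((name, chain, int(seq)))
        except ValueError:
            continue  # the "RES C SSSEQI" header row, or an insertion code
    return missing

sample = [
    "REMARK 465 MISSING RESIDUES",
    "REMARK 465     RES C SSSEQI",
    "REMARK 465     MET A   -19",
    "REMARK 465     GLY A   -18",
]
negative = [entry for entry in parse_remark_465(sample) if entry[2] < 0]
print(negative)  # [('MET', 'A', -19), ('GLY', 'A', -18)]
```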

peastman commented 5 years ago

Ok, that makes sense. Your solution looks fine. Note that when PDBFixer calls PDBFile.writeFile(), it specifies keepIds=True. That's why it needs to do the modulo itself instead of relying on PDBFile to do it.

nitroamos commented 5 years ago

oops, didn't mean to close it. 😄

peastman commented 5 years ago

Good point. So PDBFixer really doesn't need to do this.