openmm / pdbfixer

PDBFixer fixes problems in PDB files

`fixer.findMissingResidues()` fails on 3ODU #255

JSLJ23 closed this issue 1 year ago

JSLJ23 commented 1 year ago

Hi developers of PDBFixer,

I was hoping to get some help with PDBFixer's `findMissingResidues()` functionality. 3ODU has missing residues at both the N- and C-terminal regions of the PDB structure, but for some reason PDBFixer does not detect them and the `fixer.missingResidues` dict is empty.

```python
from pdbfixer import PDBFixer

pdb_id = "3odu"
pdb_path = f"./pdbs/{pdb_id}.pdb"
fixer = PDBFixer(filename=pdb_path)

# Print the number of residues in each chain
for chain in fixer.topology.chains():
    chain_res = [residue.name for residue in chain.residues()]
    print(len(chain_res))
```

```
466
456
5
8
73
69
```

```python
fixer.findMissingResidues()
fixer.missingResidues
```

```
{}
```

```python
seq = fixer.sequences.copy()
print(len(seq))
```

```
2
```

```python
print(len(seq[0].residues))
print(len(seq[1].residues))
```

```
502
502
```

The first two chains, with 466 and 456 residues, should correspond to the two SEQRES sequences of 502 residues each. Although the lengths don't match, PDBFixer doesn't find the missing residues.
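The length comparison above can also be made directly on the PDB file. The sketch below is a minimal, standard-library-only illustration and not part of PDBFixer; `seqres_lengths`, `observed_lengths` and `length_deficit` are hypothetical names. It assumes standard fixed-column PDB records, counts the residue names listed in SEQRES per chain, and counts distinct residues in ATOM records only (HETATM records such as waters and ligands are ignored by assumption).

```python
from collections import defaultdict


def seqres_lengths(pdb_lines):
    """Count residue names listed in SEQRES records, per chain."""
    counts = defaultdict(int)
    for line in pdb_lines:
        if line.startswith("SEQRES"):
            chain_id = line[11]  # column 12: chain identifier
            counts[chain_id] += len(line[19:70].split())  # columns 20-70: residue names
    return dict(counts)


def observed_lengths(pdb_lines):
    """Count distinct residues (resSeq + insertion code) in ATOM records, per chain."""
    seen = defaultdict(set)
    for line in pdb_lines:
        if line.startswith("ATOM  "):
            chain_id = line[21]  # column 22: chain identifier
            # columns 23-26: resSeq, column 27: insertion code
            seen[chain_id].add((line[22:26], line[26:27]))
    return {chain_id: len(keys) for chain_id, keys in seen.items()}


def length_deficit(pdb_lines):
    """SEQRES length minus observed residue count, for every chain that has SEQRES records."""
    observed = observed_lengths(pdb_lines)
    return {c: n - observed.get(c, 0) for c, n in seqres_lengths(pdb_lines).items()}
```

A nonzero deficit only says that residues are absent; it does not say where they belong, which is the part `findMissingResidues()` has to work out from the residue numbering.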

I would really appreciate any help, because I'm at a loss as to how to sort this out.

peastman commented 1 year ago

There's something very strange in that file. The residue numbers abruptly jump from 229 to 900.

```
ATOM   1653  CB  SER A 229       0.030 -10.221  29.843  1.00 53.91           C  
ANISOU 1653  CB  SER A 229     6936   7452   6093     58   -855   -128       C  
ATOM   1654  OG  SER A 229      -0.628  -9.302  30.696  1.00 64.08           O  
ANISOU 1654  OG  SER A 229     8162   8736   7451     59   -856    -70       O  
ATOM   1655  N   GLY A 900       1.252 -10.057  27.133  1.00 45.01           N  
ANISOU 1655  N   GLY A 900     5535   5672   5897   -368    299   -198       N  
ATOM   1656  CA  GLY A 900       2.208 -10.415  26.106  1.00 44.83           C  
ANISOU 1656  CA  GLY A 900     5512   5648   5874   -367    298   -197       C  
```

PDBFixer interprets that as meaning there are hundreds of missing residues in the middle of the chain. Since there are no corresponding residues in the SEQRES section, it isn't able to match them up and figure out what the sequence ought to be. Any idea why the numbering is like that?
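The failure mode described above can be reproduced with a small model of gap-aware sequence matching. This is a minimal sketch, not PDBFixer's actual implementation: it assumes the observed residues are given in increasing residue-number order, lays them out by number with `None` for numbers never observed, and requires that gapped chain to fit entirely inside the SEQRES sequence at some offset. Insertion codes are ignored, and `chain_with_gaps`, `find_offset` and `missing_residues` are hypothetical names.

```python
def chain_with_gaps(residues):
    """Lay out (resSeq, resName) pairs by residue number, with None for numbers never observed."""
    first = residues[0][0]
    gapped = [None] * (residues[-1][0] - first + 1)
    for number, name in residues:
        gapped[number - first] = name
    return gapped


def find_offset(seqres, gapped):
    """Return the first SEQRES offset at which every observed residue matches, or None."""
    for offset in range(len(seqres) - len(gapped) + 1):
        window = seqres[offset:offset + len(gapped)]
        if all(obs is None or obs == ref for obs, ref in zip(gapped, window)):
            return offset
    return None  # gapped chain is longer than SEQRES, or no offset is consistent


def missing_residues(seqres, residues):
    """Names of SEQRES residues absent from the model, in sequence order, or None if nothing aligns."""
    gapped = chain_with_gaps(residues)
    offset = find_offset(seqres, gapped)
    if offset is None:
        return None
    missing = list(seqres[:offset])  # N-terminal residues before the first observed one
    missing += [seqres[offset + i] for i, obs in enumerate(gapped) if obs is None]  # internal gaps
    missing += list(seqres[offset + len(gapped):])  # C-terminal residues after the last observed one
    return missing


seqres = ["MET", "GLY", "SER", "ALA", "LYS", "LEU", "GLU"]
print(missing_residues(seqres, [(2, "GLY"), (3, "SER"), (5, "LYS"), (6, "LEU")]))
# ['MET', 'ALA', 'GLU']
print(missing_residues(seqres, [(2, "GLY"), (3, "SER"), (900, "LYS")]))
# None
```

In the sketch, the second call mirrors 3ODU: a single jump in numbering makes the gapped chain far longer than the SEQRES sequence, so there is no offset to try and nothing is reported, not even the terminal residues. That is consistent with the empty `fixer.missingResidues` in the original report.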

JSLJ23 commented 1 year ago

Thanks for pointing this out. I don't know why the numbering turned out like this, but it makes sense that it can't be matched against the SEQRES section.
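To catch this kind of discontinuity before calling `findMissingResidues()`, one option is to scan the ATOM records for large jumps in residue number within a chain. The sketch below is a minimal illustration; `find_numbering_jumps` is a hypothetical helper, and the `max_gap` threshold of 50 is an arbitrary assumption, since a genuinely disordered loop can also leave a sizeable gap.

```python
def find_numbering_jumps(pdb_lines, max_gap=50):
    """Return (chain, previous resSeq, next resSeq) wherever consecutive residues in a chain jump by more than max_gap."""
    jumps = []
    last_number = {}  # chain id -> residue number of the previous ATOM record in that chain
    for line in pdb_lines:
        if not line.startswith("ATOM  "):
            continue  # skips ANISOU, HETATM, TER, etc.
        chain_id = line[21]  # column 22: chain identifier
        number = int(line[22:26])  # columns 23-26: residue sequence number
        previous = last_number.get(chain_id)
        if previous is not None and abs(number - previous) > max_gap:
            jumps.append((chain_id, previous, number))
        last_number[chain_id] = number
    return jumps


# Three records taken from the 3ODU excerpt quoted earlier in this thread
excerpt = [
    "ATOM   1654  OG  SER A 229      -0.628  -9.302  30.696  1.00 64.08           O  ",
    "ANISOU 1654  OG  SER A 229     8162   8736   7451     59   -856    -70       O  ",
    "ATOM   1655  N   GLY A 900       1.252 -10.057  27.133  1.00 45.01           N  ",
]
print(find_numbering_jumps(excerpt))
# [('A', 229, 900)]
```

For a whole file, the lines could come from something like `open(pdb_path).readlines()`.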