openmm / pdbfixer

PDBFixer fixes problems in PDB files

openmm.OpenMMException: Particle coordinate is nan #289

Open VeScarecrow opened 2 months ago

VeScarecrow commented 2 months ago

When I try to fix a protein (PDB ID: 2xgc), I get the following exception:

Traceback (most recent call last):
  File "/home/huang/project/deepkaserver/model/deepka/deepka.py", line 56, in preprocess
    fixed_str = fix_pdb_file_by_pdbfixer(raw_data)
  File "/home/huang/project/deepkaserver/utils/pdb/preprocess_pdb.py", line 167, in fix_pdb_file_by_pdbfixer
    fixer.addMissingAtoms()
  File "/home/huang/anaconda3/envs/deepkaserver/lib/python3.7/site-packages/pdbfixer/pdbfixer.py", line 954, in addMissingAtoms
    mm.LocalEnergyMinimizer.minimize(context)
  File "/home/huang/anaconda3/envs/deepkaserver/lib/python3.7/site-packages/openmm/openmm.py", line 11150, in minimize
    return _openmm.LocalEnergyMinimizer_minimize(context, tolerance, maxIterations)
openmm.OpenMMException: Particle coordinate is nan

The Python code I am running is:

import tempfile

from openmm.app import PDBFile
from pdbfixer import PDBFixer


def fix_pdb_file_by_pdbfixer(pdbstr):
    """
    Add missing residues and atoms to PDB content.
    :param pdbstr: String, PDB content.
    :return pdb_fixedstr: String, fixed PDB content.
    """
    # PDBFixer expects a file-like object, so write the string to a temporary file.
    pdb_inf = tempfile.TemporaryFile(mode='w+')
    pdb_inf.write(pdbstr)
    pdb_inf.seek(0)

    # Find missing residues and atoms.
    fixer = PDBFixer(pdbfile=pdb_inf)
    fixer.findMissingResidues()
    fixer.findMissingAtoms()
    chains = list(fixer.topology.chains())
    keys = list(fixer.missingResidues.keys())

    for key in keys:
        chain = chains[key[0]]
        # Drop missing residues at the head and tail of a chain; only internal gaps are rebuilt.
        if key[1] == 0 or key[1] == len(list(chain.residues())):
            del fixer.missingResidues[key]
    fixer.addMissingAtoms()
    # a = fixer.missingAtoms
    # print(a)

    # save fixed information as pdb string
    pdb_outf = tempfile.TemporaryFile(mode='w+')
    PDBFile.writeFile(fixer.topology, fixer.positions, pdb_outf, keepIds=True)
    pdb_outf.seek(0)
    pdb_fixedstr = pdb_outf.read()
    # with open('1A91_new.pdb', 'w+') as f:
    #     f.write(pdb_fixedstr)
    # close temporary files
    pdb_inf.close()
    pdb_outf.close()

    return pdb_fixedstr

I have tested several other proteins and they all run fine; only 2xgc has this problem. Why?

peastman commented 2 months ago

Try running it a few times and see if it succeeds. Some molecules are more difficult than others, especially if you're adding long stretches of missing residues.
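
One way to automate "try it a few times" is a small retry wrapper. The following is only a sketch: `retry_call` is a hypothetical helper, not part of PDBFixer or OpenMM, and it assumes that each attempt starts again from the raw PDB string (as `fix_pdb_file_by_pdbfixer` above does) rather than reusing a partially modified `PDBFixer` object.

```python
def retry_call(func, attempts=5, exceptions=(Exception,)):
    """Call func() until it succeeds or the attempts are used up.

    retry_call is a hypothetical helper written for illustration only.
    """
    last_error = None
    for _ in range(attempts):
        try:
            return func()
        except exceptions as err:
            # Remember the failure and try again from scratch.
            last_error = err
    raise RuntimeError(f"all {attempts} attempts failed") from last_error


# Possible usage with the function from this issue (not executed here):
#   import openmm
#   fixed_str = retry_call(lambda: fix_pdb_file_by_pdbfixer(raw_data),
#                          attempts=5,
#                          exceptions=(openmm.OpenMMException,))
```

Catching only `openmm.OpenMMException` keeps unrelated errors (for example a malformed input file) from being retried silently. A retry loop does not help if the minimizer hangs instead of raising.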

VeScarecrow commented 2 months ago


Thank you for your suggestions. I did notice that some proteins failed to repair on the first attempt but were successful on the second. However, the protein with PDB ID 2xgc has not been successfully repaired after more than ten attempts. Interestingly, when I try to repair 2xgc by running the code in the Terminal, the program gets stuck on the 'minimize' process and does not return any results (even after several days), but running the program in PyCharm returns the error message mentioned above. I have tried setting different random seeds, but the results remain the same. I have also set the parameters 'tolerance' and 'maxIterations' for the function 'minimize()', but neither has resolved the issue. What should I do?

peastman commented 2 months ago

That's going to be a very challenging case to fix. It's a dimer, and each copy has a stretch of 14 residues that are missing right in the middle of the chain. You may need to build those back in by hand.
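
To spot cases like this before calling `addMissingAtoms()`, it may help to list the long internal gaps first. The sketch below assumes the layout PDBFixer uses for `missingResidues`: a dict keyed by `(chain index, insertion position)` with a list of residue names as the value. `find_internal_gaps`, `chain_lengths`, and the `min_length` threshold are hypothetical names chosen for illustration, and the sample data is made up, not the actual 2xgc layout.

```python
def find_internal_gaps(missing_residues, chain_lengths, min_length=10):
    """Return (chain_index, position, gap_length) for long gaps inside a chain.

    missing_residues: dict mapping (chain_index, position) -> list of residue
        names, in the same layout as PDBFixer's missingResidues.
    chain_lengths: number of residues currently present in each chain.
    min_length: hypothetical threshold; shorter gaps are not reported.
    """
    gaps = []
    for (chain_index, position), names in sorted(missing_residues.items()):
        # Gaps at position 0 or at the end of the chain are terminal, not internal.
        is_terminal = position == 0 or position == chain_lengths[chain_index]
        if not is_terminal and len(names) >= min_length:
            gaps.append((chain_index, position, len(names)))
    return gaps


# Made-up example: two chains, each with a 14-residue gap in the middle.
example_missing = {
    (0, 0): ["MET"] * 3,     # N-terminal gap, ignored
    (0, 50): ["GLY"] * 14,   # internal gap in chain 0
    (1, 50): ["GLY"] * 14,   # internal gap in chain 1
    (1, 120): ["ALA"] * 2,   # C-terminal gap, ignored
}
example_lengths = [120, 120]
long_gaps = find_internal_gaps(example_missing, example_lengths)
```

With a real structure, `missing_residues` would be `fixer.missingResidues` after `findMissingResidues()`, and `chain_lengths` could be built as `[len(list(c.residues())) for c in fixer.topology.chains()]`. Any gap reported here is a candidate for manual modeling rather than automatic rebuilding.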