openmm / pdbfixer

PDBFixer fixes problems in PDB files
Other
497 stars 117 forks source link

CASN adds hydrogenation errors #306

Closed astomer2 closed 2 months ago

astomer2 commented 2 months ago

pdbfixer adds four hydrogens to the N of the CASN, which is undoubtedly wrong, but the program does not report an error. when I input this structure into openmm to minimize it, the program shows the missing atom again. I'm not sure what's wrong with pdbfixer, but I'd like to add the ability to remove raw hydrogen to pdbfixer so that at least it gives me the correct backbone atoms

image

ATOM 9520 H ALA B 1 -35.221 2.011 16.461 1.00 0.00 H
ATOM 9521 H3 ALA B 1 -36.510 0.835 16.608 1.00 0.00 H
ATOM 9522 C ALA B 1 -34.696 -0.642 18.161 1.00 0.00 C
ATOM 9523 O ALA B 1 -34.316 -1.354 17.225 1.00 0.00 O
ATOM 9524 CA ALA B 1 -34.909 0.848 17.980 1.00 0.00 C
ATOM 9525 HA ALA B 1 -33.767 1.184 18.024 1.00 0.00 H
ATOM 9526 CB ALA B 1 -35.823 1.393 19.079 1.00 0.00 C
ATOM 9527 HB1 ALA B 1 -36.024 2.504 18.697 1.00 0.00 H
ATOM 9528 HB2 ALA B 1 -35.291 1.703 20.113 1.00 0.00 H
ATOM 9529 HB3 ALA B 1 -36.878 0.910 19.359 1.00 0.00 H
ATOM 9530 N ALA B 1 -35.490 1.029 16.665 1.00 0.00 N
ATOM 9531 H2 ALA B 1 -35.026 0.406 15.973 1.00 0.00 H
ATOM 9532 CA MET B 2 -34.572 -2.467 19.725 1.00 0.00 C
ATOM 9533 HA MET B 2 -34.133 -3.107 18.827 1.00 0.00 H
ATOM 9534 C MET B 2 -35.843 -3.256 19.980 1.00 0.00 C
ATOM 9535 O MET B 2 -35.826 -4.487 19.968 1.00 0.00 O
ATOM 9536 N MET B 2 -34.913 -1.098 19.379 1.00 0.00 N
ATOM 9537 CB MET B 2 -33.684 -2.495 20.956 1.00 0.00 C
ATOM 9538 HB2 MET B 2 -32.788 -1.953 20.393 1.00 0.00 H
ATOM 9539 HB3 MET B 2 -33.151 -3.488 21.327 1.00 0.00 H
ATOM 9540 CG MET B 2 -34.444 -2.220 22.220 1.00 0.00 C
ATOM 9541 HG2 MET B 2 -34.191 -2.874 23.184 1.00 0.00 H
ATOM 9542 HG3 MET B 2 -35.469 -1.659 22.463 1.00 0.00 H
ATOM 9543 SD MET B 2 -33.712 -0.854 23.098 1.00 0.00 S
ATOM 9544 CE MET B 2 -32.463 -0.384 21.917 1.00 0.00 C
ATOM 9545 HE1 MET B 2 -32.047 0.105 22.922 1.00 0.00 H
ATOM 9546 HE2 MET B 2 -31.454 -0.403 21.283 1.00 0.00 H
ATOM 9547 HE3 MET B 2 -33.130 0.510 21.498 1.00 0.00 H
ATOM 9548 H MET B 2 -35.281 -0.510 20.123 1.00 0.00 H
ATOM 9549 N ASN B 3 -36.945 -2.544 20.198 1.00 0.00 N
ATOM 9550 H ASN B 3 -36.846 -1.553 20.245 1.00 0.00 H
ATOM 9551 CA ASN B 3 -38.218 -3.177 20.487 1.00 0.00 C
ATOM 9552 HA ASN B 3 -38.172 -4.337 20.723 1.00 0.00 H
ATOM 9553 CB ASN B 3 -38.948 -2.426 21.584 1.00 0.00 C
ATOM 9554 HB2 ASN B 3 -39.947 -1.924 21.161 1.00 0.00 H
ATOM 9555 HB3 ASN B 3 -38.375 -1.492 22.054 1.00 0.00 H
ATOM 9556 CG ASN B 3 -39.514 -3.360 22.615 1.00 0.00 C
ATOM 9557 OD1 ASN B 3 -39.699 -4.544 22.345 1.00 0.00 O
ATOM 9558 ND2 ASN B 3 -39.777 -2.847 23.803 1.00 0.00 N
ATOM 9559 HD21 ASN B 3 -38.663 -2.402 24.058 1.00 0.00 H
ATOM 9560 HD22 ASN B 3 -40.907 -3.239 23.678 1.00 0.00 H
ATOM 9561 2HD ASN B 3 -40.196 -1.970 24.031 1.00 0.00 H
ATOM 9562 1HD ASN B 3 -39.767 -3.432 24.612 1.00 0.00 H
ATOM 9563 C ASN B 3 -39.070 -3.267 19.232 1.00 0.00 C
ATOM 9564 OXT ASN B 3 -39.730 -2.295 18.869 1.00 0.00 O
ATOM 9565 O ASN B 3 -38.958 -4.431 18.701 1.00 0.00 O
TER 9566 ASN B 3

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Cell In[4], [line 1](vscode-notebook-cell:?execution_count=4&line=1)
----> [1](vscode-notebook-cell:?execution_count=4&line=1) batch_fix('/mnt/nas2/lanwei/ACE/dataset/structure/','/mnt/nas2/lanwei/ACE/dataset/structure/')

File /mnt/nas1/lanwei-125/peptide-deploy/utils_peptide/fix_pdb_clash.py:133, in batch_fix(input_pdb_dir, output_pdb_dir)
    [131](https://vscode-remote+ssh-002dremote-002b125.vscode-resource.vscode-cdn.net/mnt/nas1/lanwei-125/peptide-deploy/utils_peptide/fix_pdb_clash.py:131) input_pdb = str(pdb)
    [132](https://vscode-remote+ssh-002dremote-002b125.vscode-resource.vscode-cdn.net/mnt/nas1/lanwei-125/peptide-deploy/utils_peptide/fix_pdb_clash.py:132) output_pdb = str(output_pdb_dir / pdb.name)
--> [133](https://vscode-remote+ssh-002dremote-002b125.vscode-resource.vscode-cdn.net/mnt/nas1/lanwei-125/peptide-deploy/utils_peptide/fix_pdb_clash.py:133) fix_pdb_clash(input_pdb, output_pdb)

File /mnt/nas1/lanwei-125/peptide-deploy/utils_peptide/fix_pdb_clash.py:118, in fix_pdb_clash(input_pdb, output_pdb)
    [116](https://vscode-remote+ssh-002dremote-002b125.vscode-resource.vscode-cdn.net/mnt/nas1/lanwei-125/peptide-deploy/utils_peptide/fix_pdb_clash.py:116) del_clash_side_chain(input_pdb, output_pdb)
    [117](https://vscode-remote+ssh-002dremote-002b125.vscode-resource.vscode-cdn.net/mnt/nas1/lanwei-125/peptide-deploy/utils_peptide/fix_pdb_clash.py:117) fixed_pdb_file(output_pdb)
--> [118](https://vscode-remote+ssh-002dremote-002b125.vscode-resource.vscode-cdn.net/mnt/nas1/lanwei-125/peptide-deploy/utils_peptide/fix_pdb_clash.py:118) openmm_minimize(output_pdb)
    [119](https://vscode-remote+ssh-002dremote-002b125.vscode-resource.vscode-cdn.net/mnt/nas1/lanwei-125/peptide-deploy/utils_peptide/fix_pdb_clash.py:119) os.remove(output_pdb)
    [120](https://vscode-remote+ssh-002dremote-002b125.vscode-resource.vscode-cdn.net/mnt/nas1/lanwei-125/peptide-deploy/utils_peptide/fix_pdb_clash.py:120) os.rename(f"{(Path(output_pdb)).with_suffix('')}_minimized.pdb", output_pdb)

File /mnt/nas1/lanwei-125/peptide-deploy/utils_peptide/properties_from_structure/caculate_potential_energy.py:120, in openmm_minimize(pdb_file)
    [118](https://vscode-remote+ssh-002dremote-002b125.vscode-resource.vscode-cdn.net/mnt/nas1/lanwei-125/peptide-deploy/utils_peptide/properties_from_structure/caculate_potential_energy.py:118) pdb = PDBFile(str(pdb_file))
    [119](https://vscode-remote+ssh-002dremote-002b125.vscode-resource.vscode-cdn.net/mnt/nas1/lanwei-125/peptide-deploy/utils_peptide/properties_from_structure/caculate_potential_energy.py:119) forcefield = ForceField("amber14-all.xml", "implicit/gbn2.xml")
--> [120](https://vscode-remote+ssh-002dremote-002b125.vscode-resource.vscode-cdn.net/mnt/nas1/lanwei-125/peptide-deploy/utils_peptide/properties_from_structure/caculate_potential_energy.py:120) system = forcefield.createSystem(
    [121](https://vscode-remote+ssh-002dremote-002b125.vscode-resource.vscode-cdn.net/mnt/nas1/lanwei-125/peptide-deploy/utils_peptide/properties_from_structure/caculate_potential_energy.py:121)     pdb.topology,
    [122](https://vscode-remote+ssh-002dremote-002b125.vscode-resource.vscode-cdn.net/mnt/nas1/lanwei-125/peptide-deploy/utils_peptide/properties_from_structure/caculate_potential_energy.py:122)     nonbondedMethod=NoCutoff,
    [123](https://vscode-remote+ssh-002dremote-002b125.vscode-resource.vscode-cdn.net/mnt/nas1/lanwei-125/peptide-deploy/utils_peptide/properties_from_structure/caculate_potential_energy.py:123)     nonbondedCutoff=10 * angstrom,
    [124](https://vscode-remote+ssh-002dremote-002b125.vscode-resource.vscode-cdn.net/mnt/nas1/lanwei-125/peptide-deploy/utils_peptide/properties_from_structure/caculate_potential_energy.py:124)     constraints=HBonds,
...
-> [1433](https://vscode-remote+ssh-002dremote-002b125.vscode-resource.vscode-cdn.net/mnt/nas1/lanwei-125/env/amber/lib/python3.9/site-packages/openmm/app/forcefield.py:1433)     raise ValueError('No template found for residue %d (%s).  %s  For more information, see https://github.com/openmm/openmm/wiki/Frequently-Asked-Questions#template' % (res.index+1, res.name, _findMatchErrors(self, res)))
   [1434](https://vscode-remote+ssh-002dremote-002b125.vscode-resource.vscode-cdn.net/mnt/nas1/lanwei-125/env/amber/lib/python3.9/site-packages/openmm/app/forcefield.py:1434) else:
   [1435](https://vscode-remote+ssh-002dremote-002b125.vscode-resource.vscode-cdn.net/mnt/nas1/lanwei-125/env/amber/lib/python3.9/site-packages/openmm/app/forcefield.py:1435)     if recordParameters:

ValueError: No template found for residue 598 (ASN).  The set of atoms is similar to CGLN, but it is missing 1 atoms.  For more information, see https://github.com/openmm/openmm/wiki/Frequently-Asked-Questions#template
peastman commented 2 months ago

Can you provide instructions on how to reproduce it? Are you using the GUI, the Python API, or the command line interface? Describe exactly what you're doing, provide your code if you're using the Python API, and include any input files needed to reproduce it.

astomer2 commented 2 months ago

Here is my code and the original file, the polar hydrogen is preserved on the E chain in the original file, which corresponds to the B chain of the modified structure


def fixed_pdb_file(input_pdb_file: Union[str,Path], output_pdb_file= None)  -> Path:
    """
    Generate a fixed PDB file from the given PDB file.

    Parameters:
        pdb_file (Path or str): The path to the input PDB file.
        output_file (Path or str, optional): The path to the output PDB file. If not provided, the output file will have the same name as the input file.

    Returns:
        Path: The path to the output PDB file.

    """
    input_pdb_file = Path(input_pdb_file)
    fixer = PDBFixer(filename=str(input_pdb_file))

    fixer.findMissingResidues()
    fixer.findNonstandardResidues()

    fixer.replaceNonstandardResidues()
    fixer.removeHeterogens(True)

    fixer.findMissingAtoms()
    fixer.addMissingAtoms()
    fixer.addMissingHydrogens(7.0)    

    if output_pdb_file is not None:
        PDBFile.writeFile(fixer.topology, fixer.positions, open(output_pdb_file, 'w'))
    else:
        output_pdb_file=input_pdb_file
        PDBFile.writeFile(fixer.topology, fixer.positions, open(output_pdb_file, 'w'))
    return output_pdb_file

AMN.zip

peastman commented 2 months ago

The problem is that your PDB file is misaligned. The ASN residue contains the lines

ATOM     25  2HD2ASN E   3     -40.196  -1.970  24.031  1.00  0.00           H
ATOM     26  1HD2ASN E   3     -39.767  -3.432  24.612  1.00  0.00           H

Those atoms are clearly intended to be named 2HD2 and 1HD2. But the atom name goes in columns 13-16, and those names are in columns 14-17. The atom names get read as just 2HD and 1HD, which aren't names OpenMM recognizes or knows what to do with. So it just leaves those atoms alone and adds two hydrogens to ND2, since (as far as it knows) there aren't currently any hydrogens on it.

1HD2 and 2HD2 are themselves nonstandard names. The correct names for those atoms are HD21 and HD22. But the other names are common enough that it knows to look for them and convert them to the correct names.

Also note this paragraph from the documentation:

A special case is when the model already contains a hydrogen that should not be present in the desired variant. If you explicitly specify a variant using the ‘variants’ parameter, the residue will be modified to match the desired variant, removing hydrogens if necessary. On the other hand, for residues whose variant is selected automatically, this function will only add hydrogens. It will never remove ones that are already present in the model, regardless of the specified pH.

astomer2 commented 2 months ago

Thank you for your inspection and help. I used the code of biopython to delete all hydrogen atoms before pdbfixer, which successfully solved my problem. However, I still hope to add the function of deleting all hydrogens in pdbfixer, so I wrote the following function to add it to pdbfixer to delete hydrogen atoms

    def removeHydrogens(self):
        """Remove all hydrogens from the structure."""
        modeller = app.Modeller(self.topology, self.positions)
        hydrogens = [atom for atom in modeller.topology.atoms() if  'H' in atom.name ]
        if hydrogens:  
            modeller.delete(hydrogens)  
        self.topology = modeller.topology
        self.positions = modeller.positions
peastman commented 2 months ago

This is a better way to do the check:

hydrogens = [atom for atom in modeller.topology.atoms() if atom.element == element.hydrogen]