openmm / pdbfixer

PDBFixer fixes problems in PDB files

findMissingResidues does not detect residues missing from structure #300

Closed YoavShamir5 closed 1 month ago

YoavShamir5 commented 1 month ago

I tried running the following on a PDB structure (PDB ID: 5J7S) that clearly has missing residues (e.g. ALA46 and PHE47), but they are not detected:

    from pdbfixer import PDBFixer

    fixer = PDBFixer(filename='5j7s.pdb')
    print(fixer)
    fixer.findMissingResidues()
    print("missing: ", fixer.missingResidues)

missing: {}

Is there a syntax issue here?

peastman commented 1 month ago

To identify missing residues, it tries to match the sequence in the SEQRES records to the one in the ATOM records. In the case of that entry, they don't match. If you look at the ATOM records, you'll see there's a huge gap right in the middle of chain A. The residue numbers jump from 303 to 468. There's nothing corresponding in the SEQRES records. Since it can't figure out how to align them, it doesn't know what to add.
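To see the kind of numbering jump described above without opening the file by hand, a small standalone script can scan the ATOM records. This is only a sketch and is not how PDBFixer itself works: the function name `find_numbering_gaps` is made up for illustration, and it assumes standard fixed-column PDB formatting (chain ID in column 22, residue number in columns 23-26) with no insertion codes.

```python
def find_numbering_gaps(pdb_lines):
    """Return (chain_id, last_number_before_gap, first_number_after_gap) tuples.

    Hypothetical helper: it only looks at ATOM records and only compares
    consecutive residue numbers within each chain.
    """
    gaps = []
    previous = {}  # chain ID -> last residue number seen in that chain
    for line in pdb_lines:
        if not line.startswith("ATOM"):
            continue
        chain_id = line[21]          # column 22
        res_num = int(line[22:26])   # columns 23-26
        last = previous.get(chain_id)
        if last is not None and res_num - last > 1:
            gaps.append((chain_id, last, res_num))
        previous[chain_id] = res_num
    return gaps
```

Calling it as `find_numbering_gaps(open('5j7s.pdb'))` would be expected to list the jump from 303 to 468 in chain A, provided the file matches the description above. A gap in the numbering is only a hint; it does not say which residues belong there.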

YoavShamir5 commented 1 month ago

Got it, thanks for this informative reply.

peastman commented 1 month ago

When you run into cases like this, the workaround is to set missingResidues yourself. You can tell it what you want it to add.
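As far as I understand the PDBFixer documentation, each key of `missingResidues` is a tuple of the chain index and the residue index within that chain at which the new residues should be inserted, and the value is the list of residue names. That index is a 0-based position in the chain, not the PDB residue number, which is easy to get wrong. The sketch below shows one way to derive it; `insertion_key` is a hypothetical helper, and it assumes the residue numbers in the chain are ascending with no insertion codes.

```python
from bisect import bisect_left


def insertion_key(chain_index, residue_numbers, first_missing_number):
    """Build a (chain index, residue index) key for a missingResidues-style dict.

    residue_numbers: PDB residue numbers present in the chain, in ascending order.
    first_missing_number: PDB number of the first residue to be inserted.
    """
    position = bisect_left(residue_numbers, first_missing_number)
    if position < len(residue_numbers) and residue_numbers[position] == first_missing_number:
        raise ValueError("residue %d is already present" % first_missing_number)
    return (chain_index, position)


# Made-up numbering for illustration (not taken from 5J7S):
# residues 46 and 47 are absent between 45 and 48.
present = [44, 45, 48, 49]
missing_residues = {insertion_key(0, present, 46): ["ALA", "PHE"]}
print(missing_residues)
```

With a real structure, the list of present residue numbers could be read from the topology, for example with `[int(r.id) for r in chain.residues()]`, again assuming purely numeric residue IDs.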

YoavShamir5 commented 1 month ago

Following your advice, I tried the following in order to add missing residues ALA46 and PHE47:

    fixer = PDBFixer(filename='5j7s.pdb')
    fixer.missingResidues = {(0, 20): ['ALA', 'PHE']}
    fixer.findMissingAtoms()
    print("missing: ", fixer.missingAtoms)

missing: {}

Why are no atoms annotated as missing? Am I making a syntax mistake here? Thanks for your help.

peastman commented 1 month ago

findMissingAtoms() only records atoms that are missing from existing residues. Missing residues are tracked separately in missingResidues. They both get added when you call addMissingAtoms().

https://github.com/openmm/pdbfixer/blob/f9aae0bcfd7b95661cb524c5e52ac71f6b71bf7b/pdbfixer/pdbfixer.py#L1055-L1056

YoavShamir5 commented 1 month ago

Thank you for the info. I tried the following:

    from openmm.app import PDBFile

    fixer = PDBFixer(filename='5j7s.pdb')
    fixer.missingResidues = {(0, 19): ['ALA', 'PHE']}
    fixer.findMissingAtoms()
    fixer.addMissingAtoms()
    with open('out_57js.pdb', 'w') as f:
        PDBFile.writeFile(fixer.topology, fixer.positions, f)

The output file includes the added ALA and PHE, but they clash and have a pose that does not seem to make sense. Is this the result of the residue completion algorithm, or is my syntax flawed?

[Image: add_residues]

YoavShamir5 commented 1 month ago

Also, the output file has the residues renumbered starting from 1. Is there a way to avoid this and keep the original numbering? Thanks again.

peastman commented 1 month ago

When adding missing residues it does its best to add them in places that don't clash with anything, but sometimes it can be challenging. Try running it a few times and see if it does better.

When you call writeFile() you can add the argument keepIds=True to tell it to preserve all existing IDs rather than generating new ones.
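If you do run it several times, it helps to have a rough number to compare the attempts by. The following is a minimal sketch of a clash counter, not anything PDBFixer provides: `count_clashes` is a hypothetical helper, coordinates are assumed to be plain (x, y, z) tuples in nanometers, and the 0.2 nm cutoff is an arbitrary illustrative value.

```python
from itertools import product
from math import dist


def count_clashes(new_atoms, existing_atoms, cutoff=0.2):
    """Count (new atom, existing atom) pairs that are closer than cutoff.

    Both arguments are sequences of (x, y, z) tuples in the same length unit
    as cutoff. The check is brute force, which is fine for a few added residues.
    """
    return sum(
        1
        for a, b in product(new_atoms, existing_atoms)
        if dist(a, b) < cutoff
    )
```

The attempt with the lowest count is not guaranteed to be the best model, but it gives a quick way to discard the obviously bad ones before inspecting the rest visually.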

YoavShamir5 commented 1 month ago

Thanks a lot for the info. I tried this again with a different structure (5A46), where I want to model just 3 of the many missing residues:

    from pdbfixer import PDBFixer
    from openmm.app import PDBFile

    fixer = PDBFixer(filename='5a46.pdb')
    fixer.missingResidues = {(0, 174): ['ILE', 'HIS', 'HIS']}
    fixer.findMissingAtoms()
    fixer.addMissingAtoms()
    with open('out_5a46.pdb', 'w') as f:
        PDBFile.writeFile(fixer.topology, fixer.positions, f, keepIds=True)

My goal is to add just these three residues and no other atoms. Once again, the output structure has clashes all over the place (left: the original section with the missing residues; right: the added residues):

[Image: 5a46]

Maybe your software tool is not the right solution for my use case, but I am not sure. Thanks anyway!

peastman commented 1 month ago

I think some of the problem is coming from the waters. The need to avoid them is constraining where it can put the new residues. If I add the line

fixer.removeHeterogens(keepWater=False)

I get a somewhat better result, though it's still not perfect. This might be a case where hand editing is needed. It's a cramped space to fit three residues into.
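Put together with the script from earlier in the thread, the suggestion could look roughly like the sketch below. The wrapper function and its name are made up for illustration, and calling `removeHeterogens` before setting `missingResidues` is an assumption about ordering; the only requirement implied above is that it runs before `addMissingAtoms()`.

```python
def add_residues_without_heterogens(pdb_in, pdb_out, missing_residues):
    """Insert the given residues after stripping heterogens, including water.

    missing_residues uses the same layout as fixer.missingResidues, e.g.
    {(0, 174): ['ILE', 'HIS', 'HIS']}.
    """
    # Imported here so that defining the function does not require pdbfixer.
    from pdbfixer import PDBFixer
    from openmm.app import PDBFile

    fixer = PDBFixer(filename=pdb_in)
    # Remove waters and other heterogens so they do not restrict placement.
    fixer.removeHeterogens(keepWater=False)
    fixer.missingResidues = missing_residues
    fixer.findMissingAtoms()
    fixer.addMissingAtoms()
    with open(pdb_out, "w") as handle:
        # keepIds=True preserves the original residue numbering.
        PDBFile.writeFile(fixer.topology, fixer.positions, handle, keepIds=True)
```

For the case above this would be called as `add_residues_without_heterogens('5a46.pdb', 'out_5a46.pdb', {(0, 174): ['ILE', 'HIS', 'HIS']})`. Note that the waters are then absent from the output file, so they would have to be restored separately if they are needed.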