Closed YoavShamir5 closed 1 month ago
To identify missing residues, it tries to match the sequence in the SEQRES records to the one in the ATOM records. In the case of that entry, they don't match. If you look at the ATOM records, you'll see there's a huge gap right in the middle of chain A. The residue numbers jump from 303 to 468. There's nothing corresponding in the SEQRES records. Since it can't figure out how to align them, it doesn't know what to add.
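If you want to spot this kind of gap without reading the file by eye, a short script can scan the ATOM records for jumps in residue numbering. This is only a sketch and not part of PDBFixer: `find_numbering_gaps` is a made-up helper, the sample records are synthetic (only the residue numbers 303 and 468 come from the description above), and it ignores insertion codes and HETATM records.

```python
def find_numbering_gaps(pdb_lines):
    """Return (chain ID, last number seen, next number seen) for each jump."""
    gaps = []
    previous = {}  # chain ID -> last residue number seen in that chain
    for line in pdb_lines:
        if not line.startswith("ATOM"):
            continue
        chain_id = line[21]         # PDB column 22
        res_num = int(line[22:26])  # PDB columns 23-26
        last = previous.get(chain_id)
        if last is not None and res_num - last > 1:
            gaps.append((chain_id, last, res_num))
        previous[chain_id] = res_num
    return gaps


# Synthetic records for illustration only; residue names are invented.
sample_lines = [
    "ATOM      1  N   LEU A 302",
    "ATOM      2  CA  LEU A 302",
    "ATOM      3  N   SER A 303",
    "ATOM      4  N   GLY A 468",
    "ATOM      5  N   THR A 469",
]

print(find_numbering_gaps(sample_lines))  # [('A', 303, 468)]
```

With a real file you would pass `open('5j7s.pdb')` instead of `sample_lines`. A jump in numbering is only a hint: some entries use non-contiguous numbering on purpose, so it is worth checking against the SEQRES records before deciding what to add.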
Got it, thanks for this informative reply.
When you run into cases like this, the workaround is to set missingResidues yourself. You can tell it what you want it to add.
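The fiddly part is working out the dictionary key. The sketch below assumes the key is (chain index, insertion position), where the position counts the residues already present in that chain ahead of the gap. That is my reading of how PDBFixer interprets it, so check the result on your own structure. `missing_residues_entry` is a hypothetical helper, and the residue numbers are made up for illustration.

```python
def missing_residues_entry(chain_index, present_numbers, first_missing, names):
    """Build one {(chain index, insertion position): [residue names]} entry.

    present_numbers: residue numbers that exist in the chain, in file order.
    first_missing:   author residue number of the first residue to add.
    """
    # Assumption: the new residues go in front of the residue at this position,
    # so the position equals the number of existing residues before the gap.
    position = sum(1 for number in present_numbers if number < first_missing)
    return {(chain_index, position): list(names)}


# Made-up chain in which residues 46 and 47 are absent.
present = [44, 45, 48, 49]
entry = missing_residues_entry(0, present, 46, ["ALA", "PHE"])
print(entry)  # {(0, 2): ['ALA', 'PHE']}

# With a real PDBFixer instance you would then assign it:
# fixer.missingResidues = entry
```

Computing the position from the residues actually present avoids off-by-one mistakes when the chain does not start at residue 1.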
Following your advice, I tried the following in order to add missing residues ALA46 and PHE47:
from pdbfixer import PDBFixer
fixer = PDBFixer(filename='5j7s.pdb')
fixer.missingResidues = {(0, 20): ['ALA', 'PHE']}
fixer.findMissingAtoms()
print("missing: ", fixer.missingAtoms)
missing: {}
Why are no atoms annotated as missing? Am I making a syntax mistake here? Thanks for your help.
findMissingAtoms() only records atoms that are missing from existing residues. Missing residues are tracked separately in missingResidues. They both get added when you call addMissingAtoms().
Thank you for the info. I tried the following:
from pdbfixer import PDBFixer
from openmm.app import PDBFile
fixer = PDBFixer(filename='5j7s.pdb')
fixer.missingResidues = {(0, 19): ['ALA', 'PHE']}
fixer.findMissingAtoms()
fixer.addMissingAtoms()
with open('out_57js.pdb', 'w') as f:
    PDBFile.writeFile(fixer.topology, fixer.positions, f)
The output file includes the added ALA and PHE, but they clash and have a pose that does not seem to make sense. Is this the result of the residue completion algorithm, or is my syntax flawed?
Also - the output file has the residues renumbered starting from 1. Is there a functionality to avoid this, keeping the original numbering? Thanks again
When adding missing residues it does its best to add them in places that don't clash with anything, but sometimes it can be challenging. Try running it a few times and see if it does better.
When you call writeFile() you can add the argument keepIds=True to tell it to preserve all existing IDs rather than generating new ones.
Thanks a lot for the info. I tried this again with a different structure (5A46), where I want to model just 3 of the many missing residues:
from pdbfixer import PDBFixer
from openmm.app import PDBFile
fixer = PDBFixer(filename='5a46.pdb')
fixer.missingResidues = {(0, 174): ['ILE', 'HIS', 'HIS']}
fixer.findMissingAtoms()
fixer.addMissingAtoms()
with open('out_5a46.pdb', 'w') as f:
    PDBFile.writeFile(fixer.topology, fixer.positions, f, keepIds=True)
My goal is just to add these three residues, and not any other atoms. Once again the structure at the output clashes all over the place (to the left - the original section with missing residues, to the right - the added residues):
Maybe your software tool is not the right solution for my use case, but I am not sure. Thanks anyway!
I think some of the problem is coming from the waters. The need to avoid them is constraining where it can put the new residues. If I add the line
fixer.removeHeterogens(keepWater=False)
I get a somewhat better result, though it's still not perfect. This might be a case where hand editing is needed. It's a cramped space to fit three residues into.
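To compare attempts (for example with and without the waters) by something more objective than eyeballing the structure, you could count how many atoms sit very close to the newly added residues in each output file. This is a rough sketch, not anything PDBFixer provides: `parse_atoms`, `count_close_contacts`, and the `RECORD` template are hypothetical names, the 2.0 Å cutoff is an arbitrary assumption, the coordinates are made up, and atoms bonded to the ends of the inserted stretch will be counted as contacts too.

```python
import math

# Fixed-column layout of a PDB coordinate record, used here only to build sample data.
RECORD = "{rec:<6}{serial:>5} {name:<4} {res:>3} {chain}{num:>4}    {x:8.3f}{y:8.3f}{z:8.3f}"


def parse_atoms(pdb_lines):
    """Return (chain ID, residue number, (x, y, z)) for each ATOM/HETATM record."""
    atoms = []
    for line in pdb_lines:
        if line.startswith(("ATOM", "HETATM")):
            xyz = (float(line[30:38]), float(line[38:46]), float(line[46:54]))
            atoms.append((line[21], int(line[22:26]), xyz))
    return atoms


def count_close_contacts(pdb_lines, chain_id, new_res_numbers, cutoff=2.0):
    """Count atom pairs closer than `cutoff` between the new residues and the rest."""
    added, others = [], []
    for atom in parse_atoms(pdb_lines):
        is_new = atom[0] == chain_id and atom[1] in new_res_numbers
        (added if is_new else others).append(atom)
    return sum(
        1
        for a in added
        for o in others
        if math.dist(a[2], o[2]) < cutoff
    )


# Made-up coordinates: residue 46 sits 1 Å from a water and 10 Å from residue 50.
sample = [
    RECORD.format(rec="ATOM", serial=1, name="CA", res="ALA", chain="A", num=46, x=0.0, y=0.0, z=0.0),
    RECORD.format(rec="ATOM", serial=2, name="CA", res="GLY", chain="A", num=50, x=10.0, y=0.0, z=0.0),
    RECORD.format(rec="HETATM", serial=3, name="O", res="HOH", chain="A", num=201, x=1.0, y=0.0, z=0.0),
]

print(count_close_contacts(sample, "A", {46}))  # 1
```

Running this on the output of several attempts gives a quick number to compare. The residue numbers passed in only match the original numbering if the file was written with keepIds=True.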
I tried running the following on a PDB structure (PDB ID: 5J7S) that clearly has missing residues (e.g. ALA46 and PHE47), but those are not detected:
from pdbfixer import PDBFixer
fixer = PDBFixer(filename='5j7s.pdb')
print(fixer)
fixer.findMissingResidues()
print("missing: ", fixer.missingResidues)
missing: {}
Is there a syntax issue here?