openmm / pdbfixer

PDBFixer fixes problems in PDB files

Solvent does not surround protein #159

Closed rnowling closed 6 years ago

rnowling commented 6 years ago

I used pdbfixer to add solvent to a model. However, the solvent was placed adjacent to the protein rather than surrounding it. The solvent also took on a weird shape (not a cube). I've attached the input PDB, resulting PDB, and a screenshot from VMD demonstrating the issue.

gaba_screenshot

gaba_pdbs.zip

peastman commented 6 years ago

Could you provide instructions on how to reproduce this? What exactly did you do to create it?

rnowling-adroll commented 6 years ago

I ran pdbfixer using the web UI. I uploaded the gaba_tm_implicit.pdb file and went with the defaults until the options for the water box, where I added padding of 10 nm on each side. I then downloaded the resulting PDB (gaba_tm_water_box.pdb). The input PDB is a valid structure -- it was already processed to handle missing hydrogens and replace terminal residues using PDBFixer in a prior run.

If I'm missing any other details, I apologize -- I'm not sure what else you're looking for.

peastman commented 6 years ago

> I added padding of 10 nm on each side.

Are you sure about that? That would be a huge box. Based on the box dimensions in the PDB file, it looks to me like you added 2.5 nm on each side (which is still much more than you need).

rnowling-adroll commented 6 years ago

Yep, you're right -- I originally set it to 10 nm, which would be 5 nm on each side. That gave me weird behavior, so I tried 5 nm, which gives 2.5 nm on each side. Either way, that doesn't seem to be related to the core problem -- the water does not surround the protein, it sits adjacent to the protein.

I could totally be doing something wrong with my input parameters. Would you be willing to try running pdbfixer with the given gaba_tm_implicit.pdb and see what you get? Thanks!

peastman commented 6 years ago

I'm already working on it. I just want to be sure I'm doing the same thing you did.

peastman commented 6 years ago

I see what's happening. It's cutting off the output file at 9999 residues, that being the maximum number allowed in a fully compliant PDB file. It's not intentional, though. It's happening as an interaction of several different behaviors. By default, the PDB writer assigns IDs to all residues and atoms, wrapping them back to 1 when they exceed the available number of columns. But PDBFixer specifies keepIds=True to try to preserve IDs from the original file. If some of those are too long, it triggers an assertion that causes it to stop writing. And of course Modeller makes no attempt to limit the length of residue IDs when building the water box, because as far as it knows, you might be planning to write it to a different file format (like PDBx/mmCIF) that doesn't have that limitation.
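As a rough illustration of why a water box hits this limit so easily: liquid water has about 33.4 molecules per nm^3, so even a modest box holds more than 9999 waters. The sketch below is a back-of-the-envelope estimate, not PDBFixer code. The function names and the example box sizes are hypothetical (they are not taken from the attached files), and the estimate ignores the volume occupied by the protein unless you pass it in.

```python
# Approximate number density of liquid water near 1 g/cm^3 (molecules per nm^3).
WATER_PER_NM3 = 33.4

# The residue sequence number field in a PDB ATOM record is 4 columns wide.
MAX_PDB_RESIDUE_ID = 9999


def estimated_waters(box_edges_nm, solute_volume_nm3=0.0):
    """Rough count of waters in a rectangular box (hypothetical helper)."""
    x, y, z = box_edges_nm
    free_volume = max(x * y * z - solute_volume_nm3, 0.0)
    return int(free_volume * WATER_PER_NM3)


def fits_in_pdb_residue_field(n_residues):
    """True if every residue can get a distinct 4-column ID."""
    return n_residues <= MAX_PDB_RESIDUE_ID


# Hypothetical box sizes, chosen only to show where the limit falls.
for edge in (6.0, 8.0):
    n = estimated_waters((edge, edge, edge))
    print(f"{edge} nm cube: ~{n} waters, fits: {fits_in_pdb_residue_field(n)}")
```

Under these assumptions, a cube somewhere between 6 and 7 nm on a side is already enough to pass 9999 waters, so a file that stops at that point would contain only part of the solvent.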

Arguably PDBFixer should be scanning through all the added waters and fixing their IDs before trying to write to a PDB file. But I think it might be better to just change the behavior of PDBFile.writeModel(). If you specify keepIds=True it will try to preserve your IDs, but if it finds one that's too long to write to a PDB file, it will throw it out and generate a new one.
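The proposed behavior can be sketched in plain Python. This is an illustration only, not the actual OpenMM implementation: `pdb_residue_id`, its parameters, and the exact wrap-to-1 rule are assumptions made for the example. The 4-column width is that of the residue sequence number field in the PDB format.

```python
def pdb_residue_id(original_id, fallback_index, keep_ids=True, width=4):
    """Pick a residue ID string that fits in `width` columns.

    Hypothetical helper: keeps the original ID when allowed and short
    enough, otherwise generates one from a 1-based running index.
    """
    if keep_ids and original_id is not None and len(str(original_id)) <= width:
        return str(original_id)
    # Generated IDs wrap back to 1 after the largest value that fits
    # (9999 for width 4), so they never overflow the field.
    largest = 10 ** width - 1
    return str((fallback_index - 1) % largest + 1)


# An ID that fits is preserved; one that is too long is replaced.
print(pdb_residue_id("42", fallback_index=7))
print(pdb_residue_id("12345", fallback_index=10000))
```

The point of the design is that an oversized ID degrades to a generated one for that residue alone, rather than stopping the write partway through the file.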

rnowling commented 6 years ago

Thank you for the help and quick response!

peastman commented 6 years ago

Fixed by https://github.com/pandegroup/openmm/pull/1957.