Open jchodera opened 10 years ago
The full chemical component dictionary can be found at ftp://ftp.wwpdb.org/pub/pdb/data/monomers. (Be prepared for that to bring your web browser to a crawl!) DOP (ftp://ftp.wwpdb.org/pub/pdb/data/monomers/DOP) is dioctylphosphate.
As for how to handle it, I really don't know. I'm not sure there's any universal rule.
Here's what the PDB said:
Begin forwarded message:
From: Rachel Kramer Green <kramer@rcsb.rutgers.edu>
Subject: Re: Data-related : Master index of chemical definitions of residue names? (help-5469)
Date: April 14, 2014 at 8:50:44 AM EDT
To: <info@rcsb.org>, <choderaj@mskcc.org>
Reply-To: info <info@rcsb.org>
Thank you for your email message.
Please take a look at the Chemical Component Dictionary at:
http://www.wwpdb.org/ccd.html
These residues have not been experimentally determined (they do not appear in the coordinates), so they appear in the file only by the authors' statement and are not defined further within the file.
You may also find the following article on missing residues to be helpful:
http://www.rcsb.org/pdb/101/static101.do?p=education_discussion/Looking-at-Structures/missing.html
Sincerely,
Rachel Green
Rachel Kramer Green, Ph.D.
RCSB PDB
kramer@rcsb.rutgers.edu
> As for how to handle it, I really don't know. I'm not sure there's any universal rule.
I think we want a few main modes for pdbfixer: a general mode, and a forcefield-aware mode in which an ffxml file is passed so it knows what residue types are available.
In general mode, we can ask it to:
In the forcefield-aware mode, we can ask it to:
For unusual residues or ligands, we can fetch their definitions from the components library (remote or local) and build your minimal forcefield based on that.
One issue that could be challenging to deal with: for each of the standard residues, we have a template giving a reasonable starting structure for it. We won't have that for nonstandard ones. In principle you can work one out from the force field, but it will be nontrivial.
> One issue that could be challenging to deal with: for each of the standard residues, we have a template giving a reasonable starting structure for it. We won't have that for nonstandard ones.
But the components.cif file (the Chemical Component Dictionary mentioned above) does have a template for a reasonable starting structure for every residue appearing in the PDB.
components.cif is 36M when compressed, but we would only need the coordinates and a bit of metadata for each residue, so we could conceivably ship a whole copy of this along with pdbfixer. Alternatively, there may be a way to grab just the residues one needs from the PDB in a just-in-time manner.
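A minimal sketch of the just-in-time idea, for discussion only. It assumes that single-component mmCIF files can be downloaded from a per-code URL; the `CCD_URL_TEMPLATE` value below is an assumption that should be checked against the current RCSB/wwPDB file services, and `component_url` and `get_component` are hypothetical helper names, not part of pdbfixer.

```python
import urllib.request
from pathlib import Path

# ASSUMPTION: per-component download endpoint; verify before relying on it.
CCD_URL_TEMPLATE = "https://files.rcsb.org/ligands/download/{code}.cif"


def _normalize(code):
    """Upper-case a component code such as 'dop' and reject anything odd."""
    code = code.strip().upper()
    if not code.isalnum():
        raise ValueError(f"not a valid component code: {code!r}")
    return code


def component_url(code):
    """Build the (assumed) download URL for one component code."""
    return CCD_URL_TEMPLATE.format(code=_normalize(code))


def _download(url):
    """Fetch a URL and return its body as text."""
    with urllib.request.urlopen(url, timeout=30) as response:
        return response.read().decode("utf-8")


def get_component(code, cache_dir, download=_download):
    """Return the mmCIF text for one component, downloading it at most once."""
    code = _normalize(code)
    cache_dir = Path(cache_dir)
    cache_dir.mkdir(parents=True, exist_ok=True)
    cached = cache_dir / (code + ".cif")
    if cached.exists():
        return cached.read_text(encoding="utf-8")
    text = download(component_url(code))
    cached.write_text(text, encoding="utf-8")
    return text
```

Something like `get_component("DOP", "ccd_cache")` would then hit the network only the first time a residue is needed. The `download` argument is injectable so the cache logic can be exercised without network access.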
There aren't any Python mmCIF reader libraries that I know about, but the data we would need from this file is minimal, and could probably be easily transformed into a usable form.
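To give a sense of how little of the file is needed, here is a rough sketch of pulling atom names, elements, and ideal coordinates out of one component's `_chem_comp_atom` loop. It is not a general mmCIF parser: it assumes one row per line and ignores multi-line (semicolon-delimited) values. The `SAMPLE` text is made-up data for a hypothetical component `XXX`, not a real dictionary entry, and `read_loop` and `ideal_template` are hypothetical names.

```python
import re

# Splits a row into whitespace-separated tokens, keeping quoted values whole.
_TOKEN = re.compile(r'"[^"]*"|\'[^\']*\'|\S+')


def _split_row(line):
    """Tokenize one data row and strip surrounding quotes from quoted values."""
    tokens = []
    for tok in _TOKEN.findall(line):
        if len(tok) >= 2 and tok[0] == tok[-1] and tok[0] in "\"'":
            tok = tok[1:-1]
        tokens.append(tok)
    return tokens


def read_loop(cif_text, category="_chem_comp_atom."):
    """Return the rows of the first loop_ in `category` as a list of dicts."""
    columns, rows, state = [], [], "search"
    for raw in cif_text.splitlines():
        line = raw.strip()
        if state == "search":
            if line == "loop_":
                columns, state = [], "header"
        elif state == "header":
            if line.startswith(category):
                columns.append(line[len(category):])
            elif columns and line and not line.startswith(("_", "#")):
                rows.append(dict(zip(columns, _split_row(line))))
                state = "rows"
            else:
                state = "search"  # a loop for some other category; skip it
        else:  # state == "rows"
            if not line or line.startswith(("#", "_", "data_")) or line == "loop_":
                break
            rows.append(dict(zip(columns, _split_row(line))))
    return rows


def ideal_template(cif_text):
    """List (atom name, element, (x, y, z)); xyz is None if a value is missing."""
    atoms = []
    for row in read_loop(cif_text):
        raw = [row.get("pdbx_model_Cartn_%s_ideal" % axis, "?") for axis in "xyz"]
        xyz = None if any(v in ("?", ".") for v in raw) else tuple(map(float, raw))
        atoms.append((row["atom_id"], row["type_symbol"], xyz))
    return atoms


# Made-up example data in the dictionary's layout (hypothetical component XXX).
SAMPLE = """\
data_XXX
#
_chem_comp.id XXX
#
loop_
_chem_comp_atom.comp_id
_chem_comp_atom.atom_id
_chem_comp_atom.type_symbol
_chem_comp_atom.pdbx_model_Cartn_x_ideal
_chem_comp_atom.pdbx_model_Cartn_y_ideal
_chem_comp_atom.pdbx_model_Cartn_z_ideal
XXX P     P  0.000 0.000 0.000
XXX "O1'" O  1.500 0.000 0.000
#
loop_
_chem_comp_bond.comp_id
_chem_comp_bond.atom_id_1
_chem_comp_bond.atom_id_2
XXX P "O1'"
#
"""
```

Calling `ideal_template(SAMPLE)` yields one tuple per atom, and `read_loop(SAMPLE, "_chem_comp_bond.")` reads the bond table the same way, which is roughly the "coordinates and a bit of metadata" a shipped subset would need.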
I ran into the case of 1AO9, which contains a DOP residue that is not resolved in the ATOM records. This residue does not appear in a machine-readable form in the file or elsewhere at the RCSB, it seems, though I have emailed the RCSB to confirm this.
Many of the HETATM ligands appear in the nicely curated Ligand Expo, where even SDF files can be downloaded, but no such resource appears to exist for protein residues.
The only clue in 1AO9 is the COMPND header, which states that this DOP residue is a "di-(octylphosphate) linker between purine and pyrimidine strands", but this would be immensely difficult for a machine to parse.
I guess this means we simply cannot hope to treat these residues in any sensible way. But what should the default behavior be here? Simply omit them, causing a chain break? Make a random substitution?
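Whatever the default ends up being, the first step is presumably to detect such residues. Below is a rough sketch of that check: it collects residue names from the SEQRES records, subtracts those that appear in ATOM/HETATM records, and reports what remains outside a standard set. The column positions follow the fixed-width PDB format; the contents of `STANDARD` and the function names are illustrative assumptions, not pdbfixer's actual behavior.

```python
# ASSUMPTION: an illustrative "standard" set (20 amino acids plus common
# nucleotide names); a real tool would take this from its residue templates.
STANDARD = frozenset(
    "ALA ARG ASN ASP CYS GLN GLU GLY HIS ILE LEU LYS MET PHE PRO SER THR TRP "
    "TYR VAL A C G T U DA DC DG DT".split()
)


def seqres_residues(pdb_text):
    """Residue names declared in SEQRES records (columns 20-70)."""
    names = set()
    for line in pdb_text.splitlines():
        if line.startswith("SEQRES"):
            names.update(line[19:70].split())
    return names


def modelled_residues(pdb_text):
    """Residue names that have coordinates in ATOM/HETATM records (columns 18-20)."""
    names = set()
    for line in pdb_text.splitlines():
        if line.startswith(("ATOM  ", "HETATM")):
            names.add(line[17:20].strip())
    return names


def unresolved_nonstandard(pdb_text, standard=STANDARD):
    """Nonstandard residue names present in SEQRES but absent from the coordinates."""
    return sorted(seqres_residues(pdb_text) - modelled_residues(pdb_text) - standard)
```

For a file like 1AO9 this kind of check should flag `DOP`, at which point the tool can apply whichever policy is chosen (omit, substitute, or look the residue up in the components dictionary). The sketch works on residue names only and does not track chains or positions.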