Closed electronicsbyjulie closed 1 year ago
Can certainly work on this intermittently while solving higher priority issues.
Also see:
The takeaway here is that not only are there multiple possible active configurations per receptor, but also there are configurations active for some G proteins but not others, and these dissimilar conformers bind to different sets of agonists. So it is not sufficient to say that any given receptor is sensitive to an odorant, but rather that a receptor-G-protein pair is sensitive to that odorant, or a receptor can be sensitive to it depending on the G protein.
Originally posted by @primaryodors in https://github.com/primaryodors/primarydock/issues/256#issuecomment-1566694837
In other words, truly accurate predictions will rely on not just an active configuration model of the receptor protein, but one (or ones) made to fit whichever G protein(s) are expressed in the olfactory cells. Presumably this is usually Golf for in vivo olfaction, but some of the empirical measurements may have used other G proteins besides Golf, so this may explain some of the self contradictions in the scientific data.
On the other hand, this makes the job easier when it comes to extrapolating active receptor structures. A typical active receptor model would be modified from the AlphaFold original to 1.) fit the Golf structure or other G protein on the cytoplasmic end; 2.) maintain a salt bridge between EXR2 and the extracellular end of TMR6; 3.) otherwise resemble the conformational changes found in the active OR51E2 cryo-EM model; 4.) minimize internal clashes of the protein at least to the energy level of the AlphaFold original.
Once such a model is generated of a receptor-G-protein pair, it can be serialized as a PDB file, in a subfolder named active
, with a name that identifies both the receptor and the G protein, and then all docks done on this active state conformer would be hard docks for the sake of performance.
Can use the g-protein-based-reshaping
branch for development on this.
Some of the contact points between proteins:
GNAS1 - OR51E2: BW Conserved?
--------------------------------------------------
Q35 - N136 *4.38 Y
A39 - A132/A133 *4.34-5/*3.61-2 N
D215 - R130 *3.59 mostly
E392 - K294 7.56 ORs only, less one orphan; not TAARs.
R385 - E232 6.29 >50%
E392 - K296 7.58 mostly
D215 - R130 3.59 usually R or H
Y358 - S229 5.70/6.26 mostly; let's call this residue 56.50.
Y391 - H131 3.60 most receptors have an H or an R in the 3.59 position; OR51E2 is unusual having HR.
To break the task down into easier pieces:
In the Protein class, create a ftn that accepts a pointer to another Protein.
Next, create a new app, with all the necessary entries in makefile.
Any GPCR lacking BW can still be used if it has the right motifs:
If any motifs are not found and there are no REMARK rows indicating BW numbering, then an error is generated.
G protein alignment shall be accomplished entirely by motif:
Once the contact points have been identified, the GPCR shall be uprighted and the G protein positioned so that the YELL-7.56 contact is at first superimposed. Next, the G protein shall be rotated about the Y axis to bring its Q contact near the *4.38 residue.
The exact positioning of the G protein will then be adjusted to place GPCR:*4.38 in the +Y direction from GNAX:Q(35), with the optimal Y distance for polar contact between the two residues.
If there is a salt bridge for TMR6-EXR2, then the helix shall be rotated in such a way as to facilitate that salt bridge contact. Then the entire TMR6 helix will be rotated outward, by its salt bridge if it has one, else by its 6.48 residue, until it avoids clashes with the G protein. If there is no salt bridge and the +Y end is clashing with the other parts of the protein, then it shall be rotated outward, centering rotation on the -Y end, to avoid clashes.
The output of the new inter-protein function can be used to fine tune the positioning of the G protein and of all TMRs, and to conform the side chains.
At this point, the two proteins can be written to a single combined PDB. Create the ability to specify a chain letter for each protein. The GPCR can be chain A, and the G protein chain B. Current code is that for docking purposes, PrimaryDock will only load chain A. The file should not be saved in the usual pdbs/OR##/ folder, but there should be a dedicated folder under pdbs/ for storing coupled models, also organized under OR##/ folders like the root of pdbs/ has.
It would be optimal to also flex the backbone between TMR helices and rejoin the strands that inevitably get broken. What might be useful for this task would be to use Molecule::conform_molecules()
and include the covalent bonds between adjacent residues in the conformer search. Each motion could be automatically accepted if it results in bettering either the worst or the average of the set of covalent bonds.
We've opted to increase this task to the same priority as the unit tests. The two may be developed concurrently, with equal time given to both.
Were it only that simple, but the G proteins change shape too. The chain from 8F76 can be extracted with the new update to pepteditor and will at least fit a class I OR.
So the GNAL protein from 8iw1 is not even a real human G protein but an engineered chimera.
Wonder if it will still suffice.
Very upset. Best binding paired the aliphatic atoms of d-limonene with OR1A1's Asn109 residue. I've added code to prevent pairing polar with nonpolar like that.
Don't forget #266 is an urgent task.
The protein coupling app has a memory leak that interferes with testing it locally.
Updated the list of inter-protein contacts.
Recommend rewriting the couple app to accept its own format of config file with lists of coupling pairs. Can offer a way to specify multiple matching aminos, e.g. the one in the list that's usually H or R, as well as a positional tolerance, e.g. specify K7.58 but allow for +/-2 so that K7.56 thru K7.60 will also match.
Agreed. Can use a file extension like .cplcfg to distinguish it from a PrimaryDock .config file.
The two proteins can be specified with PROT1 and PROT2 params, overridable on the command line. Then multiple CONTACT params can follow, with a format similar to CONTACT DE215 RH3.59~2
using the tilde to specify positional tolerance. The first argument would apply to protein 1, and the second to protein 2.
If fewer than 3 of the contacts correspond to residues in both proteins, then the app should exit with an error.
There would also want to be params for bridging, homology, and regions for repositioning.
The positions of the EXR and CYT regions can be adjusted instead of moving the TM helices for optimizing the inter-protein contacts. For olfactory GPCRs, the movable regions are CYT1 (between TMR 1 and 2), EXR1 (2-3), CYT2 (3-4), CYT3 (5-6), EXR3 (6-7). EXR and CYT regions that do not contact the G protein or form a salt bridge can then further adjust during soft dock. There are four axes of motion: horizontally outward/inward; horizontally perpendicular to protein XZ center; rotation about Y axis; rotation about radial axis from protein XZ center.
Once the piece has been rotated, the adjoining TM helices are rotated about their other-end residue's CA atom to line up their same-end CA atom with where the piece "expects" it to be. The piece is then moved to align itself with the actual same-end locations, which will probably differ by a fraction of an Angstrom from "expected".
It keeps flinging the TM helices out away from the rest of the protein! 😡
Switch gears and work on the unit tests for a while. It will be easier to solve the positioning bug after taking a break from it and coming back fresh.
It's finally a weekend with nothing going on. It's finally the perfect time to really dig into the code and figure out why it keeps messing up.
If you can find the cause of the problem tonight, then go ahead and take the time to fix it.
Even if limiting the motions to Δy = 0, TMR6 quickly pulls too far away from the rest of the protein.
The answer won't be improving the existing functionality, but rather performing a 3D conformational search that first lines up the contacts' CA atoms at suitable distances, particularly those of segments that don't move, then adjusts each movable segment to maximize contact with its contact partner. No iterations, just a one-time best fit followed by a one-time fine adjustment.
So, technically, I did find the cause of the problem tonight.
I should have been more clear.
Proceed.
There is a hydrophobic pocket in the cryo-EM model formed by I125, I220, V224, L227 of OR51E2 and L388, L393, L394 of GNAS. As these are not direct contacts between individual aminos but rather a general grouping, there ought to be a way to define such a thing as another form of contact point.
To dock ligands in an active conformation protein model would involve:
For ligands below a certain total binding strength, the pre-activation raw PDB dock binding energy would be subtracted.
An accuracy of 80% or better would be just as trustowrthy as the empirical measurements.